Architecture for Split Drivers Within Xen
=========================================

The block, net and TPM Xen drivers are "split" drivers: they have a
portion in a privileged domain handling the physical device -- the
backend -- and a frontend in the unprivileged domain acting as a proxy
for the backend.  The backend and frontend communicate using shared
event channels and ring buffers, in an architecture known as the Xenbus.

For concreteness, this section shall discuss the block driver, with
differences between that and the net driver highlighted where necessary.
The block driver is referred to in various places by the shorthands
"blk" or "vbd".  I shall also refer to the privileged domain as
domain 0, and the unprivileged domain as domain U.

The basic architecture can be considered to be a chain of four
protagonists: a pair of devices communicating across the Xenbus, and a
pair of devices that use that bus and handle the device-specific
interaction with the kernel device layer.

priv.                                                               unpriv.
domain               Xenbus   interdomain   Xenbus                  domain
kernel -- blkback -- backend -------------- frontend -- blkfront -- kernel
device    instance   instance interconnect  instance    instance    device
layer                                                               layer

In order to establish communication across this chain, a number of
parameters need to be passed from the privileged to the unprivileged
domain, and vice versa.  Some of these parameters are specific to the
blkback/blkfront pair, and some are more general, applying to all split
drivers.  All of these parameters are passed using the Xen Store.


Device Initialisation
---------------------

To trigger the creation of a device connection, Xend (or another tool)
writes frontend and backend details to the store.  These new details are
seen by the Xenbus driver instances, and initialisation begins.  The
details to be written are:

/local/domain/0/backend/vbd/U//...
    frontend        /local/domain/U/device/vbd/
    frontend-id     U
    state           XenbusStateInitialising
    ...

/local/domain/U/device/vbd//...
    backend         /local/domain/0/backend/vbd/U/
    backend-id      0
    state           XenbusStateInitialising
    ...

The Xenbus backend instance has a watch on /local/domain/0/backend/vbd
and the frontend instance has a watch on /local/domain/U/device/vbd.
When the device details above are written, these two watches fire, and
the Xenbus instances begin negotiation.

The backend reads the frontend and frontend-id nodes, and then places a
watch on the state node under that frontend path.  The frontend reads
the backend and backend-id nodes, and then places a watch on the state
node under that backend path.  These two watches are handled
symmetrically inside xenbus_probe:read_otherend_details and the details
are made available in xenbus_device.otherend,
xenbus_device.otherend_id, and xenbus_device.otherend_watch.  For the
backend, the hotplug subsystem is triggered, in order to bring the
physical device online.

Initialisation proceeds by calling the blkfront/blkback probe functions,
so that they may perform device-specific initialisation.  When this is
complete, each driver switches to a different state:

  * blkback creates a watch on the store, waiting for the hotplug
    scripts to complete, and switches to XenbusStateInitWait.

  * blkfront creates the ring buffer and event channel for sharing with
    the backend, advertises those details in the store, and switches to
    XenbusStateInitialised.

When blkback has received the physical device details from the hotplug
scripts, it creates the necessary connection to the kernel device layer.
When it has received the ring-buffer details from the frontend
(indicated by the frontend state change), it maps that connection.  When
both of these things have happened (in either order), it writes the
physical device details to the store, for use by the frontend, and then
switches to the Connected state.

When blkfront sees the switch to the Connected state, it can read those
physical device details, connect to the kernel device layer itself, and
also switch to the Connected state.
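
The initialisation sequence described above can be modelled as a small
state machine.  The following Python sketch is only an illustration, not
the driver code: the class names, method names and the example detail
values ("sectors", "ring-ref", "event-channel") are assumptions made for
the example.  It shows the one point that is easy to miss, namely that
the backend connects only once it holds both the hotplug details and the
frontend's ring details, whichever arrives first.

```python
from enum import Enum


class XenbusState(Enum):
    # Only the states named in the text; the values are arbitrary labels.
    INITIALISING = "Initialising"
    INIT_WAIT = "InitWait"
    INITIALISED = "Initialised"
    CONNECTED = "Connected"


class ToyBackend:
    """Hypothetical stand-in for blkback."""

    def __init__(self):
        self.state = XenbusState.INITIALISING
        self.physical_details = None  # filled in by the hotplug scripts
        self.ring_details = None      # filled in by the frontend
        self.published = None         # what the backend writes to the store

    def probe(self):
        # Device-specific initialisation: wait for the hotplug scripts.
        self.state = XenbusState.INIT_WAIT

    def on_hotplug_complete(self, physical_details):
        self.physical_details = physical_details
        self._maybe_connect()

    def on_frontend_state(self, frontend_state, ring_details):
        if frontend_state is XenbusState.INITIALISED:
            self.ring_details = ring_details
            self._maybe_connect()

    def _maybe_connect(self):
        # Both inputs are required, in either order.
        if self.physical_details is None or self.ring_details is None:
            return
        self.published = self.physical_details
        self.state = XenbusState.CONNECTED


class ToyFrontend:
    """Hypothetical stand-in for blkfront."""

    def __init__(self):
        self.state = XenbusState.INITIALISING
        self.ring_details = None
        self.physical_details = None

    def probe(self):
        # Create ring and event channel (made-up values), then advertise.
        self.ring_details = {"ring-ref": 8, "event-channel": 3}
        self.state = XenbusState.INITIALISED

    def on_backend_state(self, backend_state, published):
        if backend_state is XenbusState.CONNECTED:
            self.physical_details = published
            self.state = XenbusState.CONNECTED


def run_handshake(hotplug_first):
    """Run the handshake; return both ends and the backend state trace."""
    backend, frontend = ToyBackend(), ToyFrontend()
    backend.probe()
    frontend.probe()
    steps = [
        lambda: backend.on_hotplug_complete({"sectors": 2048}),
        lambda: backend.on_frontend_state(frontend.state,
                                          frontend.ring_details),
    ]
    if not hotplug_first:
        steps.reverse()
    trace = []
    for step in steps:
        step()
        trace.append(backend.state)
    frontend.on_backend_state(backend.state, backend.published)
    return backend, frontend, trace
```

Calling run_handshake(True) and run_handshake(False) leaves both ends
Connected in each case; in both orderings the backend stays in InitWait
after the first event and connects only after the second.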

Event diagram:

              Xenbus                       Xenbus
Hotplug       Backend                      Frontend
-------       -------                      --------

           Initialising                 Initialising
                 |                            |
   |<---start----+                            |
   |             |                            |
   |         InitWait                         |
   |             |                          write
   |             |                          ring/
 write           |                         channel
physdets-------->|                         details
                 |                            |
                 |<---------------------Initialised
                 |                            |
               write                          |
             physdets                         |
                 |                            |
             Connected----------------------->|
                 |                            |
                 |                        Connected
                 |                            |

The netfront driver does not need to wait for details from its backend,
and so can skip immediately to the Connected state.


Device Closedown
----------------

Orderly closedown can be requested by the user, as a device hotplug
request to Xend or other tools, or by the drivers when they encounter an
error.

An orderly closedown can be accomplished by changing the backend state
to Closing.  This will trigger the frontend to tear down its kernel
connection, flushing through any requests that it has in flight, and
then to change to state Closed.  The backend will respond to the
frontend's change to Closed by deregistering itself and switching to
state Closed also.

Frontends may tear down immediately on error, without requiring the
backend state to change to Closing first.

              Xenbus                       Xenbus
Hotplug       Backend                      Frontend
-------       -------                      --------

        (Written by control                   |
         tools, e.g. Xend,                    |
         or by backend on                     |
         error)                               |
                 |                            |
              Closing--------------------->Closing
                 |                            |
                 |                            |
                 |                          flush
                 |                            |
                 |                            |
              Closed<----------------------Closed
                 |                            |
   |<--------unregister                       |
   |           device                         |
   |             |                            |
   remove store directories                   |

or

              Xenbus                       Xenbus
Hotplug       Backend                      Frontend
-------       -------                      --------

                 |                  (Written by frontend
                 |                   on error)
                 |                            |
              Closing<---------------------Closing
                 |
                 +--------------------------->|
                 |                            |
                 |                          flush
                 |                            |
                 |                            |
              Closed<----------------------Closed
                 |                            |
   <---------unregister                       |
   |           device                         |
   |             |                            |
   remove store directories                   |


Migration
---------

Migration differs from closedown in that the connection from the
frontend driver to the kernel device layer is not disturbed; only the
Xenbus connection is torn down.  When a driver is disconnected for this
reason, it receives a call from the lower layers.
On resumption, the new backend details are read and new watches are
established.


Device reconfiguration
----------------------

If live reconfiguration is required between backend and frontend, this
is handled with device-specific watches on the store.  Each driver stays
in the Connected state throughout this.
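
As an illustration of reconfiguration through device-specific watches,
the following Python sketch uses a toy in-memory store.  It is a
simplified model, not the Xen Store interface: ToyStore, ToyDriver, the
/toy/... paths and the "mode" key are all hypothetical names chosen for
the example.  The sketch assumes that a watch fires for writes at or
below the watched path, and shows a driver applying a change without
leaving the Connected state.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for the store."""

    def __init__(self):
        self.nodes = {}
        self.watches = []  # list of (watched path, callback)

    def watch(self, prefix, callback):
        self.watches.append((prefix, callback))

    def write(self, path, value):
        self.nodes[path] = value
        # Fire every watch registered on this path or one of its parents.
        for prefix, callback in self.watches:
            if path == prefix or path.startswith(prefix + "/"):
                callback(path, value)


class ToyDriver:
    """Hypothetical connected driver that accepts live reconfiguration."""

    def __init__(self, store, device_path):
        self.state = "Connected"
        self.settings = {}
        # Device-specific watch: only this device's subtree is observed.
        store.watch(device_path, self.on_change)

    def on_change(self, path, value):
        # Record the new setting under the last path component.
        # The state is deliberately left untouched.
        key = path.rsplit("/", 1)[-1]
        self.settings[key] = value


def demo():
    store = ToyStore()
    driver = ToyDriver(store, "/toy/backend/vbd/1/0")
    store.write("/toy/backend/vbd/1/0/mode", "r")
    store.write("/toy/backend/vbd/1/1/mode", "w")  # another device: ignored
    return driver
```

In demo(), only the first write reaches the driver, because the second
write falls outside the subtree it watches.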