Xen project Mailing List

Re: [Xen-devel] architecture for backend domains

> I'm going through all of the latest Xen 2.0 documentation, and I had a
> couple of questions:
>
> o it seems from the docs that it's possible to assign io privileges and
> administrative privileges to *any* domain (apart from dom0, which has
> these privileges built in IIRC). is this correct?

Sort of. There are two 'capabilities' a domain may have: full
administrative privilege (DF_PRIVILEGED) and 'physical device' privilege
(DF_PHYSDEV). DF_PRIVILEGED allows full access to all hypervisor
operations (i.e. to create, inspect or destroy other domains, plus access
to all memory including all PCI space). DF_PHYSDEV allows more restricted
access.

The intention is that DF_PRIVILEGED is given only to dom0 or, perhaps, to
a dom0-replacement (for doing live upgrades of dom0), although we've never
done this. DF_PHYSDEV, on the other hand, is intended for any backend
domain.

In the current implementation, however:

 - dom0 is given DF_PRIVILEGED as expected when creating the first domain

 - there is no hypercall whose purpose is to add DF_PRIVILEGED to a new
   domain (nor can one specify this in create domain) -- as such, doing
   the live upgrade thing is not really cleanly possible

 - the privileged pcidev_access hypercall (used to configure a backend
   domain) actually sets both DF_PRIVILEGED and DF_PHYSDEV in the
   backend domain.

This is a temporary measure; the architectural intention is that
DF_PHYSDEV alone will suffice for backend domains, and that playing
around with DF_PRIVILEGED will be handled in a cleaner fashion.

> o can there be multiple backend domains for a single physical device
> (like a network interface)? if so, then there is scheduling involved
> at multiple levels -- first Xen will have to schedule backends across
> the physdev, and then each backend will have to schedule across the
> domains that use it as backend. Further, what mechanism does Xen use
> to determine which backend to direct pkts to and from the backend
> which client domain to forward them to?

There's nothing to stop someone from configuring the system to have two
backend domains with access to the same physical device. However, trying
to run two copies of a device driver against a single physical device
will lead to tears -- device drivers tend to assume they're the only ones
driving the hardware, and so things will likely get completely
moulinexed.

This is not a 'scheduling' issue; there's no way to get two device
drivers to share a device without (a) hacking the crap out of the device
drivers and (b) inserting a whole bunch of synchronization and
communication between the drivers.

> o if there is just one backend, how exactly does access to the devices
> take place? From the docs, I gather that each domain using the device
> has 2 rings -- one for sends and one for receives (very generally
> speaking). Also, the docs say that the backend can directly map
> buffers of the virtual domains in Xen to enable DMA to them. But at
> other places in the docs, I got the impression that client domains
> (and not just backends) have these descriptor rings as well. So
> basically I'm asking if all communication happens through the backend,
> or do client domains talk directly to Xen.

The "2 rings" referred to are basically an inter-domain communication
mechanism -- that is, they allow the transfer of information between
e.g. a client domain and a backend domain.

Some of the confusion may arise from the fact that we refer to the part
of the client domain that does this as "the frontend device driver" and
the part of the backend domain that does this as "the backend device
driver". However, both of these are *virtual* device drivers and don't
actually speak to physical devices at all. They are just two ends of a
communication mechanism which allows e.g. a client to request "read
block 1000 from device sda3 into a buffer at 0x5ca000".

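To make the "two ends of a communication mechanism" idea concrete, here
is a minimal sketch in C of a request ring and a response ring shared by
a frontend and a backend. Every name in it (struct io_rings,
struct blk_request, frontend_submit() and so on), the field layout and
the ring size are hypothetical and chosen for illustration only; this is
not Xen's actual ring format. It also runs in a single address space and
leaves out the memory barriers and cross-domain notification that a real
shared-memory ring would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch only: names and layout are NOT Xen's real ring format. */
#define RING_SIZE 8u                    /* power of two so indices wrap with a mask */
#define RING_MASK (RING_SIZE - 1u)
static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

enum blk_op { BLK_READ = 0, BLK_WRITE = 1 };

struct blk_request {
    uint32_t id;       /* chosen by the frontend, echoed back in the response */
    uint8_t  op;       /* BLK_READ or BLK_WRITE */
    uint16_t device;   /* virtual device handle (stands in for e.g. sda3) */
    uint64_t block;    /* block number on that device */
    uint64_t buffer;   /* address of the client's buffer */
};

struct blk_response {
    uint32_t id;       /* matches blk_request.id */
    int32_t  status;   /* 0 on success, negative on error */
};

/* One ring per direction; producer/consumer indices run freely and are masked. */
struct io_rings {
    struct blk_request  req[RING_SIZE];
    struct blk_response rsp[RING_SIZE];
    uint32_t req_prod, req_cons;
    uint32_t rsp_prod, rsp_cons;
};

/* Frontend: queue a request. Returns 0, or -1 if the request ring is full. */
static int frontend_submit(struct io_rings *r, const struct blk_request *req)
{
    if (r->req_prod - r->req_cons == RING_SIZE)
        return -1;
    r->req[r->req_prod & RING_MASK] = *req;
    r->req_prod++;
    return 0;
}

/* Backend: fetch the next request. Returns 1 if one was copied out, else 0. */
static int backend_poll(struct io_rings *r, struct blk_request *out)
{
    if (r->req_cons == r->req_prod)
        return 0;
    *out = r->req[r->req_cons & RING_MASK];
    r->req_cons++;
    return 1;
}

/* Backend: queue a response. Returns 0, or -1 if the response ring is full. */
static int backend_respond(struct io_rings *r, const struct blk_response *rsp)
{
    if (r->rsp_prod - r->rsp_cons == RING_SIZE)
        return -1;
    r->rsp[r->rsp_prod & RING_MASK] = *rsp;
    r->rsp_prod++;
    return 0;
}

/* Frontend: fetch the next response. Returns 1 if one was copied out, else 0. */
static int frontend_poll(struct io_rings *r, struct blk_response *out)
{
    if (r->rsp_cons == r->rsp_prod)
        return 0;
    *out = r->rsp[r->rsp_cons & RING_MASK];
    r->rsp_cons++;
    return 1;
}
```

In this sketch the frontend would fill in a blk_request with op =
BLK_READ, block = 1000 and buffer = 0x5ca000 and call frontend_submit();
the backend would pick it up with backend_poll(), hand it to whatever
does the real I/O, and later call backend_respond() with the same id.
Neither side touches hardware through the rings themselves.
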
The actual hardware is accessed by regular device drivers running in the
backend domain -- today that means any Linux 2.4.27 or Linux 2.6.8.1
device drivers. These access the hardware in almost exactly the same way
as they do normally: via memory mapping bits of the PCI address space,
reading and writing to that, and receiving interrupts. Some small
modifications are required in xenlinux to ensure this is done in a safe
way (i.e. indirecting through Xen for privilege checks etc), but
otherwise the driver is the same.

Overall then, if we consider e.g. a process reading a file in a client
(non-backend) domain, the control flow is:

 1. client process does read() syscall -> client kernel
 2. client kernel (VFS layer) invokes frontend driver for actual access
 3. frontend driver uses I/O rings to send a message to backend domain
 4. backend driver receives message on I/O rings
 5. backend driver forwards the request to the 'real' physical device
    driver, which in turn forwards the request to the actual device
 6. -- time passes --
 7. device returns data to real device driver
 8. real device driver returns data to backend driver
 9. backend driver puts response onto the I/O rings
 a. frontend driver receives response and passes up to VFS layer
 b. data returned to client process

Hope this makes things a little clearer. We're working on updating the
documentation for 2.0, but it'll likely be an ongoing process.

cheers,

S.
