Xen project Mailing List

RE: [Xen-devel] Full virtualization and I/O

To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>

From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>

Date: Wed, 22 Nov 2006 18:22:52 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Thomas Heinz <thomasheinz@xxxxxxx>

Delivery-date: Wed, 22 Nov 2006 09:26:26 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AccOWh8vXRz5yFDqSuagVnV0XShOaAAAIb/w

Thread-topic: [Xen-devel] Full virtualization and I/O

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liang Yang
> Sent: 22 November 2006 17:17
> To: Petersson, Mats
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Thomas Heinz
> Subject: Re: [Xen-devel] Full virtualization and I/O
>
> Hi Mats,
>
> This para-virtualized driver in an HVM domain is just like the dummy device driver in a para-virtualized domain. And after using this para-virtualized driver, the HVM domain is also using this kind of front-end/back-end model to handle I/O, instead of the "device model" which a typical HVM domain would use.
>
> Am I correct?

Yes, exactly. Of course, the HVM domain may well use a mixture: for example, the normal (device-model) IDE device driver to access the disk, and a para-virtual network driver to access the network.

--
Mats
>
> Liang
>
> ----- Original Message -----
> From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>
> Cc: "Thomas Heinz" <thomasheinz@xxxxxxx>; <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, November 22, 2006 9:57 AM
> Subject: RE: [Xen-devel] Full virtualization and I/O
>
> > -----Original Message-----
> > From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx]
> > Sent: 22 November 2006 16:51
> > To: Petersson, Mats
> > Cc: Thomas Heinz; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] Full virtualization and I/O
> >
> > Hi Mats,
> >
> > Thanks for explaining this in such detail.
> >
> > As you mentioned in your post, could you elaborate on using the unmodified driver in an HVM domain (i.e. using the front-end driver in a fully virtualized domain)? Do you think a para-virtualized domain will behave exactly the same as a fully virtualized domain when both of them use this unmodified driver to access virtual block devices?
>
> Not sure exactly what you're asking, but if you're asking whether the performance of driver-related work will be approximately the same, yes.
>
> By the way, I wouldn't call that an "unmodified" driver - it is definitely a MODIFIED driver (a para-virtual driver).
>
> --
> Mats
> >
> > Best regards,
> >
> > Liang
> >
> > ----- Original Message -----
> > From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> > To: "Thomas Heinz" <thomasheinz@xxxxxxx>; <xen-devel@xxxxxxxxxxxxxxxxxxx>
> > Sent: Wednesday, November 22, 2006 9:24 AM
> > Subject: RE: [Xen-devel] Full virtualization and I/O
> >
> > > -----Original Message-----
> > > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Thomas Heinz
> > > Sent: 20 November 2006 23:39
> > > To: xen-devel@xxxxxxxxxxxxxxxxxxx
> > > Subject: [Xen-devel] Full virtualization and I/O
> > >
> > > Hi
> > >
> > > Full virtualization is about providing multiple virtual ISA-level environments and mapping them onto a single physical one. One particular aspect of this mapping is I/O instructions (explicit or memory-mapped I/O). In general, there are two strategies for partitioning devices: in time or in space. Partitioning a device in space means that the device (or a part of it) is exclusively available to a single VM. Partitioning a device in time (time multiplexing) means that it can be used by multiple VMs, but only one VM may use it at any point in time.
> >
> > The Xen approach is to not allow any sharing of devices: a device is owned by one domain, and no other domain can directly access the device.
> > There is a protocol of so-called frontend/backend drivers: the frontend is basically a dummy device that forwards a request to another domain (normally domain 0), and the other half of the driver pair picks up this data and forwards it to some processing task, which then sends the packet on to the real hardware.
> >
> > For fully virtualized mode (hardware-supported virtual machines, such as AMD-V or Intel VT, aka HVM), there is a different model, where a "device model" is involved to perform the hardware modelling. In Xen, this is a modified version of qemu (called qemu-dm), which has a fairly complete set of "hardware" in its model. It has, for example, an IDE controller, several types of network devices, and graphics and mouse/keyboard models: the things you'd usually find in a PC, that is. The way it works is that the hypervisor intercepts IOIO and memory-mapped IO regions that match the devices involved (such as the A0000-BFFFF region for VGA frame buffer memory or the 0x1F0-0x1F7 IO ports for the IDE controller), and forwards a request from the hypervisor to qemu-dm, where the operation changes the current state. When necessary, the state change will result in, for example, a read request to the "hard disk" (which may be a real disk, a file on a local disk, or a file on a network storage device, to give some examples).
> >
> > There is also the option of using the frontend drivers described above in the fully virtualized model.
> >
> > Finally, while I'm on the subject of fully virtualized mode: it is currently not possible to give a DMA-based device to a fully virtualized domain. The reason for this is that the guest OS will have been told that its memory is at 0..256MB (say), while its actual machine physical address is at 256MB..512MB. The OS is completely unaware of this "mismatch".
> > So the OS will perform some operation to take the virtual address of some buffer (say a network packet) and turn it into a "physical address", which will be an address in the range 0..256MB. This will of course (at the very least) lead to the wrong data being transmitted, as the actual data is somewhere in the range 256MB..512MB. The only solution to this is to have an IOMMU, which can translate the guest's understanding of a physical address (0..256MB) to a machine physical address (256MB..512MB).
> >
> > >
> > > I am trying to understand how I/O virtualization at the ISA level works if a device is shared between multiple VM instances. At a very high level, it should be as follows. First of all, the VMM has to intercept the VM's I/O commands (I/O instructions or loads/stores to dedicated memory addresses - let's ignore interrupts for the moment). This could be done by traps or by replacing the respective instructions with VMM calls to I/O primitives. The VMM keeps multiple device model instances (one for each VM using the device) in memory. The models somehow reflect the low-level I/O API of the device. Depending on which I/O command is issued by the VM, either the memory model is changed or a number of I/O instructions are executed to make the physical device state reflect the one represented in the memory model.
> >
> > Do you by ISA mean "Instruction Set Architecture" or something else (I presume it does NOT mean ISA-bus...)?
> >
> > Intercepting IOIO instructions or MMIO instructions is not that hard - in HVM the two processor architectures have specific intercepts and bitmaps to indicate which IO instructions should be intercepted.
> > MMIO will require the page tables to be set up such that the memory-mapped region is mapped "not present", so that any operation on this region gives a page fault; the page fault is then analyzed to see whether it is for an MMIO address or a "real page fault".
> >
> > For para-virtualization, the model is similar, but the exact way the IOIO or MMIO instruction is intercepted is slightly different - in essence, though, it's the same principle. Let me know if you really need to know how Xen goes about doing this, as it's quite complicated (more so than the HVM version, for sure).
> >
> > >
> > > This approach brings up a number of questions. It would be great if some of the virtualization experts here could shed some light on them (even though they are not immediately related to Xen, I know):
> > >
> > > - What do these device memory models look like? Is there a common (automata) theory behind them, or are they done ad hoc?
> >
> > Not sure what you're asking for here. The devices are either modeled after a REAL device (qemu-dm), and as such will resemble as closely as possible the REAL hardware device being emulated, or, in the frontend/backend driver, there is an "idealized model", where the request contains just the basic data that the OS normally provides to the driver, and it's placed in a queue with a message-signaling system to tell the other side that there is something in the queue.
> >
> > > - What kind of strategies/algorithms are used in the merge phase, i.e. the phase where the virtual memory model and the physical one are synchronized? What kind of problems can occur in this phase?
> >
> > The Xen approach is to avoid this by giving each device to only one machine.
> >
> > > - Are specific usage patterns used in real-world implementations (e.g. VMWare) to simplify the virtualization (model or merge phase)?
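The trap-and-classify path described in this part of the thread (an I/O bitmap selecting which port accesses exit to the hypervisor, plus MMIO regions mapped "not present" so that the resulting page fault can be checked against known device ranges) can be modelled in a few lines. This is a minimal Python sketch, not Xen code: the `InterceptTable` class and its method names are hypothetical, each port access is treated as touching a single port, and the IDE port range and VGA frame buffer range are simply the examples given earlier in the thread.

```python
# Hypothetical model of how trapped guest accesses might be routed.
# Not Xen's implementation; all names and the data layout are assumptions.


class InterceptTable:
    def __init__(self):
        # One bit per I/O port (65536 ports -> 8192 bytes); a set bit means
        # "intercept this port", similar in spirit to the AMD-V/Intel VT bitmaps.
        self.io_bitmap = bytearray(65536 // 8)
        # Guest-physical MMIO ranges deliberately mapped "not present".
        self.mmio_ranges = []  # list of (start, end_exclusive, device_name)

    def intercept_ports(self, first, last):
        """Mark ports first..last (inclusive) as intercepted."""
        for port in range(first, last + 1):
            self.io_bitmap[port // 8] |= 1 << (port % 8)

    def add_mmio(self, start, end_exclusive, name):
        self.mmio_ranges.append((start, end_exclusive, name))

    def on_io_instruction(self, port):
        """Return who handles an IN/OUT on `port` (single-port access assumed)."""
        if self.io_bitmap[port // 8] & (1 << (port % 8)):
            return "device-model"  # exit to the hypervisor, forward to qemu-dm
        return "direct"            # no exit for this port

    def on_page_fault(self, gpa):
        """Classify a fault on guest-physical address `gpa`."""
        for start, end, name in self.mmio_ranges:
            if start <= gpa < end:
                return ("mmio", name)        # emulate the device access
        return ("real-page-fault", None)     # ordinary fault, handle normally


table = InterceptTable()
table.intercept_ports(0x1F0, 0x1F7)         # IDE controller ports
table.add_mmio(0xA0000, 0xC0000, "vga-fb")  # VGA frame buffer A0000-BFFFF
```

On real hardware the CPU itself consults the I/O bitmap and the hypervisor only sees the resulting exit; the sketch folds both steps into one lookup so the routing decision is easy to follow.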
> >
> > This is probably the wrong list to ask detailed questions about how VMWare works... ;-)
> >
> > > - Do you have any interesting pointers to literature dealing with full I/O virtualization? In particular, how does VMWare's full virtualization work with respect to I/O?
> >
> > Again, wrong list for VMWare questions.
> >
> > > - Is every device time-partitionable? If not, which requirements does it have to meet to be time-partitionable?
> >
> > Certainly not - I would say that almost all devices are NOT time-partitionable, as the state in the device depends on the current usage. The more complex the device, the more likely it is to have difficulties, but even a device as simple as a serial port would struggle to work in a time-shared fashion. Serial ports are generally used for multiple transactions that make up a whole "bigger picture transaction": for example, a web server connected via a serial modem would send a packet of several hundred bytes to the serial port driver, which then portions it out as and when the serial port is ready to send another few bytes. If you switch from one guest to another during this process, and the second guest also has something to send on the serial port, you'd end up with a very scrambled message from the first guest and quite likely the second guest's message completely lost!
> >
> > There are some devices that are specifically built to manage multiple hosts, but other than that, any sharing of a device requires some software to gather up "a full transaction" and then send that to the actual hardware, often also waiting for the transaction to complete (for example, the interrupt signal saying that the hard disk write is complete).
> >
> > >
> > > -> I don't think every device is.
> > > What about a device which supports different modes of operation? If two VMs drive the virtual device in different modes, it may not be possible to constantly switch between them. OK, this is pretty artificial.
> >
> > A particular problem is devices where you can't necessarily read back the last mode setting, which may well be the case for many different devices. You can't, for example, read back all the registers on an IDE device, because a read of a particular address may give the status rather than the current command sent, or some such.
> >
> > --
> > Mats
> >
> > >
> > > Thanks a lot for your help!
> > >
> > > Best wishes
> > >
> > > Thomas
> > >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-devel
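The "idealized model" Mats describes for the frontend/backend pair (the frontend places a request holding just the basic data an OS would pass to its driver into a shared queue, then signals the other side) can be illustrated with a small ring buffer. This is a single-process Python sketch under loud assumptions: the `SharedRing` and `Request` names, the power-of-two ring size, the free-running producer/consumer counters and the `notify` callback (standing in for an inter-domain event notification) are all hypothetical, and it omits responses, memory sharing between domains and any real Xen interface.

```python
from collections import namedtuple

# Hypothetical request format: only the basic data an OS hands to a block driver.
Request = namedtuple("Request", "req_id op sector count")


class SharedRing:
    """Fixed-size request ring with free-running producer/consumer counters."""

    def __init__(self, size, notify):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0      # advanced only by the frontend
        self.req_cons = 0      # advanced only by the backend
        self.notify = notify   # stands in for signalling the other domain

    def full(self):
        return self.req_prod - self.req_cons == self.size

    def push(self, req):
        """Frontend side: queue a request and signal the backend."""
        if self.full():
            return False
        self.slots[self.req_prod & (self.size - 1)] = req
        self.req_prod += 1
        self.notify()
        return True

    def pop_all(self):
        """Backend side: drain every request queued since the last call."""
        out = []
        while self.req_cons != self.req_prod:
            out.append(self.slots[self.req_cons & (self.size - 1)])
            self.req_cons += 1
        return out


events = []
ring = SharedRing(4, notify=lambda: events.append("kick"))
for i in range(5):  # the fifth push is refused because the ring is full
    ring.push(Request(i, "read", sector=8 * i, count=8))
```

Only the slot index is masked; the counters themselves keep growing, so `req_prod - req_cons` always gives the number of outstanding requests. A real implementation would typically also batch notifications rather than signalling on every request.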

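The DMA problem explained in the thread is an address-space mismatch: the guest programs the device with a guest-physical address in 0..256MB, but the buffer really lives at 256MB..512MB in machine memory. The sketch below shows the page-granular translation an IOMMU applies to each DMA address in that example. It is a simplified Python model, assuming 4 KiB pages and a flat dictionary as the translation table; `build_table`, `translate` and `IommuFault` are hypothetical names, not any real IOMMU's interface.

```python
PAGE_SHIFT = 12                    # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
MB = 1 << 20


class IommuFault(Exception):
    """Raised when a device DMAs to a guest address with no mapping."""


def build_table(guest_base, machine_base, length):
    """Map each guest-physical frame to its machine frame (contiguous here)."""
    table = {}
    for offset in range(0, length, 1 << PAGE_SHIFT):
        guest_frame = (guest_base + offset) >> PAGE_SHIFT
        table[guest_frame] = (machine_base + offset) >> PAGE_SHIFT
    return table


def translate(table, guest_phys):
    """Translate one DMA address: guest-physical -> machine-physical."""
    frame = guest_phys >> PAGE_SHIFT
    if frame not in table:
        raise IommuFault(hex(guest_phys))
    return (table[frame] << PAGE_SHIFT) | (guest_phys & PAGE_MASK)


# The guest believes its RAM is 0..256MB; it actually sits at 256MB..512MB.
table = build_table(0, 256 * MB, 256 * MB)
```

Without such a table, the device would use the guest's address unmodified and access the wrong part of machine memory, which is exactly the wrong-data scenario described in the thread.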
©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.