
RE: [Xen-devel] Full virtualization and I/O



 

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liang Yang
> Sent: 22 November 2006 17:17
> To: Petersson, Mats
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Thomas Heinz
> Subject: Re: [Xen-devel] Full virtualization and I/O
> 
> Hi Mats,
> 
> This para-virtualized driver in an HVM domain is just like the dummy
> device driver in a para-virtualized domain. And when it uses this
> para-virtualized driver, the HVM domain also uses this kind of
> front-end/back-end model to handle I/O, instead of the "device model"
> that a typical HVM domain uses.
> 
> Am I correct?

Yes, exactly. 

Of course, the HVM domain may well use a mixture: for example, the normal
(device-model) IDE device driver to access the disk, and a para-virtual
network driver to access the network.

--
Mats
> 
> Liang
> 
> ----- Original Message ----- 
> From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>
> Cc: "Thomas Heinz" <thomasheinz@xxxxxxx>; 
> <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, November 22, 2006 9:57 AM
> Subject: RE: [Xen-devel] Full virtualization and I/O
> 
> 
> 
> 
> > -----Original Message-----
> > From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx]
> > Sent: 22 November 2006 16:51
> > To: Petersson, Mats
> > Cc: Thomas Heinz; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] Full virtualization and I/O
> >
> > Hi Mats,
> >
> > Thanks for such a detailed explanation.
> >
> > As you mentioned in your post, could you elaborate on using the
> > unmodified driver in an HVM domain (i.e. using the front-end driver in
> > a fully virtualized domain)? Do you think a para-virtualized domain
> > will behave exactly the same as a fully virtualized domain when both
> > of them use this unmodified driver to access virtual block devices?
> 
> Not sure exactly what you're asking, but if you're asking whether the
> performance of driver-related work will be approximately the same, yes.
> 
> By the way, I wouldn't call that an "unmodified" driver - it is
> definitely a MODIFIED driver (a para-virtual driver).
> 
> --
> Mats
> >
> > Best regards,
> >
> > Liang
> >
> > ----- Original Message ----- 
> > From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> > To: "Thomas Heinz" <thomasheinz@xxxxxxx>;
> > <xen-devel@xxxxxxxxxxxxxxxxxxx>
> > Sent: Wednesday, November 22, 2006 9:24 AM
> > Subject: RE: [Xen-devel] Full virtualization and I/O
> >
> >
> > > -----Original Message-----
> > > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > Thomas Heinz
> > > Sent: 20 November 2006 23:39
> > > To: xen-devel@xxxxxxxxxxxxxxxxxxx
> > > Subject: [Xen-devel] Full virtualization and I/O
> > >
> > > Hi
> > >
> > > Full virtualization is about providing multiple virtual ISA-level
> > > environments and mapping them to a single physical one. One
> > > particular aspect of this mapping is the handling of I/O instructions
> > > (explicit or memory-mapped I/O). In general, there are two strategies
> > > to partition the devices: in time or in space. Partitioning a device
> > > in space means that the device (or a part of it) is exclusively
> > > available to a single VM. Partitioning a device in time (time
> > > multiplexing) means that it can be used by multiple VMs, but only one
> > > VM may use it at any point in time.
> >
> > The Xen approach is not to allow any sharing of devices: a device is
> > owned by one domain, and no other domain can access it directly.
> > Instead there is a so-called frontend/backend driver protocol. The
> > frontend is basically a dummy device that forwards a request to another
> > domain (normally domain 0), where the other half of the driver pair
> > picks up the data and passes it to some processing task, which then
> > sends the packet on to the real hardware.
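The ownership and forwarding split described above can be sketched as a toy model. This is only an illustration under stated assumptions: `Domain`, `PhysicalDisk`, `Backend` and `Frontend` are hypothetical names, not Xen code, and a direct method call stands in for the real inter-domain transport.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    domid: int
    owned_devices: set = field(default_factory=set)


class PhysicalDisk:
    """Stand-in for real hardware; only the owning domain touches it."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class Backend:
    """Half of the driver pair that runs in the owning domain (normally dom0)."""

    def __init__(self, owner, device_name, device):
        if device_name not in owner.owned_devices:
            raise PermissionError(f"domain {owner.domid} does not own {device_name}")
        self.owner = owner
        self.device = device
        self.served = []  # (guest domid, op, block) for every forwarded request

    def handle(self, guest_domid, op, block, data):
        self.served.append((guest_domid, op, block))
        if op == "write":
            self.device.write(block, data)  # only the backend touches hardware
            return "ok"
        return "unsupported"


class Frontend:
    """Dummy device inside the guest: it never touches hardware, it forwards."""

    def __init__(self, guest, backend):
        self.guest = guest
        self.backend = backend

    def write(self, block, data):
        # Hypothetical transport: a direct call stands in for shared memory
        # plus event notification between domains.
        return self.backend.handle(self.guest.domid, "write", block, data)


dom0 = Domain(0, {"disk0"})
disk = PhysicalDisk()
backend = Backend(dom0, "disk0", disk)
guest_fe = Frontend(Domain(1), backend)
status = guest_fe.write(7, b"hello")
```

The guest's write ends up on the disk only through the backend, and building a backend in a domain that does not own the device fails.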
> >
> > For fully virtualized mode (hardware-supported virtual machines, such as
> > AMD-V or Intel VT, aka HVM), there is a different model, where a "device
> > model" is involved to perform the hardware modelling. In Xen, this is a
> > modified version of qemu (called qemu-dm), which has a fairly complete
> > set of "hardware" in its model: for example an IDE controller, several
> > types of network devices, and graphics and mouse/keyboard models. The
> > things you'd usually find in a PC, that is.
> > The way it works is that the hypervisor intercepts IOIO and memory
> > mapped IO regions that match the devices involved (such as the
> > A0000-BFFFF region for VGA frame buffer memory or the 0x1F0-0x1F7 IO
> > ports for the IDE controller), and forwards a request from the
> > hypervisor to qemu-dm, where the operation changes the current state.
> > When necessary, the state change results in, for example, a read
> > request to the "hard disk" (which may be a real disk, a file on a local
> > disk, or a file on a network storage device, to give some examples).
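The intercept-and-forward path can be sketched roughly as follows. The class names are hypothetical, the "device model" is a toy rather than qemu-dm's actual interface, and, unlike real IDE hardware, every register here simply reads back what was written. Returning 0xFF for an unclaimed port is also an assumption of this sketch.

```python
class DeviceModel:
    """Hypothetical stand-in for the user-space device model (qemu-dm's role)."""

    def __init__(self):
        self.ranges = []  # list of (first_port, last_port, device)

    def register(self, first, last, device):
        self.ranges.append((first, last, device))

    def handle_io(self, port, is_write, value=0):
        for first, last, device in self.ranges:
            if first <= port <= last:
                return device.access(port - first, is_write, value)
        return 0xFF  # assumption: an unclaimed port reads as all ones


class ToyIdeRegisters:
    """Eight byte-wide registers at offsets 0..7 of the IDE port block."""

    def __init__(self):
        self.regs = [0] * 8
        self.accesses = []  # (offset, is_write, value) log

    def access(self, offset, is_write, value):
        self.accesses.append((offset, is_write, value))
        if is_write:
            self.regs[offset] = value & 0xFF
            return None
        return self.regs[offset]


class Hypervisor:
    """Traps guest port I/O in the intercepted set and forwards it."""

    def __init__(self, device_model, trapped_ports):
        self.dm = device_model
        self.trapped = set(trapped_ports)

    def guest_out(self, port, value):
        if port not in self.trapped:
            raise RuntimeError("untrapped port: not modelled in this sketch")
        self.dm.handle_io(port, True, value)

    def guest_in(self, port):
        if port not in self.trapped:
            raise RuntimeError("untrapped port: not modelled in this sketch")
        return self.dm.handle_io(port, False)


dm = DeviceModel()
ide = ToyIdeRegisters()
dm.register(0x1F0, 0x1F7, ide)
hv = Hypervisor(dm, range(0x1F0, 0x1F8))
hv.guest_out(0x1F2, 1)  # guest writes to offset 2 of the IDE block
sector_count = hv.guest_in(0x1F2)
```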
> >
> > There is also the option of using the frontend drivers as described
> > above in the fully virtualized model.
> >
> > Finally, while I'm on the subject of fully virtualized mode: it is
> > currently not possible to give a DMA-based device to a fully virtualized
> > domain. The reason is that the guest OS will have been told that its
> > memory is at 0..256MB (say), while its actual machine physical address
> > is 256MB..512MB. The OS is completely unaware of this "mismatch". So
> > the OS will take the virtual address of some buffer (say a network
> > packet) and turn it into a "physical address", which will be an address
> > in the range 0..256MB. This will of course (at least) lead to the wrong
> > data being transmitted, as the actual data is somewhere in the range
> > 256MB..512MB. The only solution to this is an IOMMU, which can
> > translate the guest's understanding of a physical address (0..256MB) to
> > a machine physical address (256MB..512MB).
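The address mismatch and the IOMMU fix can be sketched with the numbers from the paragraph above. This assumes a simple per-page table and a contiguous placement of guest RAM. `ToyIommu` and `iommu_for_guest` are hypothetical names, and real IOMMUs use multi-level tables rather than a dict.

```python
PAGE_SIZE = 4096
MB = 1 << 20


class ToyIommu:
    """Per-guest table: guest frame number -> machine frame number."""

    def __init__(self):
        self.table = {}

    def map(self, gfn, mfn):
        self.table[gfn] = mfn

    def translate(self, guest_phys):
        gfn, offset = divmod(guest_phys, PAGE_SIZE)
        if gfn not in self.table:
            raise PermissionError(f"DMA to unmapped guest frame {gfn:#x}")
        return self.table[gfn] * PAGE_SIZE + offset


def iommu_for_guest(guest_ram, machine_base):
    """Map guest-physical [0, guest_ram) onto machine [machine_base, ...)."""
    iommu = ToyIommu()
    for gfn in range(guest_ram // PAGE_SIZE):
        iommu.map(gfn, machine_base // PAGE_SIZE + gfn)
    return iommu


# The example from the text: the guest thinks it has 0..256MB,
# but its RAM really lives at 256MB..512MB.
iommu = iommu_for_guest(256 * MB, 256 * MB)
buffer_guest_phys = 0x123456       # what the guest OS programs into the device
without_iommu = buffer_guest_phys  # device would DMA from the wrong machine page
with_iommu = iommu.translate(buffer_guest_phys)
```

Without translation the device uses 0x123456, which belongs to some other memory. With the IOMMU the same address lands 256MB higher, inside the guest's real RAM.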
> >
> > >
> > > I am trying to understand how I/O virtualization at the ISA level
> > > works if a device is shared between multiple VM instances. At a very
> > > high level, it should be as follows. First of all, the VMM has to
> > > intercept the VM's I/O commands (I/O instructions or loads/stores to
> > > dedicated memory addresses; let's ignore interrupts for the moment).
> > > This could be done by traps or by replacing the respective
> > > instructions with VMM calls to I/O primitives. The VMM keeps multiple
> > > device model instances (one for each VM using the device) in memory.
> > > The models somehow reflect the low-level I/O API of the device.
> > > Depending on which I/O command is issued by the VM, either the memory
> > > model is changed or a number of I/O instructions are executed to make
> > > the physical device state reflect the one represented in the memory
> > > model.
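The time-partitioning scheme proposed in this question can be sketched as a toy model. This is the questioner's design, not how Xen works (Xen avoids sharing devices). All names are hypothetical, and the sketch assumes the VMM knows every physical register value, which real devices often do not allow.

```python
class PhysicalDevice:
    """The one real device; records every register write actually issued."""

    def __init__(self):
        self.regs = {"mode": 0, "data": 0}
        self.writes = []

    def write(self, reg, value):
        self.regs[reg] = value
        self.writes.append((reg, value))


class TimeSharedDevice:
    """VMM-side multiplexer keeping one device model per VM."""

    def __init__(self, phys):
        self.phys = phys
        self.models = {}     # VM id -> that VM's virtual register file
        self.current = None  # VM that currently holds the time slice

    def _model(self, vm):
        return self.models.setdefault(vm, {"mode": 0, "data": 0})

    def guest_write(self, vm, reg, value):
        self._model(vm)[reg] = value     # always update the VM's memory model
        if vm == self.current:
            self.phys.write(reg, value)  # slice owner: pass straight through

    def switch_to(self, vm):
        # "Merge" phase: make the physical state match the incoming VM's model.
        # Assumes the VMM knows the physical register values; real devices
        # often cannot be read back, so this step is where it breaks down.
        for reg, value in self._model(vm).items():
            if self.phys.regs[reg] != value:
                self.phys.write(reg, value)
        self.current = vm


phys = PhysicalDevice()
mux = TimeSharedDevice(phys)
mux.switch_to(1)
mux.guest_write(1, "mode", 2)
mux.guest_write(2, "mode", 5)  # VM 2 is not running: only its model changes
mode_while_vm1 = phys.regs["mode"]
mux.switch_to(2)
mode_while_vm2 = phys.regs["mode"]
```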
> >
> > By ISA, do you mean "Instruction Set Architecture" or something else (I
> > presume it does NOT mean the ISA bus...)?
> >
> > Intercepting IOIO instructions or MMIO instructions is not that hard:
> > in HVM, the two processor architectures have specific intercepts and
> > bitmaps to indicate which IO instructions should be intercepted. MMIO
> > requires the page tables to be set up such that the memory-mapped
> > region is mapped "not present", so that any operation on this region
> > gives a page fault; the page fault is then analyzed to see whether it's
> > for an MMIO address or a "real page fault".
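Both mechanisms can be sketched in a few lines. This is a simplified model: it assumes one bit per port across the 64K port space, and the actual AMD-V and Intel VT bitmap layouts differ in detail. The MMIO range is the VGA region from the text, and the function and class names are hypothetical.

```python
NUM_PORTS = 1 << 16


class IoBitmap:
    """One bit per I/O port; a set bit means 'intercept this port'."""

    def __init__(self):
        self.bits = bytearray(NUM_PORTS // 8)  # 8 KiB for 65536 ports

    def intercept(self, first, last):
        for port in range(first, last + 1):
            self.bits[port // 8] |= 1 << (port % 8)

    def is_intercepted(self, port):
        return bool(self.bits[port // 8] & (1 << (port % 8)))


# MMIO regions the device model emulates; their pages are mapped "not present".
MMIO_REGIONS = [(0xA0000, 0xBFFFF)]  # VGA frame buffer, from the text


def classify_page_fault(fault_addr):
    """Decide whether a guest page fault is an MMIO access or a real fault."""
    for lo, hi in MMIO_REGIONS:
        if lo <= fault_addr <= hi:
            return "mmio"    # decode the instruction and emulate the access
    return "page_fault"      # reflect to the guest OS as a normal fault


bitmap = IoBitmap()
bitmap.intercept(0x1F0, 0x1F7)  # IDE ports, from the text
```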
> >
> > For para-virtualization, the model is similar. The exact way of
> > intercepting the IOIO or MMIO instruction is slightly different, but in
> > essence it's the same principle. Let me know if you really need to know
> > how Xen goes about doing this, as it's quite complicated (more so than
> > the HVM version, for sure).
> >
> >
> > >
> > > This approach brings up a number of questions. It would be great if
> > > some of the virtualization experts here could shed some light on them
> > > (even though they are not immediately related to Xen, I know):
> > >
> > > - What do these device memory models look like? Is there a common
> > >   (automata) theory behind them, or are they done ad hoc?
> >
> > Not sure what you're asking for here. The devices are either modeled
> > after a REAL device (qemu-dm), in which case they resemble the REAL
> > hardware device being emulated as closely as possible, or, in the
> > frontend/backend driver, there is an "idealized model": the request
> > contains just the basic data that the OS normally provides to the
> > driver, and it's placed in a queue with a message-signaling system to
> > tell the other side that there is something in the queue.
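The queue-plus-signal idea can be sketched as a fixed-size ring with producer and consumer indices. This sketch is loosely inspired by the idea of a shared ring and is not Xen's ring interface. `ToyRing` and its fields are hypothetical, and a callback stands in for the cross-domain notification. A real shared ring would also carry responses and avoid unnecessary notifications; this sketch omits both.

```python
class ToyRing:
    """Fixed-size request ring shared by a frontend and a backend."""

    def __init__(self, size, notify):
        self.size = size
        self.slots = [None] * size
        self.prod = 0         # advanced only by the frontend (producer)
        self.cons = 0         # advanced only by the backend (consumer)
        self.notify = notify  # stands in for the "something is queued" signal

    def push(self, request):
        if self.prod - self.cons == self.size:
            raise BufferError("ring full")
        self.slots[self.prod % self.size] = request
        self.prod += 1
        self.notify()

    def pop_all(self):
        out = []
        while self.cons != self.prod:
            out.append(self.slots[self.cons % self.size])
            self.cons += 1
        return out


kicks = []
ring = ToyRing(4, notify=lambda: kicks.append("event"))
ring.push({"op": "read", "sector": 0, "count": 8})
ring.push({"op": "write", "sector": 8, "count": 1})
pending = ring.pop_all()
```

Each request carries only the basic data a driver would get from the OS (operation, sector, count), matching the "idealized model" described above.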
> >
> > > - What kind of strategies/algorithms are used in the merge phase,
> > >   i.e. the phase where the virtual memory model and the physical one
> > >   are synchronized? What kind of problems can occur in this phase?
> >
> > The Xen approach is to avoid this by giving each device to only one
> > machine.
> >
> > > - Are specific usage patterns used in real-world implementations
> > >   (e.g. VMWare) to simplify the virtualization (model or merge phase)?
> >
> > This is probably the wrong list to ask detailed questions about how
> > VMWare works... ;-)
> >
> > > - Do you have any interesting pointers to literature dealing with full
> > >   I/O virtualization? In particular, how does VMWare's full
> > >   virtualization work with respect to I/O?
> >
> > Again, wrong list for VMWare questions.
> >
> > > - Is every device time partitionable? If not, which requirements does
> > >   it have to meet to be time partitionable?
> >
> > Certainly not. I would say that almost all devices are NOT time
> > partitionable, as the state in the device depends on the current
> > usage. The more complex the device, the more likely it is to have
> > difficulties, but even a device as simple as a serial port would
> > struggle to work in a time-shared fashion. Serial ports are generally
> > used for multiple transactions that together make up a "bigger picture
> > transaction": for example, a web server connected via a serial modem
> > would send a packet of several hundred bytes to the serial port driver,
> > which then portions it out as and when the serial port is ready to send
> > another few bytes. If you switch from one guest to another during this
> > process, and the second guest also has something to send on the serial
> > port, you'd end up with a very scrambled message from the first guest,
> > and quite likely the second guest's message would be completely lost!
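The scrambling effect can be sketched by letting two guests take turns on one serial line, a few bytes per time slice. This is a toy simulation. The function name and the 2-byte slice are assumptions, and it only shows interleaving, not the lost-message case also mentioned above.

```python
def time_sliced_wire(messages, slice_bytes):
    """Round-robin the guests, letting each push `slice_bytes` bytes per slice."""
    queues = [bytearray(m) for m in messages]
    wire = bytearray()
    while any(queues):               # an empty bytearray is falsy
        for q in queues:
            wire += q[:slice_bytes]  # whatever this guest gets out in its slice
            del q[:slice_bytes]
    return bytes(wire)


scrambled = time_sliced_wire([b"GET /index", b"HELO"], slice_bytes=2)
```

The receiver sees `b"GEHET LO/index"`, which is neither guest's message.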
> >
> > There are some devices that are specifically built to manage multiple
> > hosts, but other than that, any sharing of a device requires some
> > software to gather up "a full transaction" and then send that to the
> > actual hardware, often also waiting for the transaction to complete
> > (for example, the interrupt signal saying that the hard disk write is
> > complete).
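Gathering whole transactions before touching the hardware can be sketched as a simple queue. `TransactionMux` is a hypothetical name, and the sketch assumes `send_to_hw` returns only once the device has signalled completion.

```python
from collections import deque


class TransactionMux:
    """Gathers whole transactions per guest and sends them one at a time."""

    def __init__(self, send_to_hw):
        self.send_to_hw = send_to_hw  # assumed to block until the device signals completion
        self.queue = deque()
        self.completed = []

    def submit(self, guest, payload):
        self.queue.append((guest, bytes(payload)))

    def run(self):
        while self.queue:
            guest, payload = self.queue.popleft()
            self.send_to_hw(payload)      # the whole transaction, uninterrupted
            self.completed.append(guest)  # where a completion would be reported back


wire = bytearray()
mux = TransactionMux(wire.extend)
mux.submit(1, b"GET /index")
mux.submit(2, b"HELO")
mux.run()
```

Unlike naive time slicing, each guest's message reaches the wire intact and in one piece.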
> >
> >
> > >   -> I don't think every device is. What about a device which supports
> > >      different modes of operation? If two VMs drive the virtual device
> > >      in different modes, it may not be possible to constantly switch
> > >      between them. OK, this is pretty artificial.
> >
> > A particular problem is devices where you can't necessarily read back
> > the last mode setting, which may well be the case for many different
> > devices. You can't, for example, read back all the registers on an IDE
> > device, because a read of a particular address may give the status
> > rather than the last command sent, or some such.
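One common workaround, which is an assumption here and not something the thread describes, is for the virtualization layer to keep a shadow copy of every value the guest writes, since the device cannot report it. The toy port below mimics the behaviour described above: a write sets the command, a read returns status. The class names and the 0x40 status value are illustrative only.

```python
class ToyIdeCommandStatusPort:
    """Toy port: a write sets the command, a read returns the status."""

    STATUS_READY = 0x40  # illustrative status value

    def __init__(self):
        self._command = None  # internal to the "hardware"; not readable via read()

    def write(self, value):
        self._command = value

    def read(self):
        return self.STATUS_READY


class ShadowedPort:
    """VMM-side wrapper remembering the last value written to the port."""

    def __init__(self, port):
        self.port = port
        self.last_written = None

    def write(self, value):
        self.last_written = value  # the only record of what the guest sent
        self.port.write(value)

    def read(self):
        return self.port.read()


port = ShadowedPort(ToyIdeCommandStatusPort())
port.write(0x20)
read_back = port.read()
```

Reading the port returns the status, not the command that was written, so only the shadow copy can tell the VMM what the guest last sent.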
> >
> > --
> > Mats
> > >
> > > Thanks a lot for your help!
> > >
> > >
> > > Best wishes
> > >
> > > Thomas
> > >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-devel
> > >
> > >
> > >
> >
> >
> >
> 
> 
> 
> 


