
RE: [Xen-devel] Re: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO



I meant, which PV driver?

--
Mats 

> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liang Yang
> Sent: 18 January 2007 17:01
> To: Petersson, Mats; Mark Williamson; 
> xen-users@xxxxxxxxxxxxxxxxxxx; Xen-Devel
> Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> Subject: [Xen-devel] Re: AIO for better disk IO? Re: 
> [Xen-users] Getting better Disk IO
> 
> I'm using 8 Maxtor Atlas SAS II drives, and the OS I'm using is Red 
> Hat Enterprise Linux 4U4. JBOD, MD-RAID0 and MD-RAID5 all show the 
> same consistent performance gap.
> 
> Liang
> 
> ----- Original Message ----- 
> From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>; "Mark Williamson" 
> <mark.williamson@xxxxxxxxxxxx>; 
> <xen-users@xxxxxxxxxxxxxxxxxxx>; "Xen-Devel" 
> <xen-devel@xxxxxxxxxxxxxxxxxxx>
> Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow" 
> <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera" 
> <jrivera@xxxxxxxxxxx>
> Sent: Thursday, January 18, 2007 2:50 AM
> Subject: RE: AIO for better disk IO? Re: [Xen-users] Getting 
> better Disk IO
> 
> 
> 
> 
> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Liang Yang
> > Sent: 17 January 2007 16:44
> > To: Petersson, Mats; Mark Williamson;
> > xen-users@xxxxxxxxxxxxxxxxxxx; Xen-Devel
> > Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> > Subject: Re: AIO for better disk IO? Re: [Xen-users] Getting
> > better Disk IO
> >
> > Hi Mats,
> >
> > Thanks for your reply.
> >
> > You said an HVM domain using PV drivers should have the same disk
> > I/O performance as a PV guest. However, based on my experiments,
> > this is not true. I have tried several I/O benchmark tools (dd,
> > iozone, iometer etc.) and they all show a big gap between an HVM
> > domain with PV drivers and a PV guest domain. This is especially
> > true for large I/O sizes (64k, 128k and 256k sequential I/O). So
> > far, the disk I/O performance of an HVM domain with PV drivers is
> > only 20~30% of a PV guest's.
> >
> What driver are you using, and in what OS?
> 
> 20-30% is a lot better than the 10% that I've seen with the QEMU
> driver, but I would still expect better than 30% from the PV
> driver... Not that I have actually tested this, as I have other tasks.
> 
> > Another thing that puzzles me is the disk I/O performance of PV
> > guests when tested with small request sizes (512B and 1K sequential
> > I/O). Although the PV guest performs very close to native for large
> > I/O sizes, there is still a clear gap between them for small sizes
> > (512B and 1K). I wonder whether the Xen hypervisor changes the
> > packet coalescing behavior for small requests. Do you know if this
> > is true?
> 
> Don't know. I wouldn't think so.
> 
> I would think that the reason small packets are more noticeable is
> that the hypervisor overhead becomes much more significant than for a
> large packet: the overhead is (almost) constant for the hypercall(s)
> involved, but the time used in the driver to actually perform the
> disk IO depends on the size of the packet.
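> 
> As a back-of-envelope illustration (the figures below are made-up
> assumptions, not measurements), a tiny calculation shows how a fixed
> per-request cost dominates small requests and almost disappears for
> large ones:
> 
> #include <stdio.h>
> 
> int main(void)
> {
>     /* Assumed, illustrative figures - NOT measured values. */
>     const double fixed_overhead_us = 20.0;  /* per-request hypervisor cost */
>     const double per_mb_us = 10000.0;       /* ~100MB/s transfer rate      */
>     const double sizes_kb[] = { 0.5, 1, 64, 256 };
>     int i;
> 
>     for (i = 0; i < 4; i++) {
>         double xfer_us  = per_mb_us * sizes_kb[i] / 1024.0;
>         double total_us = fixed_overhead_us + xfer_us;
>         printf("%6.1fKB request: fixed overhead is %5.1f%% of total\n",
>                sizes_kb[i], 100.0 * fixed_overhead_us / total_us);
>     }
>     return 0;
> }
> 
> With those (invented) numbers the fixed overhead is roughly 80% of a
> 512B request but well under 1% of a 256KB request, which matches the
> kind of gap you are describing.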
> 
> --
> Mats
> >
> > Liang
> >
> > ----- Original Message ----- 
> > From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> > To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>; "Mark Williamson"
> > <mark.williamson@xxxxxxxxxxxx>; <xen-users@xxxxxxxxxxxxxxxxxxx>
> > Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow"
> > <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera"
> > <jrivera@xxxxxxxxxxx>
> > Sent: Wednesday, January 17, 2007 3:07 AM
> > Subject: RE: AIO for better disk IO? Re: [Xen-users] Getting
> > better Disk IO
> >
> >
> > > -----Original Message-----
> > > From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx]
> > > Sent: 16 January 2007 17:53
> > > To: Petersson, Mats; Mark Williamson; 
> xen-users@xxxxxxxxxxxxxxxxxxx
> > > Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> > > Subject: AIO for better disk IO? Re: [Xen-users] Getting
> > > better Disk IO
> > >
> > > Hi Mats,
> >
> > Let me first say that I'm not an expert on AIO, but I did sit
> > through the presentation of the new blktap driver at the Xen Summit.
> > The following is to the best of my understanding, and could be
> > "codswallop" for all that I know... ;-)
> > >
> > > I once posted my questions about the behavior of asynchronous I/O
> > > under Xen, which is also directly related to disk I/O performance;
> > > however, I did not get any response. I would appreciate it if you
> > > could advise on this.
> > >
> > > As AIO can help improve performance and the Linux kernel keeps
> > > tuning the AIO path, more and more I/Os can be expected to take
> > > the AIO path instead of the regular I/O path.
> > >
> > > First Question:
> > > If we consider Xen, do we need to do AIO in both domain0 and the
> > > guest domains at the same time? For example, consider two
> > > situations: a fully virtualized guest domain still does regular
> > > I/O while domain0 (the vbd back-end driver) does AIO; or both the
> > > fully virtualized guest domain and domain0 do AIO. What is the
> > > possible performance difference here?
> >
> > The main benefit of AIO is that the current requestor (such as the
> > VBD BackEnd driver) can continue doing other things whilst the data
> > is being read/written to/from the actual storage device. This in
> > turn reduces latency where there are multiple requests outstanding
> > from the guest OS (for example multiple guests requesting
> > "simultaneously", or multiple requests issued by the same guest
> > close together).
> >
> > The bandwidth difference all arises from the reduced latency, not
> > because AIO is in itself better performing.
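> > 
> > As a rough illustration of what the backend gains, here is a
> > minimal, untested sketch using libaio directly (the file name,
> > request count and sizes are just placeholders): several reads are
> > queued in one io_submit() call, the caller is then free to do other
> > work, and the completions are collected later with io_getevents().
> > 
> > /* Build with: gcc -o aio_sketch aio_sketch.c -laio */
> > #define _GNU_SOURCE             /* for O_DIRECT */
> > #include <libaio.h>
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <unistd.h>
> > 
> > #define NREQ  4
> > #define RSIZE 65536             /* 64KB per request, illustrative */
> > 
> > int main(void)
> > {
> >     io_context_t ctx;
> >     struct iocb cbs[NREQ], *cbp[NREQ];
> >     struct io_event events[NREQ];
> >     void *bufs[NREQ];
> >     int fd, i, got;
> > 
> >     fd = open("/tmp/test.img", O_RDONLY | O_DIRECT);  /* placeholder file */
> >     if (fd < 0) { perror("open"); return 1; }
> > 
> >     memset(&ctx, 0, sizeof(ctx));
> >     if (io_setup(NREQ, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }
> > 
> >     /* Queue NREQ reads at different offsets without waiting for any. */
> >     for (i = 0; i < NREQ; i++) {
> >         if (posix_memalign(&bufs[i], 512, RSIZE))     /* O_DIRECT wants alignment */
> >             return 1;
> >         io_prep_pread(&cbs[i], fd, bufs[i], RSIZE, (long long)i * RSIZE);
> >         cbp[i] = &cbs[i];
> >     }
> >     if (io_submit(ctx, NREQ, cbp) != NREQ) { fprintf(stderr, "io_submit failed\n"); return 1; }
> > 
> >     /* ...the backend could service other guests' requests here... */
> > 
> >     /* Reap all the completions whenever it is convenient. */
> >     got = io_getevents(ctx, NREQ, NREQ, events, NULL);
> >     printf("%d requests completed\n", got);
> > 
> >     io_destroy(ctx);
> >     close(fd);
> >     return 0;
> > }
> > 
> > The point is not the API details, but that the submitter never has
> > to block per request, so requests from several guests (or several
> > requests from one guest) can be kept in flight at once.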
> >
> > >
> > > Second Question:
> > > Does Domain0 always wait until the AIO data is available and then
> > > notify the guest domain? Or does Domain0 issue an interrupt
> > > immediately to notify the guest domain when the AIO is queued? If
> > > the first case is true, then all AIOs will become synchronous.
> >
> > The guest cannot be issued with an interrupt to signify "data
> > available" until the guest's data has been read, so for reads at
> > least, the effect from the guest's perspective is still synchronous.
> > This doesn't mean that the guest can't issue further requests (for
> > example from a different thread, or simply by queuing multiple
> > requests to the device) and gain from the fact that these requests
> > can be started before the first issued request is completed (from
> > the backend driver's point of view).
> >
> >
> > >
> > > Third Question:
> > > Does the Xen hypervisor change the behavior of the Linux I/O
> > > scheduler in any way?
> >
> > Don't think so, but I'm by no means sure. In my view, the
> > modifications
> > to the Linux kernel are meant to be "the minimum necessary".
> > >
> > > Fourth Question:
> > > Will AIO have a different performance impact on para-virtualized
> > > and fully virtualized domains respectively?
> >
> > The main difference is the reduction in overhead (particularly
> > latency) in Dom0, which will affect both PV and HVM guests. HVM
> > guests have more "other things" happening in Dom0 (such as QEMU
> > work), but it's hard to say which gains more from this without also
> > qualifying what else is happening in the system. If you have PV
> > drivers in an HVM domain, the disk performance should be about the
> > same, whilst the (flawed) benchmark of "hdparm" shows around a 10x
> > performance difference between Dom0 and an HVM guest - so we lose a
> > lot in the process. I haven't tried the same with tap:aio: instead
> > of file:, but I suspect the interaction between guest, hypervisor
> > and qemu is a much larger component than the tap:aio: vs file:
> > method of disk access.
> >
> > --
> > Mats
> > >
> > > Thanks,
> > >
> > > Liang
> > >
> > > ----- Original Message ----- 
> > > From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> > > To: "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>;
> > > <xen-users@xxxxxxxxxxxxxxxxxxx>
> > > Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von 
> Brederlow"
> > > <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera"
> > > <jrivera@xxxxxxxxxxx>
> > > Sent: Tuesday, January 16, 2007 10:22 AM
> > > Subject: RE: [Xen-users] Getting better Disk IO
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > > > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > > Mark Williamson
> > > > Sent: 16 January 2007 17:07
> > > > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> > > > Subject: Re: [Xen-users] Getting better Disk IO
> > > >
> > > > > I've been hoping to see replies to this, but lacking good
> > > > > information here is the state of my confusion on virtual
> > > > > machine disks:
> > > > >
> > > > > If you read the docs for configuring disks on domu and hvm
> > > > > machines, you'll find a gazillion or so ways to present the
> > > > > disks to the virtual machine.
> > > >
> > > > There are quite a lot of options, it's true ;-)
> > > >
> > > > > One of those ways (whose name I forget) provides (if I
> > > > > understand things, which I doubt :-) a special kind of disk
> > > > > emulation designed to be driven by special drivers on the
> > > > > virtual machine side. The combination gives near direct disk
> > > > > access speeds in the virtual machine.
> > > > >
> > > > > The catch is that you need those drivers for the kernel on the
> > > > > virtual machine side. They may already exist, you may have to
> > > > > build them, and depending on the kernel version, they may be
> > > > > hard to build.
> > > > >
> > > > > Perhaps someone who actually understands this could elaborate?
> > > >
> > > > Basically yes, that's all correct.
> > > >
> > > > To summarise:
> > > >
> > > > PV guests (that's paravirtualised, or Xen-native) use a
> > > > Xen-aware block device that's optimised for good performance on
> > > > Xen.  HVM guests (Hardware Virtual Machine, fully virtualised
> > > > and unaware of Xen) use an emulated IDE block device, provided
> > > > by Xen (actually, it's provided by the qemu-based device models,
> > > > running in dom0).
> > > >
> > > > The HVM emulated block device is not as optimised (nor does it
> > > > lend itself to such effective optimisation) for high virtualised
> > > > performance as the Xen-aware device.  Therefore a second option
> > > > is available for HVM guests: an implementation of the PV guest
> > > > device driver that is able to "see through" the emulated
> > > > hardware (in a secure and controlled way) and talk directly as a
> > > > Xen-aware block device.  This can potentially give very good
> > > > performance.
> > >
> > > The reason the emulated IDE controller is quite slow is a
> > > consequence of the emulation. The way it works is that the driver
> > > in the HVM domain writes to the same IO ports that the real device
> > > would use. These writes are intercepted by the hardware support in
> > > the processor and a VMEXIT is issued to "exit the virtual machine"
> > > back into the hypervisor. The HV looks at the "exit reason", and
> > > sees that it's an IO WRITE operation. This operation is then
> > > encoded into a small packet and sent to QEMU. QEMU processes this
> > > packet and responds back to the HV to say "OK, done that, you may
> > > continue". The HV then does a VMRUN (or VMRESUME in the Intel
> > > case) to continue the guest execution, which is probably another
> > > IO instruction to write to the IDE controller. There's a total of
> > > 5-6 bytes written to the IDE controller per transaction, and
> > > whilst it's possible to combine some of these writes into a single
> > > write, it's not always done that way. Once all writes for one
> > > transaction are completed, the QEMU IDE emulation code will
> > > perform the requested operation (such as reading or writing a
> > > sector). When that is complete, a virtual interrupt is issued to
> > > the guest, and the guest will see this as a "disk done" interrupt,
> > > just like real hardware.
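> > >
> > > To make the "5-6 bytes per transaction" concrete, this is roughly
> > > what a guest's IDE driver does for a single LBA28 read command.
> > > It is only an illustrative sketch (not Xen, QEMU or Linux code);
> > > the point is that under HVM every one of these port writes causes
> > > a VMEXIT and a round trip to QEMU, each costing thousands of
> > > cycles:
> > >
> > > #include <stdint.h>
> > >
> > > /* Raw x86 port write. In a real guest kernel this would be the
> > >    kernel's own outb(); it is spelled out here only to make the
> > >    individual port accesses visible. Not meant to be run from
> > >    userspace. */
> > > static inline void ide_outb(uint16_t port, uint8_t val)
> > > {
> > >     __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
> > > }
> > >
> > > /* Program the primary IDE channel (task file at 0x1F0-0x1F7) to
> > >    read 'count' sectors starting at the 28-bit LBA 'lba'. That is
> > >    six port writes, each of which is intercepted when the driver
> > >    runs inside an HVM guest. */
> > > static void ide_read_sectors(uint32_t lba, uint8_t count)
> > > {
> > >     ide_outb(0x1F6, 0xE0 | ((lba >> 24) & 0x0F)); /* drive/head + LBA 24-27 */
> > >     ide_outb(0x1F2, count);                       /* sector count           */
> > >     ide_outb(0x1F3, lba & 0xFF);                  /* LBA bits 0-7            */
> > >     ide_outb(0x1F4, (lba >> 8) & 0xFF);           /* LBA bits 8-15           */
> > >     ide_outb(0x1F5, (lba >> 16) & 0xFF);          /* LBA bits 16-23          */
> > >     ide_outb(0x1F7, 0x20);                        /* READ SECTORS command    */
> > >     /* ...then wait for the "disk done" interrupt and read the data. */
> > > }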
> > >
> > > All these steps of IO intercepts take several thousand cycles,
> > > which is a bit longer than a regular IO write operation would take
> > > on the real hardware, and the system will still need to issue the
> > > real IO operations to perform the REAL hardware read/write
> > > corresponding to the virtual disk (such as reading a file, LVM or
> > > physical partition) at some point, so this is IN ADDITION to the
> > > time used by the hypervisor.
> > >
> > > Unfortunately, the only possible improvement on this scenario is
> > > the kind of "virtual-aware" driver that is described below.
> > >
> > > [Using a slightly more efficient model than IDE may also help, but
> > > that's going to be marginal compared to the benefits of using a
> > > virtual-aware driver].
> > >
> > > --
> > > Mats
> > > >
> > > > I don't know if these drivers are included in any Linux
> > > > distributions yet, but
> > > > they are available in the Xen source tree so that you can
> > > > build your own, in
> > > > principle.  Windows versions of the drivers are included in
> > > > XenSource's
> > > > products, I believe - including the free (as in beer)
> > > > XenExpress platform.
> > > >
> > > > There are potentially other options being developed,
> > > > including an emulated
> > > > SCSI device that should improve the potential for higher
> > > > performance IO
> > > > emulation without Xen-aware drivers.
> > > >
> > > > Hope that clarifies things!
> > > >
> > > > Cheers,
> > > > Mark
> > > >
> > > > -- 
> > > > Dave: Just a question. What use is a unicycle with no seat?
> > > > And no pedals!
> > > > Mark: To answer a question with a question: What use is a
> > > skateboard?
> > > > Dave: Skateboards have wheels.
> > > > Mark: My wheel has a wheel!
> > > >



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

