WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>, "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>, xen-users@xxxxxxxxxxxxxxxxxxx, "Xen-Devel" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-devel] RE: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
Date: Thu, 18 Jan 2007 10:50:38 +0100
Cc: Tom Horsley <tomhorsley@xxxxxxxxxxxx>, Goswin von Brederlow <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>, James Rivera <jrivera@xxxxxxxxxxx>
Delivery-date: Thu, 18 Jan 2007 01:56:48 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <BAY125-DAV10599763BE897D517E242A93AB0@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: Acc6VuZMJQi54q0VQd+EUeEZRVxZoQAjr5tQ
Thread-topic: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO
 

> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx 
> [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liang Yang
> Sent: 17 January 2007 16:44
> To: Petersson, Mats; Mark Williamson; 
> xen-users@xxxxxxxxxxxxxxxxxxx; Xen-Devel
> Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> Subject: Re: AIO for better disk IO? Re: [Xen-users] Getting 
> better Disk IO
> 
> Hi Mats,
> 
> Thanks for your reply.
> 
> You said an HVM domain using PV drivers should have the same disk I/O
> performance as a PV guest. However, based on my experiments, this is
> not true. I have tried several I/O benchmark tools (dd, iozone,
> iometer, etc.) and they all show a big gap between an HVM domain with
> PV drivers and a PV guest domain. This is especially true for large
> I/O sizes (64k, 128k and 256k sequential I/O). So far, the disk I/O
> performance of HVM with PV drivers is only 20~30% of that of PV guests.
> 
What driver are you using, and in what OS?

20-30% is a lot better than the 10% that I've seen with the QEMU driver,
but I would still expect better results than 30% from the PV driver...
Not that I have actually tested this, as I have other tasks.
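For what it's worth, the kind of dd comparison mentioned above can be
reproduced with something like the following sketch (the file path and
sizes are arbitrary examples; conv=fdatasync makes dd flush before it
reports a rate, so the cache doesn't flatter the numbers):

```shell
# Crude sequential-write comparison with dd: same 1 MB payload, written
# once as 512-byte requests and once as 256k requests.
dd if=/dev/zero of=/tmp/xen_io_test bs=512 count=2048 conv=fdatasync
dd if=/dev/zero of=/tmp/xen_io_test bs=256k count=4 conv=fdatasync
rm -f /tmp/xen_io_test
```

Run the same pair of commands in dom0, in a PV guest, and in an HVM
guest with PV drivers to see the relative gap at each request size.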

> Another thing I'm puzzled is disk I/O performance of PV 
> guests when tested 
> with small size packets (512B and 1K sequential I/O). 
> Although the PV guest 
> has  very close performance to native for large size I/O 
> packet, there is 
> still a clear gap between them for small size packets (512B 
> and 1K). I 
> really doubt if Xen hypervisor change the packet coalescing 
> behavior for 
> small size packets. Do you know if this is true?

Don't know. I wouldn't think so. 

I would think that the reason small packets are more affected is that
the overhead of the hypervisor becomes much more noticeable than for a
large packet: the overhead is (almost) constant for the hypercall(s)
involved, but the time spent in the driver actually performing the disk
IO is more dependent on the size of the packet.
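That reasoning can be made concrete with a back-of-envelope model (the
numbers below are invented for illustration, not measurements):

```python
# Per-request time = a fixed per-request overhead (hypercall/context
# switch cost) + a size-dependent transfer time. With assumed numbers:
FIXED_OVERHEAD_US = 20.0        # assumed constant per-request cost
BANDWIDTH_BYTES_PER_US = 100.0  # assumed ~100 MB/s raw transfer rate

def overhead_fraction(packet_bytes):
    """Share of total request time consumed by the fixed overhead."""
    transfer_us = packet_bytes / BANDWIDTH_BYTES_PER_US
    return FIXED_OVERHEAD_US / (FIXED_OVERHEAD_US + transfer_us)

for size in (512, 1024, 64 * 1024, 256 * 1024):
    print(f"{size:7d} B: {overhead_fraction(size):5.1%} of time is overhead")
```

With these (made-up) constants the fixed overhead dominates a 512-byte
request but almost vanishes for a 256k request, which is the shape of
the gap described above.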

--
Mats
> 
> Liang
> 
> ----- Original Message ----- 
> From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>; "Mark Williamson" 
> <mark.williamson@xxxxxxxxxxxx>; <xen-users@xxxxxxxxxxxxxxxxxxx>
> Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow" 
> <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera" 
> <jrivera@xxxxxxxxxxx>
> Sent: Wednesday, January 17, 2007 3:07 AM
> Subject: RE: AIO for better disk IO? Re: [Xen-users] Getting 
> better Disk IO
> 
> 
> > -----Original Message-----
> > From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx]
> > Sent: 16 January 2007 17:53
> > To: Petersson, Mats; Mark Williamson; xen-users@xxxxxxxxxxxxxxxxxxx
> > Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> > Subject: AIO for better disk IO? Re: [Xen-users] Getting
> > better Disk IO
> >
> > Hi Mats,
> 
> Let me first say that I'm not an expert on AIO, but I did sit through
> the presentation of the new blktap driver at the Xen Summit. The
> following is to the best of my understanding, and could be "codswallop"
> for all that I know... ;-)
> >
> > I once posted my questions about the behavior of Asynchronous I/O
> > under Xen, which is also directly related to disk I/O performance.
> > However, I did not get any response. I would appreciate it if you
> > could advise about this.
> >
> > As AIO can help improve performance and the Linux kernel keeps
> > tuning the AIO path, more and more IOs can be expected to take the
> > AIO path instead of the regular I/O path.
> >
> > First Question:
> > If we consider Xen, do we need to do AIO both in domain0 and in the
> > guest domains at the same time? For example, consider two
> > situations: let a fully-virtualized guest domain still do regular
> > I/O and domain0 (the vbd backend driver) do AIO; or let both the
> > fully-virtualized guest domain and domain0 do AIO. What is the
> > possible performance difference here?
> 
> The main benefit of AIO is that the current requestor (such as the VBD
> backend driver) can continue doing other things whilst the data is
> being read/written to/from the actual storage device. This in turn
> reduces latency where there are multiple requests outstanding from the
> guest OS (for example, multiple guests requesting "simultaneously", or
> multiple requests issued by the same guest close together).
> 
> The bandwidth difference all arises from the reduced latency, not
> because AIO is in itself better performing.
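That latency point can be illustrated with a toy model. Threads and
sleep() stand in for asynchronous I/O here purely to show the overlap;
the real blktap backend would use Linux AIO (io_submit()/io_getevents()),
which this sketch does not attempt to reproduce:

```python
# Four outstanding requests overlap their waiting instead of queuing
# behind each other, so total latency drops roughly four-fold.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_disk_request(_):
    time.sleep(0.05)  # pretend each request spends 50 ms at the device

t0 = time.time()
for i in range(4):          # synchronous: one request at a time
    fake_disk_request(i)
serial = time.time() - t0

t0 = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:  # all four outstanding
    list(pool.map(fake_disk_request, range(4)))
overlapped = time.time() - t0

print(f"serial: {serial:.2f}s, overlapped: {overlapped:.2f}s")
```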
> 
> >
> > Second Question:
> > Does Domain0 always wait until the AIO data is available and then
> > notify the guest domain? Or will Domain0 issue an interrupt to
> > notify the guest domain immediately when the AIO is queued? If the
> > first case is true, then all AIOs become synchronous.
> 
> The guest cannot be issued with an interrupt to signify "data
> available" until the guest's data has been read, so for reads at
> least, the effect from the guest's perspective is still synchronous.
> This doesn't mean that the guest can't issue further requests (for
> example from a different thread, or simply by queuing multiple
> requests to the device) and gain from the fact that these requests can
> be started before the first issued request is completed (from the
> backend driver's point of view).
> 
> 
> >
> > Third Question:
> > Does the Xen hypervisor change the behavior of the Linux I/O
> > scheduler more or less?
> 
> I don't think so, but I'm by no means sure. In my view, the
> modifications to the Linux kernel are meant to be "the minimum
> necessary".
> >
> > Fourth Question:
> > Will AIO have a different performance impact on para-virtualized
> > domains and fully-virtualized domains respectively?
> 
> The main difference is the reduction in overhead (particularly
> latency) in Dom0, which will affect both PV and HVM guests. HVM guests
> have more "other things" happening in Dom0 (such as QEMU work), but
> it's hard to say which gains more from this without also qualifying
> what else is happening in the system. If you have PV drivers in an HVM
> domain, the disk performance should be about the same, whilst the
> (flawed) benchmark of "hdparm" shows around 10x performance difference
> between Dom0 and an HVM guest - so we lose a lot in the process. I
> haven't tried the same with tap:aio: instead of file:, but I suspect
> the interaction between guest, hypervisor and QEMU is a much larger
> component than the tap:aio: vs file: method of disk access.
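For anyone wanting to try that comparison, the two backends are
selected in the xm-style (Xen 3.x) domain configuration roughly as
follows; the image path and guest device name here are placeholders:

```python
# 'file:' attaches the image via the loopback driver with buffered I/O
# in dom0; 'tap:aio:' attaches the same image via the blktap driver
# using asynchronous I/O.

# disk = [ 'file:/var/images/guest.img,hda,w' ]    # loopback file backend
disk = [ 'tap:aio:/var/images/guest.img,hda,w' ]   # blktap + AIO backend
```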
> 
> --
> Mats
> >
> > Thanks,
> >
> > Liang
> >
> > ----- Original Message ----- 
> > From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> > To: "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>;
> > <xen-users@xxxxxxxxxxxxxxxxxxx>
> > Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow"
> > <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera"
> > <jrivera@xxxxxxxxxxx>
> > Sent: Tuesday, January 16, 2007 10:22 AM
> > Subject: RE: [Xen-users] Getting better Disk IO
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > Mark Williamson
> > > Sent: 16 January 2007 17:07
> > > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > > Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> > > Subject: Re: [Xen-users] Getting better Disk IO
> > >
> > > > I've been hoping to see replies to this, but lacking good
> > > > information here is the state of my confusion on virtual machine
> > > > disks:
> > > >
> > > > If you read the docs for configuring disks on domU and HVM
> > > > machines, you'll find a gazillion or so ways to present the
> > > > disks to the virtual machine.
> > >
> > > There are quite a lot of options, it's true ;-)
> > >
> > > > One of those ways (whose name I forget) provides (if I
> > > > understand things, which I doubt :-) a special kind of disk
> > > > emulation designed to be driven by special drivers on the
> > > > virtual machine side. The combination gives near direct disk
> > > > access speeds in the virtual machine.
> > > >
> > > > The catch is that you need those drivers for the kernel on the
> > > > virtual machine side. They may already exist, you may have to
> > > > build them, and depending on the kernel version, they may be
> > > > hard to build.
> > > >
> > > > Perhaps someone who actually understands this could elaborate?
> > >
> > > Basically yes, that's all correct.
> > >
> > > To summarise:
> > >
> > > PV guests (that's paravirtualised, or Xen-native) use a Xen-aware
> > > block device that's optimised for good performance on Xen.
> > > HVM guests (Hardware Virtual Machine, fully virtualised and
> > > unaware of Xen) use an emulated IDE block device, provided by Xen
> > > (actually, it's provided by the qemu-based device models, running
> > > in dom0).
> > >
> > > The HVM emulated block device is not as optimised (nor does it
> > > lend itself to such effective optimisation) for high virtualised
> > > performance as the Xen-aware device.  Therefore a second option is
> > > available for HVM guests: an implementation of the PV guest device
> > > driver that is able to "see through" the emulated hardware (in a
> > > secure and controlled way) and talk directly as a Xen-aware block
> > > device.  This can potentially give very good performance.
> >
> > The reason the emulated IDE controller is quite slow is a
> > consequence of the emulation. The way it works is that the driver in
> > the HVM domain writes to the same IO ports that the real device
> > would use. These writes are intercepted by the hardware support in
> > the processor and a VMEXIT is issued to "exit the virtual machine"
> > back into the hypervisor. The HV looks at the "exit reason", and
> > sees that it's an IO WRITE operation. This operation is then encoded
> > into a small packet and sent to QEMU. QEMU processes this packet and
> > responds back to the HV to say "OK, done that, you may continue".
> > The HV then does a VMRUN (or VMRESUME in the Intel case) to continue
> > the guest execution, which is probably another IO instruction to
> > write to the IDE controller. There's a total of 5-6 bytes written to
> > the IDE controller per transaction, and whilst it's possible to
> > combine some of these writes into a single write, it's not always
> > done that way. Once all writes for one transaction are completed,
> > the QEMU IDE emulation code will perform the requested operation
> > (such as reading or writing a sector). When that is complete, a
> > virtual interrupt is issued to the guest, and the guest will see
> > this as a "disk done" interrupt, just like real hardware.
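The round trip described above can be sketched schematically. Every
name in this sketch is invented for illustration; it is not Xen's
actual internal API:

```python
# One intercepted guest IO write, hypervisor side.

def qemu_ide_model(request):
    """Stands in for the QEMU device model in dom0: it latches port
    writes and, once a full command has arrived, does the real I/O."""
    return "done"

def handle_vmexit(exit_reason, port, value):
    """Handle a single VMEXIT caused by a guest port write."""
    if exit_reason == "IO_WRITE":
        reply = qemu_ide_model({"op": "outb", "port": port, "value": value})
        assert reply == "done"   # HV waits for QEMU before resuming
    return "VMRUN"               # re-enter the guest (VMRESUME on Intel)

# One IDE command costs several such round trips (5-6 port writes):
round_trips = [handle_vmexit("IO_WRITE", 0x1F0 + i, 0) for i in range(6)]
print(len(round_trips), "VMEXIT/VMRUN round trips for a single IDE command")
```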
> >
> > All these steps of IO intercepts take several thousand cycles,
> > which is a bit longer than a regular IO write operation would take
> > on the real hardware, and the system will still need to issue the
> > real IO operations to perform the REAL hardware read/write
> > corresponding to the virtual disk (such as reading a file, LVM or
> > physical partition) at some point, so this is IN ADDITION to the
> > time used by the hypervisor.
> >
> > Unfortunately, the only possible improvement on this scenario is
> > the type of "virtual-aware" driver that is described below.
> >
> > [Using a slightly more efficient model than IDE may also help, but
> > that's going to be marginal compared to the benefits of using a
> > virtual-aware driver].
> >
> > --
> > Mats
> > >
> > > I don't know if these drivers are included in any Linux
> > > distributions yet, but they are available in the Xen source tree
> > > so that you can build your own, in principle.  Windows versions of
> > > the drivers are included in XenSource's products, I believe -
> > > including the free (as in beer) XenExpress platform.
> > >
> > > There are potentially other options being developed, including an
> > > emulated SCSI device that should improve the potential for higher
> > > performance IO emulation without Xen-aware drivers.
> > >
> > > Hope that clarifies things!
> > >
> > > Cheers,
> > > Mark
> > >
> > > -- 
> > > Dave: Just a question. What use is a unicycle with no seat?
> > > And no pedals!
> > > Mark: To answer a question with a question: What use is a
> > > skateboard?
> > > Dave: Skateboards have wheels.
> > > Mark: My wheel has a wheel!
> > >
> > > _______________________________________________
> > > Xen-users mailing list
> > > Xen-users@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/xen-users
> > >
> > >
> > >
> >
> >
> >
> >
> >
> >
> >
> 
> 
> 
> 
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel