WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

Re: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO

To: "Petersson, Mats" <Mats.Petersson@xxxxxxx>, "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>, <xen-users@xxxxxxxxxxxxxxxxxxx>, "Xen-Devel" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO
From: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>
Date: Thu, 18 Jan 2007 10:01:07 -0700
Cc: Tom Horsley <tomhorsley@xxxxxxxxxxxx>, Goswin von Brederlow <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>, James Rivera <jrivera@xxxxxxxxxxx>
I'm using 8 Maxtor Atlas SAS II drives, and the OS is Red Hat Enterprise Linux 4U4. JBOD, MD-RAID0 and MD-RAID5 all show the same consistent performance gap.

Liang

----- Original Message -----
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>; "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>; <xen-users@xxxxxxxxxxxxxxxxxxx>; "Xen-Devel" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow" <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera" <jrivera@xxxxxxxxxxx>
Sent: Thursday, January 18, 2007 2:50 AM
Subject: RE: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO




-----Original Message-----
From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Liang Yang
Sent: 17 January 2007 16:44
To: Petersson, Mats; Mark Williamson; xen-users@xxxxxxxxxxxxxxxxxxx; Xen-Devel
Cc: Tom Horsley; Goswin von Brederlow; James Rivera
Subject: Re: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO

Hi Mats,

Thanks for your reply.

You said an HVM domain using PV drivers should have the same disk I/O
performance as a PV guest. However, based on my experiments, this is not
true. I have tried several I/O benchmark tools (dd, iozone, iometer,
etc.) and they all show a big gap between an HVM domain with PV drivers
and a PV guest domain. This is especially true for large I/O sizes (64k,
128k and 256k sequential I/O). So far, the disk I/O performance of an
HVM domain with PV drivers is only 20~30% of a PV guest's.

What driver are you using, and in what OS?

20-30% is a lot better than the 10% that I've seen with the QEMU driver,
but I'd still expect better than 30% from the PV driver... Not that I
have actually tested this, as I have other tasks.

Another thing that puzzles me is the disk I/O performance of PV guests
when tested with small packet sizes (512B and 1K sequential I/O).
Although a PV guest performs very close to native for large I/O sizes,
there is still a clear gap for small packets (512B and 1K). I wonder
whether the Xen hypervisor changes the packet coalescing behavior for
small packets. Do you know if this is true?

Don't know. I wouldn't think so.

I would think that the reason small packets suffer more is that the
hypervisor's overhead becomes much more noticeable than for a large
packet: the overhead is (almost) constant for the hypercall(s) involved,
but the time used in the driver to actually perform the disk IO is more
dependent on the size of the packet.
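That constant-overhead point can be sketched with a toy throughput model. The figures below (a fixed 20 microsecond per-request cost, a 100 MB/s device) are invented for illustration, not measured Xen numbers:

```python
# Toy model: each request pays a roughly constant per-request overhead
# (hypercalls, context switches) plus a transfer time proportional to
# its size, so small requests lose a much larger fraction to overhead.

def throughput_mb_s(request_bytes, overhead_us=20.0, disk_mb_s=100.0):
    """Effective throughput for one request size under the toy model."""
    transfer_us = request_bytes / (disk_mb_s * 1e6) * 1e6  # time on the device
    total_us = overhead_us + transfer_us
    return (request_bytes / 1e6) / (total_us / 1e6)

for size in (512, 1024, 65536, 262144):
    print(f"{size:>7} B -> {throughput_mb_s(size):6.1f} MB/s")
# 512 B requests reach only ~20 MB/s in this model, while 256 KiB
# requests come close to the device's full 100 MB/s.
```

The exact constants don't matter; the shape (overhead dominating small requests) matches the gap reported at 512B and 1K.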

--
Mats

Liang

----- Original Message -----
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>; "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>; <xen-users@xxxxxxxxxxxxxxxxxxx>
Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow" <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera" <jrivera@xxxxxxxxxxx>
Sent: Wednesday, January 17, 2007 3:07 AM
Subject: RE: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO


> -----Original Message-----
> From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx]
> Sent: 16 January 2007 17:53
> To: Petersson, Mats; Mark Williamson; xen-users@xxxxxxxxxxxxxxxxxxx
> Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> Subject: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO
>
> Hi Mats,

Let me first say that I'm not an expert on AIO, but I did sit through
the presentation of the new blktap driver at the Xen Summit. The
following is to the best of my understanding, and could be "codswallop"
for all I know... ;-)
>
> I once posted my questions about the behavior of asynchronous I/O
> under Xen, which is also directly related to disk I/O performance;
> however, I did not get any response. I would appreciate it if you
> could advise on this.
>
> As AIO can help improve performance, and the Linux kernel keeps tuning
> the AIO path, more and more I/Os can be expected to take the AIO path
> instead of the regular I/O path.
>
> First question:
> If we consider Xen, do we need to do AIO in both domain0 and the guest
> domains at the same time? For example, consider two situations: let a
> fully virtualized guest domain still do regular I/O while domain0 (the
> VBD backend driver) does AIO; or let both the fully virtualized guest
> domain and domain0 do AIO. What is the possible performance difference
> here?

The main benefit of AIO is that the current requestor (such as the VBD
backend driver) can continue doing other things whilst the data is being
read/written to/from the actual storage device. This in turn reduces
latency where there are multiple requests outstanding from the guest OS
(for example, multiple guests requesting "simultaneously", or multiple
requests issued by the same guest close together).

The bandwidth difference all arises from the reduced latency, not
because AIO is in itself better performing.
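A toy latency model makes this concrete (the timings are invented and the function names are ours, nothing Xen-specific): a backend that blocks on each request pays the sum of the latencies, while one that keeps several requests in flight pays little more than the slowest one.

```python
# Four outstanding 8 ms reads, e.g. four guests hitting the VBD backend
# "simultaneously". All numbers are illustrative.

def serial_completion(latencies_ms):
    """Synchronous backend: each request blocks until the previous finishes."""
    return sum(latencies_ms)

def overlapped_completion(latencies_ms, submit_cost_ms=0.01):
    """AIO-style backend: submit everything, then wait; total time is the
    submission overhead plus the slowest single request."""
    return submit_cost_ms * len(latencies_ms) + max(latencies_ms)

reqs = [8.0, 8.0, 8.0, 8.0]
print(serial_completion(reqs))      # 32.0 ms end to end
print(overlapped_completion(reqs))  # ~8.04 ms end to end
```

Per-request bandwidth is unchanged in this model; only the queueing latency drops, which is exactly the point above.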

>
> Second question:
> Does domain0 always wait until the AIO data is available and then
> notify the guest domain? Or does domain0 issue an interrupt immediately
> to notify the guest domain when the AIO is queued? If the first case is
> true, then all AIOs become synchronous.

The guest cannot be issued with an interrupt to signify "data available"
until the guest's data has been read, so for reads at least, the effect
from the guest's perspective is still synchronous. This doesn't mean
that the guest can't issue further requests (for example from a
different thread, or simply by queuing multiple requests to the device)
and gain from the fact that these requests can be started before the
first issued request has completed (from the backend driver's point of
view).


>
> Third question:
> Does the Xen hypervisor change the behavior of the Linux I/O scheduler
> more or less?

Don't think so, but I'm by no means sure. In my view, the modifications
to the Linux kernel are meant to be "the minimum necessary".
>
> Fourth question:
> Will AIO have a different performance impact on para-virtualized and
> fully virtualized domains respectively?

The main difference is the reduction in overhead (particularly latency)
in Dom0, which will affect both PV and HVM guests. HVM guests have more
"other things" happening in Dom0 (such as QEMU work), but it's hard to
say which gains more from this without also qualifying what else is
happening in the system. If you have PV drivers in an HVM domain, the
disk performance should be about the same, whilst the (flawed) benchmark
of "hdparm" shows around a 10x performance difference between Dom0 and
an HVM guest - so we lose a lot in the process. I haven't tried the same
with tap:aio: instead of file:, but I suspect the interaction between
guest, hypervisor and QEMU is a much larger component than the tap:aio:
vs file: method of disk access.
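For reference, the two access methods being compared are chosen per disk in the domain's config file (xm-style configs are Python syntax; the image path and device name here are placeholders):

```python
# file: serves the image through dom0's loopback driver (the slower path)
# disk = ['file:/path/to/disk.img,hda,w']

# tap:aio: serves it through the blktap driver, which uses AIO in dom0
disk = ['tap:aio:/path/to/disk.img,hda,w']
```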

--
Mats
>
> Thanks,
>
> Liang
>
> ----- Original Message -----
> From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
> To: "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>; <xen-users@xxxxxxxxxxxxxxxxxxx>
> Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow" <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera" <jrivera@xxxxxxxxxxx>
> Sent: Tuesday, January 16, 2007 10:22 AM
> Subject: RE: [Xen-users] Getting better Disk IO
>
>
>
>
> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Mark Williamson
> > Sent: 16 January 2007 17:07
> > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> > Subject: Re: [Xen-users] Getting better Disk IO
> >
> > > I've been hoping to see replies to this, but lacking good
> > > information, here is the state of my confusion on virtual machine
> > > disks:
> > >
> > > If you read the docs for configuring disks on domU and HVM
> > > machines, you'll find a gazillion or so ways to present the disks
> > > to the virtual machine.
> >
> > There are quite a lot of options, it's true ;-)
> >
> > > One of those ways (whose name I forget) provides (if I understand
> > > things, which I doubt :-) a special kind of disk emulation designed
> > > to be driven by special drivers on the virtual machine side. The
> > > combination gives near direct disk access speeds in the virtual
> > > machine.
> > >
> > > The catch is that you need those drivers for the kernel on the
> > > virtual machine side. They may already exist, you may have to build
> > > them, and depending on the kernel version, they may be hard to
> > > build.
> > >
> > > Perhaps someone who actually understands this could elaborate?
> >
> > Basically yes, that's all correct.
> >
> > To summarise:
> >
> > PV guests (that's paravirtualised, or Xen-native) use a Xen-aware
> > block device that's optimised for good performance on Xen.
> > HVM guests (Hardware Virtual Machine, fully virtualised and unaware
> > of Xen) use an emulated IDE block device, provided by Xen (actually,
> > it's provided by the qemu-based device models, running in dom0).
> >
> > The HVM emulated block device is not as optimised (nor does it lend
> > itself to such effective optimisation) for high virtualised
> > performance as the Xen-aware device.  Therefore a second option is
> > available for HVM guests: an implementation of the PV guest device
> > driver that is able to "see through" the emulated hardware (in a
> > secure and controlled way) and talk directly as a Xen-aware block
> > device.  This can potentially give very good performance.
>
> The reason the emulated IDE controller is quite slow is a consequence
> of the emulation. The way it works is that the driver in the HVM
> domain writes to the same IO ports that the real device would use.
> These writes are intercepted by the hardware support in the processor,
> and a VMEXIT is issued to "exit the virtual machine" back into the
> hypervisor. The HV looks at the "exit reason" and sees that it's an IO
> WRITE operation. This operation is then encoded into a small packet
> and sent to QEMU. QEMU processes this packet and responds back to the
> HV to say "OK, done that, you may continue". The HV then does a VMRUN
> (or VMRESUME in the Intel case) to continue the guest execution, which
> probably hits another IO instruction writing to the IDE controller.
> There's a total of 5-6 bytes written to the IDE controller per
> transaction, and whilst it's possible to combine some of these writes
> into a single write, it's not always done that way. Once all writes
> for one transaction are completed, the QEMU IDE emulation code will
> perform the requested operation (such as reading or writing a sector).
> When that is complete, a virtual interrupt is issued to the guest, and
> the guest sees this as a "disk done" interrupt, just like real
> hardware.
>
> All these steps of IO intercepts take several thousand cycles, which
> is a bit longer than a regular IO write would take on the real
> hardware, and the system still needs to issue the real IO operations
> to perform the REAL hardware read/write corresponding to the virtual
> disk (such as reading a file, LVM or physical partition) at some
> point, so this is IN ADDITION to the time used by the hypervisor.
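The intercept round trip described above can be caricatured in a few lines. This is an illustrative sketch of the control flow only: the port numbers are the classic primary-IDE task-file registers, but the function names and return values are invented, not actual Xen or QEMU interfaces.

```python
# One emulated IDE read: the HVM guest programs ~6 task-file registers,
# and every port write forces a VMEXIT -> hypervisor -> QEMU round trip.

PORT_WRITES_PER_REQUEST = 6

def qemu_handle_io(port, value, latched):
    """QEMU's device model latches each register write; the final write
    (the command register, 0x1F7) triggers the actual sector I/O."""
    latched.append((port, value))
    if len(latched) == PORT_WRITES_PER_REQUEST:
        latched.clear()
        return "disk-op-complete"   # QEMU reads/writes the sector here
    return "resume-guest"           # hypervisor VMRUN/VMRESUMEs the guest

def guest_issues_read():
    latched, vmexits, result = [], 0, None
    # sector count, LBA low/mid/high, drive select, READ SECTORS command
    for port, value in [(0x1F2, 1), (0x1F3, 0), (0x1F4, 0),
                        (0x1F5, 0), (0x1F6, 0xE0), (0x1F7, 0x20)]:
        vmexits += 1                # each write is intercepted: one VMEXIT
        result = qemu_handle_io(port, value, latched)
    return vmexits, result

print(guest_issues_read())  # (6, 'disk-op-complete')
```

Six exits at several thousand cycles each, before the real I/O even starts, is where the emulated path loses its time.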
>
> Unfortunately, the only possible improvement on this scenario is the
> type of "virtual-aware" driver described below.
>
> [Using a slightly more efficient model than IDE may also help, but
> that's going to be marginal compared to the benefits of using a
> virtual-aware driver].
>
> --
> Mats
> >
> > I don't know if these drivers are included in any Linux
> > distributions yet, but they are available in the Xen source tree so
> > that you can build your own, in principle.  Windows versions of the
> > drivers are included in XenSource's products, I believe - including
> > the free (as in beer) XenExpress platform.
> >
> > There are potentially other options being developed, including an
> > emulated SCSI device that should improve the potential for higher
> > performance IO emulation without Xen-aware drivers.
> >
> > Hope that clarifies things!
> >
> > Cheers,
> > Mark
> >
> > --
> > Dave: Just a question. What use is a unicycle with no seat?
> > And no pedals!
> > Mark: To answer a question with a question: What use is a
> skateboard?
> > Dave: Skateboards have wheels.
> > Mark: My wheel has a wheel!
> >
> > _______________________________________________
> > Xen-users mailing list
> > Xen-users@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-users