WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-users

Re: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO

To: "Petersson, Mats" <Mats.Petersson@xxxxxxx>, "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>, <xen-users@xxxxxxxxxxxxxxxxxxx>, "Xen-Devel" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO
From: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>
Date: Wed, 17 Jan 2007 09:44:13 -0700
Cc: Tom Horsley <tomhorsley@xxxxxxxxxxxx>, Goswin von Brederlow <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>, James Rivera <jrivera@xxxxxxxxxxx>
Delivery-date: Wed, 17 Jan 2007 08:44:17 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <907625E08839C4409CE5768403633E0B018E1889@xxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
Hi Mats,

Thanks for your reply.

You said an HVM domain using PV drivers should have about the same disk I/O performance as a PV guest. However, based on my experiments, this is not true. I have tried different kinds of I/O benchmark tools (dd, iozone, iometer, etc.) and they all show a big gap between an HVM domain with PV drivers and a PV guest domain. This is especially true for large I/O sizes (64k, 128k and 256k sequential I/O). So far, the disk I/O performance of HVM with PV drivers is only 20~30% of that of a PV guest.

Another thing that puzzles me is the disk I/O performance of PV guests when tested with small request sizes (512B and 1K sequential I/O). Although the PV guest comes very close to native performance for large I/O sizes, there is still a clear gap between them for small sizes (512B and 1K). I wonder whether the Xen hypervisor changes the coalescing behavior for small requests. Do you know if this is true?

Liang

----- Original Message -----
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
To: "Liang Yang" <multisyncfe991@xxxxxxxxxxx>; "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>; <xen-users@xxxxxxxxxxxxxxxxxxx>
Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow" <brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera" <jrivera@xxxxxxxxxxx>
Sent: Wednesday, January 17, 2007 3:07 AM
Subject: RE: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO


-----Original Message-----
From: Liang Yang [mailto:multisyncfe991@xxxxxxxxxxx]
Sent: 16 January 2007 17:53
To: Petersson, Mats; Mark Williamson; xen-users@xxxxxxxxxxxxxxxxxxx
Cc: Tom Horsley; Goswin von Brederlow; James Rivera
Subject: AIO for better disk IO? Re: [Xen-users] Getting better Disk IO

Hi Mats,

Let me first say that I'm not an expert on AIO, but I did sit through
the presentation of the new blktap driver at the Xen Summit. The
following is to the best of my understanding, and could be "codswallop"
for all that I know... ;-)

I once posted my questions about the behavior of asynchronous I/O under Xen, which is also directly related to disk I/O performance; however, I did not get any response. I would appreciate it if you could advise on this.

As AIO can help improve performance and the Linux kernel keeps tuning the AIO path, more and more I/Os can be expected to take the AIO path instead of the regular I/O path.

First Question:
If we consider Xen, do we need to do AIO in both domain0 and the guest domains at the same time? For example, consider two situations: let a fully virtualized guest domain still do regular I/O while domain0 (the vbd backend driver) does AIO; or let both the fully virtualized guest domain and domain0 do AIO. What is the possible performance difference here?

The main benefit of AIO is that the current requestor (such as the VBD
BackEnd driver) can continue doing other things whilst the data is being
read/written to/from the actual storage device. This in turn reduces
latency where there are multiple requests outstanding from the guest OS
(for example multiple guests requesting "simultaneously" or multiple
requests issued by the same guest close together).

The bandwidth difference all arises from the reduced latency, not
because AIO is in itself better performing.
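
To make the submit-and-continue idea concrete, below is a minimal sketch using the Linux native AIO interface (libaio), which is - as far as I understand - what the blktap tap:aio: backend builds on. The file name, sizes and error handling are illustrative only, not taken from the Xen code (build with gcc -laio):

#define _GNU_SOURCE              /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx;
    memset(&ctx, 0, sizeof(ctx));            /* must be zeroed before io_setup() */
    if (io_setup(8, &ctx) < 0) {
        fprintf(stderr, "io_setup failed\n");
        return 1;
    }

    int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 512, 4096))     /* O_DIRECT wants aligned buffers */
        return 1;

    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, 4096, 0);    /* read 4 KiB at offset 0 */

    if (io_submit(ctx, 1, cbs) != 1) {       /* queues the read and returns */
        fprintf(stderr, "io_submit failed\n");
        return 1;
    }

    /* ...the caller is free to service other requests here... */

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);      /* reap the completion */
    printf("read completed, res=%ld\n", (long)ev.res);

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}

The point is simply that io_submit() returns as soon as the request is queued; any latency win depends on having other requests or work to overlap with while the disk is busy.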


Second Question:
Does Domain0 always wait until the AIO data is available and then notify the guest domain? Or will Domain0 issue an interrupt immediately to notify the guest domain as soon as the AIO is queued? If the first case is true, then all AIOs effectively become synchronous.

The guest cannot be issued with an interrupt to signify "data available" until the guest's data has been read, so for reads at least, the effect from the guest's perspective is still synchronous. This doesn't mean that the guest can't issue further requests (for example from a different thread, or simply by queuing multiple requests to the device) and gain from the fact that these requests can be started before the first issued request is completed (from the backend driver's point of view).
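
Following on from the sketch above, the overlap described here comes from keeping several requests in flight at once: submit a batch with one io_submit() call, then reap completions as the disk finishes them, in whatever order that happens. Again a purely illustrative sketch (hypothetical file, sizes and request count), not the blkback/blktap code:

#define _GNU_SOURCE              /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NREQ 4                   /* requests kept in flight */
#define BLK  4096

int main(void)
{
    io_context_t ctx;
    memset(&ctx, 0, sizeof(ctx));
    if (io_setup(NREQ, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

    int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct iocb cbs[NREQ], *ptrs[NREQ];
    void *bufs[NREQ];

    /* queue NREQ reads at different offsets before waiting for any of them */
    for (int i = 0; i < NREQ; i++) {
        if (posix_memalign(&bufs[i], 512, BLK)) return 1;
        io_prep_pread(&cbs[i], fd, bufs[i], BLK, (long long)i * BLK);
        ptrs[i] = &cbs[i];
    }
    if (io_submit(ctx, NREQ, ptrs) != NREQ) { fprintf(stderr, "io_submit failed\n"); return 1; }

    /* reap completions in whatever order the disk finishes them */
    int done = 0;
    while (done < NREQ) {
        struct io_event evs[NREQ];
        int n = io_getevents(ctx, 1, NREQ, evs, NULL);
        if (n < 0) break;
        for (int i = 0; i < n; i++)
            printf("request %p done, res=%ld\n", (void *)evs[i].obj, (long)evs[i].res);
        done += n;
    }

    io_destroy(ctx);
    close(fd);
    return 0;
}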



Third Question:
Does the Xen hypervisor change the behavior of the Linux I/O scheduler in any way?

I don't think so, but I'm by no means sure. In my view, the modifications to the Linux kernel are meant to be "the minimum necessary".

Fourth Question:
Will AIO have a different performance impact on para-virtualized domains and fully virtualized domains, respectively?

The main difference is the reduction in overhead (particularly latency) in Dom0, which will affect both PV and HVM guests. HVM guests have more "other things" happening in Dom0 (such as QEMU work), but it's hard to say which gains more from this without also qualifying what else is happening in the system. If you have PV drivers in an HVM domain, the disk performance should be about the same, whilst the (flawed) benchmark of "hdparm" shows around a 10x performance difference between Dom0 and an HVM guest - so we lose a lot in the process. I haven't tried the same with tap:aio: instead of file:, but I suspect the interaction between guest, hypervisor and QEMU is a much larger component than the tap:aio: vs file: method of disk access.
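
For reference, the two access methods mentioned above are selected in the domain configuration's disk line. A minimal sketch with a hypothetical image path (for an HVM guest the device name is typically given as ioemu:hda rather than xvda):

# loopback/file-backed disk
disk = [ 'file:/srv/xen/guest.img,xvda,w' ]
# the same image served through blktap with AIO
# disk = [ 'tap:aio:/srv/xen/guest.img,xvda,w' ]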

--
Mats

Thanks,

Liang

----- Original Message ----- From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
To: "Mark Williamson" <mark.williamson@xxxxxxxxxxxx>;
<xen-users@xxxxxxxxxxxxxxxxxxx>
Cc: "Tom Horsley" <tomhorsley@xxxxxxxxxxxx>; "Goswin von Brederlow"
<brederlo@xxxxxxxxxxxxxxxxxxxxxxxxxxx>; "James Rivera"
<jrivera@xxxxxxxxxxx>
Sent: Tuesday, January 16, 2007 10:22 AM
Subject: RE: [Xen-users] Getting better Disk IO




> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> Mark Williamson
> Sent: 16 January 2007 17:07
> To: xen-users@xxxxxxxxxxxxxxxxxxx
> Cc: Tom Horsley; Goswin von Brederlow; James Rivera
> Subject: Re: [Xen-users] Getting better Disk IO
>
> > I've been hoping to see replies to this, but lacking good information
> > here is the state of my confusion on virtual machine disks:
> >
> > If you read the docs for configuring disks on domu and hvm machines,
> > you'll find a gazillion or so ways to present the disks to the virtual
> > machine.
>
> There are quite a lot of options, it's true ;-)
>
> > One of those ways (whose name I forget) provides (if I understand
> > things, which I doubt :-) a special kind of disk emulation designed to
> > be driven by special drivers on the virtual machine side. The
> > combination gives near direct disk access speeds in the virtual
> > machine.
> >
> > The catch is that you need those drivers for the kernel on the virtual
> > machine side. They may already exist, you may have to build them, and
> > depending on the kernel version, they may be hard to build.
> >
> > Perhaps someone who actually understands this could elaborate?
>
> Basically yes, that's all correct.
>
> To summarise:
>
> PV guests (that's paravirtualised, or Xen-native) use a Xen-aware block
> device that's optimised for good performance on Xen.
> HVM guests (Hardware Virtual Machine, fully virtualised and unaware of
> Xen) use an emulated IDE block device, provided by Xen (actually, it's
> provided by the qemu-based device models, running in dom0).
>
> The HVM emulated block device is not as optimised (nor does it lend
> itself to such effective optimisation) for high virtualised performance
> as the Xen-aware device.  Therefore a second option is available for
> HVM guests: an implementation of the PV guest device driver that is
> able to "see through" the emulated hardware (in a secure and controlled
> way) and talk directly as a Xen-aware block device.  This can
> potentially give very good performance.

The reason the emulated IDE controller is quite slow is a consequence of the emulation. The way it works is that the driver in the HVM domain writes to the same IO ports that the real device would use. These writes are intercepted by the hardware support in the processor and a VMEXIT is issued to "exit the virtual machine" back into the hypervisor. The HV looks at the "exit reason", and sees that it's an IO WRITE operation. This operation is then encoded into a small packet and sent to QEMU. QEMU processes this packet and responds back to HV to say "OK, done that, you may continue". HV then does a VMRUN (or VMRESUME in the Intel case) to continue the guest execution, which is probably another IO instruction to write to the IDE controller. There's a total of 5-6 bytes written to the IDE controller per transaction, and whilst it's possible to combine some of these writes into a single write, it's not always done that way. Once all writes for one transaction are completed, the QEMU IDE emulation code will perform the requested operation (such as reading or writing a sector). When that is complete, a virtual interrupt is issued to the guest, and the guest will see this as a "disk done" interrupt, just like real hardware.
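
As a rough illustration of that sequence (and nothing more), here is a toy model in C; run_guest_until_vmexit(), forward_to_qemu() and the port and value used are made-up stand-ins, not real Xen or QEMU symbols:

#include <stdio.h>

/* What the hypervisor learns from an intercepted port access. */
struct io_exit {
    unsigned short port;     /* e.g. an emulated IDE register in 0x1f0-0x1f7 */
    unsigned char  value;
    int            is_write;
};

/* Stand-in for VMRUN/VMRESUME: pretend the guest executed "out 0x1f2, 1"
 * (sector count = 1) and the hardware raised a VMEXIT for it. */
static struct io_exit run_guest_until_vmexit(void)
{
    struct io_exit e = { .port = 0x1f2, .value = 1, .is_write = 1 };
    return e;
}

/* Stand-in for packaging the access into a small packet, handing it to the
 * QEMU device model, and waiting for its "OK, done that" reply. */
static void forward_to_qemu(const struct io_exit *e)
{
    printf("qemu: emulate %s of 0x%02x on port 0x%x\n",
           e->is_write ? "write" : "read", e->value, e->port);
}

int main(void)
{
    /* One iteration of the intercept loop; the real path repeats this for
     * every IDE register access, which is why each transaction is costly. */
    struct io_exit e = run_guest_until_vmexit();   /* VMRUN ... VMEXIT */
    forward_to_qemu(&e);                           /* hypervisor -> device model */
    /* ...the hypervisor then re-enters the guest, which typically issues
     * the next IO instruction of the same transaction... */
    return 0;
}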

All these steps of IO intercepts take several thousand cycles, which is a bit longer than a regular IO write operation would take on the real hardware, and the system will still need to issue the real IO operations to perform the REAL hardware read/write corresponding to the virtual disk (such as reading a file, LVM volume or physical partition) at some point, so this is IN ADDITION to the time used by the hypervisor.

Unfortunately, the only possible improvement on this scenario is the kind of "virtual-aware" driver that is described below.

[Using a slightly more efficient model than IDE may also help, but
that's going to be marginal compared to the benefits of using a
virtual-aware driver].

--
Mats
>
> I don't know if these drivers are included in any Linux distributions
> yet, but they are available in the Xen source tree so that you can
> build your own, in principle.  Windows versions of the drivers are
> included in XenSource's products, I believe - including the free (as in
> beer) XenExpress platform.
>
> There are potentially other options being developed, including an
> emulated SCSI device that should improve the potential for higher
> performance IO emulation without Xen-aware drivers.
>
> Hope that clarifies things!
>
> Cheers,
> Mark
>
> --
> Dave: Just a question. What use is a unicycle with no seat? And no pedals!
> Mark: To answer a question with a question: What use is a skateboard?
> Dave: Skateboards have wheels.
> Mark: My wheel has a wheel!
>



_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users