
Re: [Xen-devel] DomU vs Dom0 performance.



On Thu, Oct 03, 2013 at 02:50:27PM -0400, sushrut shirole wrote:
> Hi Konrad,
> 
> Thank you for the simple and wonderful explanation. Now I understand why
> the syscall micro-benchmark performs better on domU than on dom0. But I am
> still confused about the 'memory bandwidth' micro-benchmark performance.
> The memory bandwidth micro-benchmark causes a page fault when a page is
> accessed for the first time. I presume the PTE updates are the major reason
> for the performance degradation of dom0. But after the first few page faults, all

Correct. In the worst case each PTE update requires a hypercall. We do have
batching, which lets you fold up to 32 PTE updates into one hypercall. But
if you mix the PTE updates with mprotect, etc., it gets worse.
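
To make that concrete, below is a rough sketch of what one batched update
looks like at the hypercall ABI level. The struct layout, MMU_NORMAL_PT_UPDATE
and DOMID_SELF follow the public Xen headers; the extern declaration and the
fixed 32-entry queue are only illustrative - the real Linux pv-ops code does
this through its multicall/lazy-MMU machinery.

<snip>
/* Sketch: batching guest PTE writes into a single mmu_update hypercall.
 * Types and constants mirror xen/include/public/xen.h; the hypercall
 * wrapper is assumed to be provided by the guest kernel. */
#include <stdint.h>

struct mmu_update {
    uint64_t ptr;   /* machine address of the PTE; low 2 bits = command */
    uint64_t val;   /* new PTE contents */
};

#define MMU_NORMAL_PT_UPDATE 0          /* checked PTE write */
#define DOMID_SELF ((uint16_t)0x7FF0U)
#define PTE_BATCH 32                    /* batch size mentioned above */

extern int HYPERVISOR_mmu_update(struct mmu_update *req, int count,
                                 int *success_count, uint16_t domid);

static struct mmu_update queue[PTE_BATCH];
static int queued;

/* Queue one PTE write; a single trap into Xen covers up to 32 of them. */
static void queue_pte_update(uint64_t pte_machine_addr, uint64_t new_pte)
{
    queue[queued].ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE;
    queue[queued].val = new_pte;
    if (++queued == PTE_BATCH) {
        int done;
        HYPERVISOR_mmu_update(queue, queued, &done, DOMID_SELF);
        queued = 0;
    }
}
</snip>

Roughly speaking, mixing in mprotect and friends forces queues like this to
be flushed early and adds TLB-flush work on top, so the 32-way amortization
is lost - which is the "it gets worse" above.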

> the pages would be in memory (both dom0 and domU have 4096M of memory and
> the micro-benchmark uses < test_size * 3, i.e. 1000M * 3 in this case), so
> why is there a considerable performance difference?

I don't know what the micro-benchmark does. Does it use mprotect or do any
other page manipulations?
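
For reference, a memory bandwidth test of that kind usually boils down to
something like the sketch below (not the actual lmbench bw_mem source, just
an illustration of where the page-table work comes from). If the benchmark
looks like this, only the first pass over the buffer faults pages in and
triggers PTE updates; if it instead remaps or mprotects the buffer on every
iteration, the PV page-table overhead is paid on every run.

<snip>
/* Rough sketch of a read-bandwidth micro-benchmark loop (illustrative only,
 * not the lmbench source). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (1000UL * 1024 * 1024)   /* ~1000M, as in the test above */

int main(void)
{
    /* Fresh anonymous memory: the first touch of each page faults and, on a
     * PV guest, ends up in PTE-update hypercalls. */
    unsigned char *buf = malloc(BUF_SIZE);
    if (!buf)
        return 1;
    memset(buf, 1, BUF_SIZE);             /* first pass: one fault per page */

    unsigned long sum = 0;
    for (int pass = 0; pass < 10; pass++) /* later passes: no faults expected */
        for (size_t i = 0; i < BUF_SIZE; i += 64)
            sum += buf[i];

    printf("%lu\n", sum);                 /* keep the loop from being elided */
    free(buf);
    return 0;
}
</snip>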

> 
> Thank you,
> Sushrut.
> 
> 
> 
> On 1 October 2013 10:24, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> 
> > On Tue, Oct 01, 2013 at 12:55:18PM +0000, sushrut shirole wrote:
> > > Please find my response inline.
> > >
> > > Thank you,
> > > Sushrut.
> > >
> > > On 1 October 2013 10:05, Felipe Franciosi <felipe.franciosi@xxxxxxxxxx> wrote:
> > >
> > > > 1) Can you paste your entire config file here?
> > > >
> > > > This is just for clarification on the HVM bit.
> > > >
> > > > Your "disk" config suggests you are using the PV protocol for storage
> > > > (blkback).
> > > >
> > > kernel = "hvmloader"
> > > builder='hvm'
> > > memory = 4096
> > > name = "ArchHVM"
> > > vcpus=8
> > > disk = [ 'phy:/dev/sda5,hda,w',
> > > 'file:/root/dev/iso/archlinux.iso,hdc:cdrom,r' ]
> > > device_model = 'qemu-dm'
> > > boot="c"
> > > sdl=0
> > > xen_platform_pci=1
> > > opengl=0
> > > vnc=0
> > > vncpasswd=''
> > > nographic=1
> > > stdvga=0
> > > serial='pty'
> > >
> > >
> > > > 2) Also, can you run "uname -a" in both dom0 and domU and paste it
> > > > here as well?
> > > >
> > > >      Based on the syscall latencies you presented, it sounds like one
> > > > domain may be 32bit and the other 64bit.
> > > >
> > > kernel information on dom0 is:
> > > Linux localhost 3.5.0-IDD #5 SMP PREEMPT Fri Sep 6 23:31:56 UTC 2013
> > > x86_64 GNU/Linux
> > >
> > > on domU is:
> > > Linux domu 3.5.0-IDD-12913 #2 SMP PREEMPT Sun Dec 9 17:54:30 EST 2012
> > > x86_64 GNU/Linux
> > >
> > > > 3) You are doing this:
> > > >
> > > > > <snip>
> > > > > for i in `ls test_file.*`
> > > > > do
> > > > >    sudo dd if=./$i of=/dev/zero
> > > > > done
> > > > > </snip>
> > >
> > > My bad. I have changed it to /dev/null.
> > >
> > > > I don't know what you intended with this, but you can't output to
> > > > /dev/zero (you can read from /dev/zero, but you can only output to
> > > > /dev/null).
> > > >
> > > > If your "img" is 5G and your guest has 4G of RAM, you will not
> > > > consistently buffer the entire image.
> > > >
> > > Even though I am using a 5G img, the read operations executed are only
> > > 1G in size. Also, lmbench doesn't involve any reads/writes to this
> > > ".img"; still, the results I am getting are better on domU when measured
> > > with the lmbench micro-benchmarks.
> > >
> > > >
> > > > You are then doing buffered IO (note that some of your requests are
> > > > completing in 10us). That can only happen if you are reading from
> > > > memory and not from disk.
> > > >
> > > Even though a single request is completing in 10us, the total time
> > > required to complete all requests (5000000) is 17 and 13 seconds for
> > > dom0 and domU respectively.
> > >
> > > (I forgot to mention that I have an SSD installed on this machine.)
> > >
> > > > If you want to consistently compare the performance between two
> > > > domains, you should always bypass the VM's cache with O_DIRECT.
> > > >
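
For what it is worth, the cache can be taken out of the picture without dd as
well; a minimal sketch of an O_DIRECT read loop is below (the file name is
just an example, and GNU dd's iflag=direct achieves the same effect):

<snip>
/* Minimal sketch: read a file with O_DIRECT so the guest page cache is
 * bypassed and both domains actually go to the (virtual) disk. */
#define _GNU_SOURCE            /* O_DIRECT is Linux-specific */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN 4096             /* O_DIRECT wants block-aligned buffers/sizes */

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "test_file.0";  /* example name */
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, ALIGN, 1 << 20)) return 1;      /* 1 MiB reads */

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, 1 << 20)) > 0)
        total += n;

    printf("read %lld bytes uncached\n", total);
    free(buf);
    close(fd);
    return 0;
}
</snip>
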
> > > But looking at the results of the lat_syscall and bw_mem
> > > micro-benchmarks, they show that syscalls execute faster in domU and
> > > that memory bandwidth is higher in domU.
> >
> > Yes. That is expected with HVM guests. Their syscall overhead is lower
> > and their memory bandwidth higher than in PV guests (which is what dom0
> > is).
> >
> > That is why PVH is such an interesting future direction - it is PV with
> > HVM containers to lower the syscall overhead and the cost of memory page
> > table operations.
> >
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

