
[Xen-devel] Re: poor domU VBD performance.




I am sorry to return to this issue after quite a long interruption.
As I mentioned in an earlier post, I came across this problem when I
was testing file-system performance. After the problems with raw sequential
I/O seemed to have been fixed in the testing release, I turned back to
my original problem.
I did a simple test that, despite its simplicity, seems to put the I/O
subsystem under considerable stress. I took the /usr tree of my system and
copied it five times into different directories on a slice of disk 1. This
tree consists of 36000 files with about 750 MB of data. Then I started
to copy each of these copies recursively onto disk 2 (each to its own
location on that disk, of course). I ran these copies in parallel; the
processes took about 6 to 7 minutes in DOM0, while they needed between
14.6 and 15.9 minutes in DOMU.
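
For reference, the test amounts roughly to the following shell commands
(the mount points /mnt/disk1 and /mnt/disk2 are placeholders, not the
exact paths I used):

    # stage the source data: five copies of /usr on disk 1
    # (/mnt/disk1 and /mnt/disk2 are placeholder mount points)
    for i in 1 2 3 4 5; do
        cp -a /usr /mnt/disk1/copy$i
    done

    # copy all five trees onto disk 2 in parallel, timing each process
    for i in 1 2 3 4 5; do
        ( time cp -a /mnt/disk1/copy$i /mnt/disk2/copy$i ) > /tmp/copy$i.log 2>&1 &
    done
    wait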

Essentially, this means that under this heavy I/O load I am back at the
roughly 40% ratio between I/O performance in DOMU and I/O performance in
DOM0 that I initially reported. This may just be coincidence, but it is
probably worth mentioning.

I monitored the disk and block-I/O activity with iostat. The output is too
large to post here in full, so I will only include a few representative
snapshots. The first two snapshots show the activity while doing the copying
in DOMU.
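
For completeness, the snapshots come from iostat's extended device
statistics; the invocation was along these lines (the 5-second interval
is just an example, not necessarily what I used):

    # extended per-device statistics plus CPU utilisation, every 5 seconds
    iostat -x 5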

This is a snapshot of a phase with relatively high throughput (DOMU):


Device:    rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
hde          0.00  2748.00    1.60   71.20    12.80 22561.60      6.40  11280.80   310.09     1.78   23.96   4.73  34.40
hdg       2571.00     5.00  126.80    9.60 21580.80   115.20  10790.40     57.60   159.06     5.48   40.38   6.61  90.20

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    6.20    0.20   93.40


This is a snapshot of a phase with relatively low throughput (DOMU):


Device:    rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
hde          0.00   676.40    0.00   33.00     0.00  5678.40      0.00   2839.20   172.07     1.76   53.45   4.91  16.20
hdg        335.80    11.00  315.00    3.40  5206.40   115.20   2603.20     57.60    16.71     4.15   13.02   2.76  87.80

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    9.00    0.00   90.80


I suspect that the reported iowait figure in the CPU statistics is not
entirely correct, but I am not sure about it.

The next two snapshots show the iostat output during the copying in DOM0.

Again, the first snapshot was taken in a phase of relatively high throughput
(DOM0):

Device:    rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
hde          0.00  5845.40    1.40  110.20    11.20 47812.80      5.60  23906.40   428.53   105.96  772.63   8.96 100.00
hdg         46.20    24.80  389.80    2.20 47628.80   216.00  23814.40    108.00   122.05     7.12   18.23   3.30 129.40

avg-cpu:  %user   %nice %system %iowait   %idle
           2.40    0.00   40.20   57.40    0.00

The next snapshot was taken in a phase of relatively low throughput (DOM0):

Device:    rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await  svctm  %util
hde          0.00   903.40    0.20  106.80     3.20  7972.80      1.60   3986.40    74.54    20.77  217.91   4.06  43.40
hdg          0.00    24.00  746.60    1.20  9302.40   200.00   4651.20    100.00    12.71     4.96    6.67   1.34 100.00

avg-cpu:  %user   %nice %system %iowait   %idle
           3.40    0.00   44.00   52.60    0.00

The problem seems to be the reading. The device hde, which contains the
slice onto which the data is copied, is almost never really busy when the
copying runs in DOMU. The ratio of kB/s written to utilisation seems to
show that writing from DOMU is just as efficient as writing from DOM0
(writing can be buffered in both cases, after all).
The information on reading, however, shows a different picture. The block
I/O layer constantly merges requests, resulting in request sizes that are
approximately equal in both cases. Yet the service times for DOMU requests
are about twice those needed for DOM0 requests.
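
As a side note, the read path can be checked in isolation with something
like the following, run in both DOM0 and DOMU (the path is a placeholder;
the domain needs to be freshly booted, or the tree must be larger than its
memory, so the data is not served from the page cache):

    # read one of the staged trees and throw the archive away, so that
    # only the read path is exercised (the path is a placeholder)
    time tar cf - /mnt/disk1/copy1 > /dev/null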

I do not know if such a scenario is simply unsuited to virtualized systems,
at least under Xen. We are thinking about running a mail gateway on top of a
protected and secured dom0 system, and potentially offering other network
services in separate domains. We want to avoid corruption of DOM0
while being able to offer "insecure" services in unprivileged domains.
We know that mail handling can put an intense load onto the filesystem -
admittedly more on inodes (create and delete) than on data throughput.

Do I simply have to accept that under heavy I/O load, domains that use VBDs
to access storage devices will lag behind dom0 and native Linux systems, or
is there a chance to fix this?


The test reported above was done on a Fujitsu-Siemens RX100 system with a
2.0 GHz Celeron CPU and a total of only 256 MB of memory; DOM0 had 128 MB
and DOMU 100 MB. The disks were plain IDE disks. I did the same test on a
system with 1.25 GB of RAM, with both domains having 0.5 GB of memory. It
contains SATA disks, and the results are essentially the same; the only
difference is that the copy processes are slower due to lower random-access
throughput from the disks.

Any advice or help?

Thanks in advance 

    Peter 
 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

