
Re: [Xen-devel] IO speed limited by size of IO request (for RBD driver)




On 22 May 2013, at 21:13, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx> 
wrote:

> On Wed, May 08, 2013 at 11:14:26AM +0000, Felipe Franciosi wrote:
>> Even though we didn't "prove" it properly, I think it is worth mentioning 
>> that this boils down to what we originally thought it was:
>> Steven's environment is writing to a filesystem in the guest. On top of 
>> that, it's using the guest's buffer cache to do the writes.
> 
> If he is using O_DIRECT it bypasses the cache in the guest.

Certainly, but the issues were when _not_ using O_DIRECT.
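
For context, a minimal sketch (not from the thread) of what the dd runs below
boil down to; the file name and sizes simply mirror the dd commands, and the
buffer alignment is the usual requirement for O_DIRECT:

/* Buffered vs. direct writes: with O_DIRECT each write() bypasses the
 * guest page cache and reaches blkfront more or less as issued; without
 * it, the guest's buffer cache and filesystem decide how the writes are
 * batched and flushed. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    size_t len = 1 << 20;                  /* 1 MiB blocks, as in dd bs=1M */
    void *buf;

    /* O_DIRECT requires aligned buffers, offsets and lengths. */
    if (posix_memalign(&buf, 4096, len))
        return 1;
    memset(buf, 0, len);

    int fd = open("output.zero", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    for (int i = 0; i < 2048; i++)         /* 2048 x 1 MiB = 2 GiB */
        if (write(fd, buf, len) != (ssize_t)len)
            return 1;

    close(fd);
    free(buf);
    return 0;
}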

F


> 
>> This means that we cannot (easily?) control how the cache and the fs are 
>> flushing these writes through blkfront/blkback.
>> 
>> In other words, it's very likely that it generates a workload that simply 
>> doesn't perform well on the "stock" PV protocol.
>> This is a good example of how indirect descriptors help (remember that Roger 
>> and I were struggling to find use cases where indirect descriptors showed a 
>> substantial gain).
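
For background, the gain from indirect descriptors comes from moving the
segment list out of the fixed-size ring slot. A very rough sketch of the idea
(field layout approximated, not copied verbatim from the Xen public headers):

#include <stdint.h>

typedef uint32_t grant_ref_t;
typedef uint64_t blkif_sector_t;

/* Sketch only: an indirect request carries grant references to extra
 * pages that hold the segment descriptors, instead of embedding the
 * segment array in the ring slot itself, so a single request can
 * describe far more than the stock limit of 11 segments. */
struct blkif_request_indirect_sketch {
    uint8_t         operation;      /* BLKIF_OP_INDIRECT                */
    uint8_t         indirect_op;    /* the real op: read or write       */
    uint16_t        nr_segments;    /* can be much larger than 11       */
    uint64_t        id;             /* echoed back in the response      */
    blkif_sector_t  sector_number;  /* start sector on the vbd          */
    grant_ref_t     indirect_grefs[8];  /* pages holding the segments   */
};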
>> 
>> Cheers,
>> Felipe
>> 
>> -----Original Message-----
>> From: Roger Pau Monne 
>> Sent: 08 May 2013 11:45
>> To: Steven Haigh
>> Cc: Felipe Franciosi; xen-devel@xxxxxxxxxxxxx
>> Subject: Re: IO speed limited by size of IO request (for RBD driver)
>> 
>> On 08/05/13 12:32, Steven Haigh wrote:
>>> On 8/05/2013 6:33 PM, Roger Pau Monné wrote:
>>>> On 08/05/13 10:20, Steven Haigh wrote:
>>>>> On 30/04/2013 8:07 PM, Felipe Franciosi wrote:
>>>>>> I noticed you copied your results from "dd", but I didn't see any 
>>>>>> conclusions drawn from the experiment.
>>>>>> 
>>>>>> Did I understand it wrong or now you have comparable performance on dom0 
>>>>>> and domU when using DIRECT?
>>>>>> 
>>>>>> domU:
>>>>>> # dd if=/dev/zero of=output.zero bs=1M count=2048 oflag=direct
>>>>>> 2048+0 records in
>>>>>> 2048+0 records out
>>>>>> 2147483648 bytes (2.1 GB) copied, 25.4705 s, 84.3 MB/s
>>>>>> 
>>>>>> dom0:
>>>>>> # dd if=/dev/zero of=output.zero bs=1M count=2048 oflag=direct
>>>>>> 2048+0 records in
>>>>>> 2048+0 records out
>>>>>> 2147483648 bytes (2.1 GB) copied, 24.8914 s, 86.3 MB/s
>>>>>> 
>>>>>> 
>>>>>> I think that if the performance differs when NOT using DIRECT, the issue 
>>>>>> must be related to the way your guest is flushing the cache. This must 
>>>>>> be generating a workload that doesn't perform well on Xen's PV protocol.
>>>>> 
>>>>> Just wondering if there is any further input on this... While DIRECT 
>>>>> writes are as good as can be expected, NON-DIRECT writes in certain 
>>>>> cases (specifically with an mdadm RAID in the Dom0) suffer about a 
>>>>> 50% loss in throughput...
>>>>> 
>>>>> The hard part is that this is the default mode of writing!
>>>> 
>>>> As another test with indirect descriptors, could you change 
>>>> xen_blkif_max_segments in xen-blkfront.c to 128 (it is 32 by 
>>>> default), recompile the DomU kernel and see if that helps?
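
(For reference, the change being asked for amounts to something like the
following in the patched drivers/block/xen-blkfront.c; exact names and
context depend on the indirect-descriptors series being tested:)

/* In drivers/block/xen-blkfront.c, which already pulls in the module
 * parameter machinery; sketch of the suggested test change: */
static unsigned int xen_blkif_max_segments = 128;   /* series default is 32 */
module_param_named(max, xen_blkif_max_segments, int, S_IRUGO);
MODULE_PARM_DESC(max, "Maximum number of segments in indirect requests");

(If the series keeps that module parameter, other values, such as the 64
requested further down, could likely be tried from the DomU kernel command
line with xen_blkfront.max=64 instead of recompiling.)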
>>> 
>>> Ok, here we go.... compiled as 3.8.0-2 with the above change. 3.8.0-2 
>>> is running on both the Dom0 and DomU.
>>> 
>>> # dd if=/dev/zero of=output.zero bs=1M count=2048
>>> 2048+0 records in
>>> 2048+0 records out
>>> 2147483648 bytes (2.1 GB) copied, 22.1703 s, 96.9 MB/s
>>> 
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>            0.34    0.00   17.10    0.00    0.23   82.33
>>> 
>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>>> sdd             980.97 11936.47   53.11  429.78     4.00    48.77   223.81    12.75   26.10   2.11 101.79
>>> sdc             872.71 11957.87   45.98  435.67     3.55    49.30   224.71    13.77   28.43   2.11 101.49
>>> sde             949.26 11981.88   51.30  429.33     3.91    48.90   225.03    21.29   43.91   2.27 109.08
>>> sdf             915.52 11968.52   48.58  428.88     3.73    48.92   225.84    21.44   44.68   2.27 108.56
>>> md2               0.00     0.00    0.00 1155.61     0.00    97.51   172.80     0.00    0.00   0.00   0.00
>>> 
>>> # dd if=/dev/zero of=output.zero bs=1M count=2048 oflag=direct
>>> 2048+0 records in
>>> 2048+0 records out
>>> 2147483648 bytes (2.1 GB) copied, 25.3708 s, 84.6 MB/s
>>> 
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>            0.11    0.00   13.92    0.00    0.22   85.75
>>> 
>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>>> sdd               0.00 13986.08    0.00  263.20     0.00    55.76   433.87     0.43    1.63   1.07  28.27
>>> sdc             202.10 13741.55    6.52  256.57     0.81    54.77   432.65     0.50    1.88   1.25  32.78
>>> sde              47.96 11437.57    1.55  261.77     0.19    45.79   357.63     0.80    3.02   1.85  48.60
>>> sdf            2233.37 11756.13   71.93  191.38     8.99    46.80   433.90     1.49    5.66   3.27  86.15
>>> md2               0.00     0.00    0.00  731.93     0.00    91.49   256.00     0.00    0.00   0.00   0.00
>>> 
>>> Now this is pretty much exactly what I would expect the system to do: 
>>> ~96 MB/s buffered and ~85 MB/s direct.
>> 
>> I'm sorry to be such a PITA, but could you also try with 64? If we have to 
>> increase the maximum number of indirect descriptors, I would like to set it 
>> to the lowest value that provides good performance, to avoid using too much 
>> memory.
>> 
>>> So it turns out that xen_blkif_max_segments at 32 is a killer in the 
>>> DomU. Now it makes me wonder what we can do about this in kernels that 
>>> don't have your series of patches applied, and also about the 
>>> backend stuff in 3.8.x etc.
>> 
>> There isn't much we can do for kernels without indirect descriptors; 
>> there's no easy way to increase the number of segments in a request.
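
For background on why that is hard: in the stock protocol the segment array
is embedded in a fixed-size request that must fit in a shared-ring slot, so
raising the limit would need a coordinated change on both the frontend and
the backend. Roughly, simplified from xen/include/public/io/blkif.h:

#include <stdint.h>

typedef uint32_t grant_ref_t;
typedef uint64_t blkif_sector_t;
typedef uint16_t blkif_vdev_t;

/* The per-request limit is fixed at 11 segments, i.e. at most
 * 11 x 4 KiB = 44 KiB of data per ring request. */
#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11

struct blkif_request_segment {
    grant_ref_t gref;                  /* granted frame holding the data  */
    uint8_t     first_sect, last_sect; /* sector range within that frame  */
};

struct blkif_request {
    uint8_t        operation;          /* BLKIF_OP_READ, BLKIF_OP_WRITE.. */
    uint8_t        nr_segments;        /* at most 11                      */
    blkif_vdev_t   handle;             /* virtual block device            */
    uint64_t       id;                 /* echoed back in the response     */
    blkif_sector_t sector_number;      /* start sector on the vbd         */
    struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
};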
>> 
>> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

