
Re: [Xen-devel] [PATCH 2/2] grant_table: convert grant table rwlock to percpu rwlock



>>> On 17.11.15 at 18:53, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 17/11/15 17:39, Jan Beulich wrote:
>>>>> On 17.11.15 at 18:30, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 17/11/15 17:04, Jan Beulich wrote:
>>>>>>> On 03.11.15 at 18:58, <malcolm.crossley@xxxxxxxxxx> wrote:
>>>>> --- a/xen/common/grant_table.c
>>>>> +++ b/xen/common/grant_table.c
>>>>> @@ -178,6 +178,10 @@ struct active_grant_entry {
>>>>>  #define _active_entry(t, e) \
>>>>>      ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
>>>>>  
>>>>> +bool_t grant_rwlock_barrier;
>>>>> +
>>>>> +DEFINE_PER_CPU(rwlock_t *, grant_rwlock);
>>>> Shouldn't these be per grant table? And wouldn't doing so eliminate
>>>> the main limitation of the per-CPU rwlocks?
>>> The grant rwlock is per grant table.
>> That's understood, but I don't see why the above items aren't, too.
> 
> Ah - because there is never any circumstance where two grant tables are
> locked on the same pcpu.

double_gt_lock()? Those are write locks, and if I understand correctly
the limitation is just on read locks, but anyway.

> Nor is there any such need.

I'm sorry, but no. If the lock flavor is to be tailored to gnttab use,
then it shouldn't be added outside of grant_table.c.
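For readers following the thread: the fast path under discussion can be
sketched roughly as below. This is an illustrative, simplified model, not
the actual patch; all names other than grant_rwlock_barrier and
grant_rwlock are hypothetical, and the real code uses Xen's rwlock_t and
proper memory barriers. It shows why a single global per-CPU slot means a
pcpu can fast-path at most one such lock at a time, which is the
limitation being debated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Stand-in for Xen's rwlock_t; only the reader count matters here. */
typedef struct { int readers; } rwlock_t;

/* The two global items from the patch, modelled as plain variables. */
static bool grant_rwlock_barrier;                /* a writer is pending */
static rwlock_t *per_cpu_grant_rwlock[NR_CPUS];  /* one slot per pcpu   */

static void read_lock_slow(rwlock_t *l) { l->readers++; }
static void read_unlock_slow(rwlock_t *l) { l->readers--; }

/*
 * Fast path: publish which lock this CPU is reading via a purely local
 * (per-CPU) store, then re-check the barrier.  Only if a writer is
 * pending do we fall back to the conventional read lock - i.e. to the
 * cmpxchg on the shared, contended cache line.
 */
static void percpu_read_lock(unsigned int cpu, rwlock_t *l)
{
    per_cpu_grant_rwlock[cpu] = l;
    /* smp_mb() here in real code */
    if ( grant_rwlock_barrier )
    {
        per_cpu_grant_rwlock[cpu] = NULL;
        read_lock_slow(l);
    }
}

static void percpu_read_unlock(unsigned int cpu, rwlock_t *l)
{
    if ( per_cpu_grant_rwlock[cpu] == l )
        per_cpu_grant_rwlock[cpu] = NULL;   /* fast path: local store */
    else
        read_unlock_slow(l);                /* slow path was taken    */
}
```

Because the per-CPU slot holds a single pointer, taking the fast path on
a second grant table from the same pcpu would overwrite the record of the
first, which is why the globals-versus-per-instance question matters.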

>>> The entire point of this series is to reduce the cmpxchg storm which
>>> happens when many pcpus attempt to grab the same domain's grant read lock.
>>>
>>> As identified in the commit message, reducing the cmpxchg pressure on
>>> the cache coherency fabric increases intra-VM network throughput from
>>> 10Gbps to 50Gbps when running iperf between two 16-vcpu guests.
>>>
>>> Or in other words, 80% of cpu time is wasted waiting on an atomic
>>> read/modify/write operation against a remote hot cache line.
>> All of this is pretty nice, but again unrelated to the question I
>> raised.
>>
>> The whole interface would likely become quite a bit easier to use
>> if there was a percpu_rwlock_t comprising all three elements (the
>> per-CPU item obviously would need to become a per-CPU pointer,
>> with allocation of per-CPU data needing introduction).
> 
> Runtime per-CPU data allocation is incompatible with our current scheme
> (which relies on the linker to do some of the heavy lifting).

Well, there are ways to deal with that. One would be for components
to declare how much they might need to use in the worst case (along
the lines of x86 Linux'es .brk mechanism). Another (a variant thereof,
much easier to implement right away) would be for grant_table.c to
simply create a per-CPU array with DOMID_FIRST_RESERVED entries,
using the one corresponding to the domain in question. And of course
a scheme less advanced than the current Linux one might do too - after
all, Linux used such a scheme for many years (before the current one
got invented).
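One possible shape for the combined type suggested above, sketched under
the same simplified model as before (hypothetical names, stand-in lock
type, barriers elided): all three elements become per-lock-instance
state, so the limitation of one fast-path lock per pcpu disappears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

typedef struct { int readers; } rwlock_t;   /* stand-in for Xen's rwlock_t */

typedef struct percpu_rwlock {
    bool writer_activating;        /* was the global grant_rwlock_barrier */
    rwlock_t *owner[NR_CPUS];      /* was the global DEFINE_PER_CPU slot  */
    rwlock_t rwlock;               /* conventional fallback lock          */
} percpu_rwlock_t;

/*
 * Same fast path as before, but all state hangs off the lock instance,
 * so two different percpu rwlocks can be read-locked on one pcpu at
 * the same time.
 */
static void percpu_read_lock(percpu_rwlock_t *l, unsigned int cpu)
{
    l->owner[cpu] = &l->rwlock;
    /* smp_mb() here in real code */
    if ( l->writer_activating )
    {
        l->owner[cpu] = NULL;
        l->rwlock.readers++;       /* slow path */
    }
}

static void percpu_read_unlock(percpu_rwlock_t *l, unsigned int cpu)
{
    if ( l->owner[cpu] == &l->rwlock )
        l->owner[cpu] = NULL;
    else
        l->rwlock.readers--;
}
```

The cost Jan notes is the per-CPU data allocation: either a worst-case
declaration scheme (akin to x86 Linux's .brk), or, as a stopgap, a
per-CPU array in grant_table.c indexed by domain.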

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
