[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] rwlock: allow recursive read locking when already locked in write mode



On 21.02.2020 16:13, Jürgen Groß wrote:
> On 21.02.20 15:51, Julien Grall wrote:
>>
>>
>> On 21/02/2020 14:35, Jürgen Groß wrote:
>>> On 21.02.20 15:32, Julien Grall wrote:
>>>>
>>>>
>>>> On 21/02/2020 14:16, Jürgen Groß wrote:
>>>>> On 21.02.20 15:11, Julien Grall wrote:
>>>>>> Hi Juergen,
>>>>>>
>>>>>> On 21/02/2020 14:06, Jürgen Groß wrote:
>>>>>>> On 21.02.20 14:49, Julien Grall wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21/02/2020 13:46, Jürgen Groß wrote:
>>>>>>>>> On 21.02.20 14:36, Jan Beulich wrote:
>>>>>>>>>> On 21.02.2020 10:10, Roger Pau Monné wrote:
>>>>>>>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote:
>>>>>>>>>>>>> Allow a CPU already holding the lock in write mode to also 
>>>>>>>>>>>>> lock it in
>>>>>>>>>>>>> read mode. There's no harm in allowing read locking a rwlock 
>>>>>>>>>>>>> that's
>>>>>>>>>>>>> already owned by the caller (ie: CPU) in write mode. 
>>>>>>>>>>>>> Allowing such
>>>>>>>>>>>>> accesses is required at least for the CPU maps use-case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to do this reserve 14bits of the lock, this allows 
>>>>>>>>>>>>> to support
>>>>>>>>>>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: 
>>>>>>>>>>>>> one to
>>>>>>>>>>>>> signal there are pending writers waiting on the lock and the 
>>>>>>>>>>>>> other to
>>>>>>>>>>>>> signal the lock is owned in write mode. Note the write 
>>>>>>>>>>>>> related data
>>>>>>>>>>>>> is using 16bits, this is done in order to be able to clear 
>>>>>>>>>>>>> it (and
>>>>>>>>>>>>> thus release the lock) using a 16bit atomic write.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This reduces the maximum number of concurrent readers from 
>>>>>>>>>>>>> 16777216 to
>>>>>>>>>>>>> 65536, I think this should still be enough, or else the lock 
>>>>>>>>>>>>> field
>>>>>>>>>>>>> can be expanded from 32 to 64bits if all architectures 
>>>>>>>>>>>>> support atomic
>>>>>>>>>>>>> operations on 64bit integers.
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit 
>>>>>>>>>>>> integers.
>>>>>>>>>>>>
>>>>>>>>>>>>>    static inline void _write_unlock(rwlock_t *lock)
>>>>>>>>>>>>>    {
>>>>>>>>>>>>> -    /*
>>>>>>>>>>>>> -     * If the writer field is atomic, it can be cleared 
>>>>>>>>>>>>> directly.
>>>>>>>>>>>>> -     * Otherwise, an atomic subtraction will be used to 
>>>>>>>>>>>>> clear it.
>>>>>>>>>>>>> -     */
>>>>>>>>>>>>> -    atomic_sub(_QW_LOCKED, &lock->cnts);
>>>>>>>>>>>>> +    /* Since the writer field is atomic, it can be cleared 
>>>>>>>>>>>>> directly. */
>>>>>>>>>>>>> +    ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
>>>>>>>>>>>>> +    BUILD_BUG_ON(_QR_SHIFT != 16);
>>>>>>>>>>>>> +    write_atomic((uint16_t *)&lock->cnts, 0);
>>>>>>>>>>>>
>>>>>>>>>>>> I think this is an abuse to cast an atomic_t() directly into 
>>>>>>>>>>>> a uint16_t. You
>>>>>>>>>>>> would at least want to use &lock->cnts.counter here.
>>>>>>>>>>>
>>>>>>>>>>> Sure, I was wondering about this myself.
>>>>>>>>>>>
>>>>>>>>>>> Will wait for more comments, not sure whether this can be 
>>>>>>>>>>> fixed upon
>>>>>>>>>>> commit if there are no other issues.
>>>>>>>>>>
>>>>>>>>>> It's more than just adding another field specifier here. A cast 
>>>>>>>>>> like
>>>>>>>>>> this one is endianness-unsafe, and hence a trap waiting for a big
>>>>>>>>>> endian port attempt to fall into. At the very least this should 
>>>>>>>>>> cause
>>>>>>>>>> a build failure on big endian systems, even better would be if 
>>>>>>>>>> it was
>>>>>>>>>> endianness-safe.
>>>>>>>>>
>>>>>>>>> Wouldn't a union be the better choice?
>>>>>>>>
>>>>>>>> You would not be able to use atomic_t in that case as you can't 
>>>>>>>> assume the layout of the structure.
>>>>>>>
>>>>>>> union rwlockword {
>>>>>>>      atomic_t cnts;
>>>>>>>      uint32_t val;
>>>>>>>      struct {
>>>>>>>          uint16_t write;
>>>>>>>          uint16_t readers;
>>>>>>>      };
>>>>>>> };
>>>>>>>
>>>>>>> static inline const uint32_t _qr_bias(
>>>>>>>      const union rwlockword {
>>>>>>>          .write = 0,
>>>>>>>          .readers = 1
>>>>>>>      } x;
>>>>>>>
>>>>>>>      return x.val;
>>>>>>> }
>>>>>>>
>>>>>>> ...
>>>>>>>      cnts = atomic_add_return(_qr_bias(), &lock->cnts);
>>>>>>> ...
>>>>>>>
>>>>>>> I guess this should do the trick, no?
>>>>>>
>>>>>> You are assuming the atomic_t layout which I would rather no want 
>>>>>> to happen.
>>>>>
>>>>> That already happened. rwlock.h already assumes atomic_t to be 32 bits
>>>>> wide. I agree it would be better to have an atomic32_t type for this
>>>>> use case.
>>>>
>>>> I don't think you understood my point here. An atomic32_t will not be 
>>>> better as you still assume the layout of the structure. I.e the 
>>>> structure has only one field.
>>>
>>> I don't understand your reasoning here, sorry.
>>>
>>> Are you saying that a structure of two uint16_t isn't known to be 32
>>> bits long?
>>
>> You are assuming that atomic_t will always be:
>>
>> struct
>> {
>>    int counter;
>> }
>>
>> What if we decide to turn into
>>
>> struct
>> {
>>    bool a;
>>    int counter;
>> }
> 
> As said before: then queue_write_lock_slowpath() is already broken.

Why? The atomic_*() used would still only act on the counter field
(for their actual operation, i.e. what matters to callers; the
other field(s) could be statistics or whatever).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.