[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers



On 16/08/17 17:47, Andrew Cooper wrote:
> On 16/08/17 16:23, Jan Beulich wrote:
>>>>> On 16.08.17 at 13:22, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> x86's current implementation of wmb() is a compiler barrier.  As a result, 
>>> the
>>> only change in this patch is to remove an mfence instruction from
>>> cpuidle_disable_deep_cstate().
>>>
>>> None of these barriers serve any purpose.  Most aren't aren't synchronising
>>> with any remote cpus, where as the mcetelem barriers are redundant with
>>> spin_unlock(), which already has full read/write barrier semantics.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> For the relevant parts
>> Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
>> For the parts the ack doesn't extend to, however:
>>
>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>>>       * will make sure no inconsistent mapping being translated into
>>>       * shadow page table. */
>>>      version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
>>> -    rmb();
>>>      walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
>> Isn't this supposed to make sure version is being read first? I.e.
>> doesn't this at least need to be barrier()?
> atomic_read() is not free to be reordered by the compiler.  It is an asm
> volatile with a volatile memory reference.
>
>>> index a459e99..d5b6049 100644
>>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>>> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu 
>>> *iommu, u32 entry[])
>>>              return;
>>>          }
>>>          udelay(1);
>>> -        rmb();
>>>          code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>>>                                        IOMMU_EVENT_CODE_SHIFT);
>>>      }
>>> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 
>>> entry[])
>>>              return;
>>>          }
>>>          udelay(1);
>>> -        rmb();
>>>          code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>>>                                        IOMMU_PPR_LOG_CODE_SHIFT);
>>>      }
>> With these fully removed, what keeps the compiler from moving
>> the entry[1] reads out of the loop? Implementation details of
>> udelay() don't count...
> It is a write to the control variable which is derived from a non-local
> non-constant object.  It can't be hoisted at all.
>
> Consider this simplified version:
>
> while ( count == 0 )
>     count = entry[1];
>
> If entry were const, the compiler would be free to expect that the value
> doesn't change on repeated reads, but that is not the case here.

(And continuing my run of luck today), it turns out that GCC does
compile my example here to an infinite loop.

ffff82d08026025f:       84 c0                   test   %al,%al
ffff82d080260261:       75 0a                   jne    ffff82d08026026d 
<parse_ppr_log_entry+0x29>
ffff82d080260263:       8b 46 04                mov    0x4(%rsi),%eax
ffff82d080260266:       c1 e8 1c                shr    $0x1c,%eax
ffff82d080260269:       84 c0                   test   %al,%al
ffff82d08026026b:       74 fc                   je     ffff82d080260269 
<parse_ppr_log_entry+0x25>


I will move this to being a barrer() with a hoisting comment (to avoid
it looking like an SMP issue), and I'm going to have to re-evaluate how
sane I think the C standard to be.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.