
Re: [Xen-devel] [for-4.7] x86/emulate: synchronize LOCKed instruction emulation



>>> Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx> 05/05/16 11:24 AM >>>
>On 05/04/2016 04:42 PM, Jan Beulich wrote:
>>>>> On 04.05.16 at 13:32, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> But while implementing a stub that falls back to the actual LOCK CMPXCHG
>>> and replacing hvm_copy_to_guest_virt() with it would indeed be an
>>> improvement (with the added advantage of being able to treat
>>> non-emulated LOCK CMPXCHG cases), I don't understand how that would
>>> solve the read-modify-write atomicity problem.
>>>
>>> AFAICT, this would only solve the write problem. Assuming we have VCPU1
>>> and VCPU2 emulating a LOCKed instruction expecting rmw atomicity, the
>>> stub alone would not prevent this:
>>>
>>> VCPU1: read, modify
>>> VCPU2: read, modify, write
>>> VCPU1: write
>> 
>> I'm not sure I follow what you mean here: Does the above represent
>> what the guest does, or what the hypervisor does as steps to emulate
>> a _single_ guest instruction? In the former case, I don't see what
>> you're after. And in the latter case I don't understand why you think
>> using CMPXCHG instead of WRITE wouldn't help.
>
>Briefly, this is the scenario: assuming a guest with two VCPUs and an
>introspection application that has restricted access to a page, the
>guest runs two LOCK instructions that touch the page, causing a page
>fault for each instruction. This further translates to two EPT fault
>vm_events being placed in the ring buffer.
>
>By the time the introspection application polls the event channel, both
>VCPUs are paused, waiting for replies to the vm_events.
>
>If the monitoring application processes both events (puts both replies,
>with the emulate option on, in the ring buffer), then signals the event
>channel, it is possible that both VCPUs get woken up, ending up running
>x86_emulate() simultaneously.
>
>In this case, my understanding is that just using CMPXCHG will not work
>(although it is clearly superior to the current implementation), because
>the read part and the write part of x86_emulate() (when LOCKed
>instructions are involved) should be executed atomically, but writing
>the CMPXCHG stub would only make sure that two simultaneous writes won't
>occur.
>
>In other words, this would still be possible (atomicity would still not
>be guaranteed for LOCKed instructions):
>
>VCPU1: read
>VCPU2: read, write
>VCPU1: write
>
>when what we want for LOCKed instructions is:
>
>VCPU1: read, write
>VCPU2: read, write
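
The lost update Razvan describes can be replayed deterministically. Below is a minimal single-threaded C sketch, not Xen code, built on these assumptions: `guest_mem` is a hypothetical stand-in for the guest memory word, `emulated_read()` and `emulated_write()` are hypothetical helpers that model a plain read and a plain, non-atomic write-back, and each vCPU is emulating a LOCK INC.

```c
#include <stdint.h>

/* Hypothetical stand-in for the guest memory word a LOCK INC targets. */
static uint32_t guest_mem;

/* Hypothetical model of a plain emulated read and a plain (non-atomic) write-back. */
static uint32_t emulated_read(void) { return guest_mem; }
static void emulated_write(uint32_t v) { guest_mem = v; }

/*
 * Replay the interleaving from the thread:
 *   VCPU1: read
 *   VCPU2: read, write
 *   VCPU1: write
 * Each VCPU emulates one LOCK INC; returns the final memory value.
 */
static uint32_t replay_plain_rmw(uint32_t initial)
{
    guest_mem = initial;

    uint32_t v1 = emulated_read();   /* VCPU1: read */

    uint32_t v2 = emulated_read();   /* VCPU2: read */
    emulated_write(v2 + 1);          /* VCPU2: write */

    emulated_write(v1 + 1);          /* VCPU1: write, clobbering VCPU2's result */

    return guest_mem;
}
```

`replay_plain_rmw(0)` returns 1 even though two increments were emulated, because VCPU1's write-back overwrites the value VCPU2 stored.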

Okay, in short I take this to mean "single instruction" as the answer to my actual
question.

>Am I misunderstanding how x86_emulate() works?

No, but I suppose you're misunderstanding what I'm trying to suggest. What you
write above is not what will result when using CMPXCHG. Instead what we'll have
is

vCPU1: read
vCPU2: read
vCPU2: cmpxchg
vCPU1: cmpxchg

Note that the second cmpxchg will fail unless the first one wrote back an
unchanged value. Hence vCPU1 will be told to re-execute the instruction.
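
The CMPXCHG-based write-back can be sketched the same way. This is again an illustrative model under stated assumptions, not the x86_emulate() implementation: a C11 `atomic_compare_exchange_strong()` on the hypothetical `guest_mem` stands in for a LOCK CMPXCHG on guest memory, `emulated_cmpxchg()` and `replay_cmpxchg_rmw()` are hypothetical names, and the `while` loop stands in for vCPU1 re-executing the instruction after its cmpxchg fails.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest memory word targeted by LOCK INC. */
static _Atomic uint32_t guest_mem;

/*
 * Write-back step of an emulated LOCKed RMW: store new_val only if memory
 * still holds old_val (the value the emulated read saw). Returns false if
 * another vCPU changed it in between, so the caller must re-execute.
 */
static bool emulated_cmpxchg(uint32_t old_val, uint32_t new_val)
{
    uint32_t expected = old_val;
    return atomic_compare_exchange_strong(&guest_mem, &expected, new_val);
}

/*
 * Replay the interleaving from the thread with CMPXCHG write-back:
 *   vCPU1: read
 *   vCPU2: read
 *   vCPU2: cmpxchg   (succeeds)
 *   vCPU1: cmpxchg   (fails, so vCPU1 re-executes)
 * Returns the final memory value; *vcpu1_retries counts re-executions.
 */
static uint32_t replay_cmpxchg_rmw(uint32_t initial, unsigned *vcpu1_retries)
{
    atomic_store(&guest_mem, initial);
    *vcpu1_retries = 0;

    uint32_t v1 = atomic_load(&guest_mem);   /* vCPU1: read */
    uint32_t v2 = atomic_load(&guest_mem);   /* vCPU2: read */

    /* vCPU2: cmpxchg; succeeds because nothing intervened. */
    emulated_cmpxchg(v2, v2 + 1);

    /* vCPU1: cmpxchg; on failure, re-execute (re-read, re-modify, retry). */
    while (!emulated_cmpxchg(v1, v1 + 1)) {
        (*vcpu1_retries)++;
        v1 = atomic_load(&guest_mem);
    }

    return atomic_load(&guest_mem);
}
```

With `unsigned retries;`, the call `replay_cmpxchg_rmw(0, &retries)` ends with the value 2 and `retries == 1`: vCPU2's cmpxchg succeeds, vCPU1's fails because memory no longer holds the value it read, and the re-executed attempt succeeds. Had vCPU2 written back an unchanged value, vCPU1's cmpxchg would succeed on the first try, which is still consistent with running vCPU2's instruction before vCPU1's.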

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
