
Re: [Xen-devel] xc_hvm_inject_trap() races



>>> On 07.11.16 at 12:49, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> On 11/07/2016 11:57 AM, Jan Beulich wrote:
>>>>> On 07.11.16 at 10:37, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> The problem there is that vmx_idtv_reinject() is a corner case: it
>>> writes VM_ENTRY_INTR_INFO directly, and this can happen _before_ the
>>> guest exits with, let's say, an EPT vm_event.
>>>
>>> In that case, the guest has exited, there's already an interrupt
>>> pending, and while the VCPU is paused waiting for the vm_event reply we
>>> request an injection that effectively obliterates the pending interrupt.
>>>
>>> Going the singlestep way is satisfactory for us for most cases, but it
>>> still leaves the described corner case. The only fix proposal we could
>>> think of is to, instead of vmx_idtv_reinject() doing the work, simply
>>> set some flags, and have a later function do the actual work, in
>>> vmx_intr_assist() style.
>>>
>>> I hope I've been able to make this clearer (and I haven't misunderstood
>>> something in the process :) ).
>> 
>> You did, thanks. Yet I stand by my earlier position: it's the
>> non-architectural (injected) event which should get deferred, not the
>> architectural ones (either the variant you describe above, or any
>> which hypervisor processing found necessary to deliver). The
>> single-step case can be viewed as an exception here, albeit ideally
>> it wouldn't need to be special-cased either.
> 
> But the problem is that if we defer, say a #PF, by the time there's a VM
> entry with no pending interrupt we may get the wrong context (wrong
> address space, guest mode, etc.).
> 
> The simple solution would be to save the context (for a #PF this would
> mean saving CR3 as well), and only inject at the first opportunity
> where the context matches.
> 
> However, this gets ugly quickly: for one, the right context for a #PF
> may occur on a different VCPU, so right off the bat we can't hold this
> information per-VCPU anymore (it would need to be per-domain).
> 
> Second, it is conceivable that there will be several injection requests
> not yet delivered, and in that case we need to save the context for all
> of them (so an array or some sort of container of contexts becomes
> necessary).
> 
> And third, there's no guarantee that the guest will ever reach the
> proper context for injecting any of the deferred interrupts for the
> duration of its life, which brings us back to the problem we were
> trying to solve: we still can't guarantee trap delivery, but now in a
> much more complicated manner. :)
> 
> We haven't even discussed what to do if a new request comes when the
> context buffer is full (we'd need to lose an undelivered trap), or that
> different trap types may require different contexts and handling logic.

All agreed, yet all of these are issues for you to solve in order to be
able to inject a non-architectural fault halfway sanely. (And btw, CR3
as context may not suffice - you may also need e.g. the altp2m
index in use.)
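
To illustrate the bookkeeping this implies, each not-yet-delivered
injection would need a per-domain record roughly like the one below
(struct and field names are made up for illustration; guest_cr[] and
vcpu_altp2m() are the existing accessors in the tree under discussion):

struct pending_injection {
    uint8_t  vector;       /* e.g. TRAP_page_fault */
    uint8_t  type;         /* hardware exception, software interrupt, ... */
    uint32_t error_code;
    uint64_t cr2;          /* faulting address, for a #PF */
    uint64_t cr3;          /* address space the request was made against */
    uint16_t altp2m_idx;   /* altp2m view active when the request was made */
    bool     valid;
};

/* On VM entry, an injection may only be committed if the context matches. */
static bool context_matches(const struct vcpu *v,
                            const struct pending_injection *pi)
{
    return pi->valid &&
           v->arch.hvm_vcpu.guest_cr[3] == pi->cr3 &&
           vcpu_altp2m(v).p2midx == pi->altp2m_idx;
}

A per-domain array of such records would have to be scanned on every
entry of every vCPU, with no guarantee that a matching context ever
shows up - which is exactly the set of problems listed above.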

> At this point it looks like the best solution is to simply discard the
> non-architectural event if there's a pending architectural one, and to
> add a new vm_event, as suggested by Tamas, that notifies the application
> whether a trap injection has succeeded or failed, and lets it do the
> best it can with that information.

Well, if the two of you think this is something which fits into the VM
event model, then I guess that's an option. I, for one, am not
convinced: it simply seems too special-purpose. If this were a more
generic event ("interruption delivered", perhaps with a type or
vector qualifier) that can be subscribed to, and the app subscribed
only for such transient periods of time, this would look more
reasonable to me.
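
Purely as illustration, such a generic event might amount to no more
than a new reason code plus a small x86 payload in the public vm_event
interface (names and values here are made up, not part of the existing
ABI):

/* Hypothetical addition to xen/include/public/vm_event.h */
#define VM_EVENT_REASON_INTERRUPT      12 /* value illustrative only */

struct vm_event_interrupt_x86 {
    uint32_t vector;      /* vector actually delivered */
    uint32_t type;        /* hw exception / external interrupt / sw int */
    uint32_t error_code;  /* valid for exceptions that push one */
    uint32_t _pad;
    uint64_t cr2;         /* for a #PF, the faulting linear address */
};

An introspection application would then subscribe to this only around an
injection attempt, compare the delivered vector and type against what it
passed to xc_hvm_inject_trap(), and retry or give up accordingly.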

Jan



 

