
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults



On Thu, Nov 22, 2018 at 08:24:52PM +0200, Razvan Cojocaru wrote:
> On 11/22/18 7:08 PM, Roger Pau Monné wrote:
> > On Thu, Nov 22, 2018 at 06:52:07PM +0200, Razvan Cojocaru wrote:
> >> On 11/22/18 5:37 PM, Roger Pau Monné wrote:
> >>> I don't think you are supposed to try to pause other vcpus while
> >>> holding a lock, as you can see it's quite likely that you will end up
> >>> deadlocking because the vCPU you are trying to pause is stuck waiting
> >>> on the lock that you are holding.
> >>>
> >>> You should figure out whether you can get into vmx_start_reexecute
> >>> without holding any locks, or alternatively drop the lock, pause the
> >>> vCPUs and pick the lock again.
> >>>
> >>> See for example how hap_track_dirty_vram releases the lock before
> >>> attempting to pause the domain for this same reason.
> >>
> >> Right, this will take more thinking.
> >>
> >> I've unlocked the p2m for testing and the initial hang is gone, however
> >> the same problem now applies to rexec_lock: nothing prevents two or more
> >> VCPUs from arriving in vmx_start_reexecute_instruction() simultaneously,
> >> at which point one of them might take the lock and try to pause the
> >> other, while the other is waiting to take the lock, with predictable
> >> results.
> >>
> >> On the other hand, releasing rexec_lock as well will allow two VCPUs to
> >> end up trying to pause each other (especially unpleasant in a 2 VCPU
> >> guest). At any given moment, there should be only one VCPU alive and
> >> trying to reexecute an instruction - and at least one VCPU alive on the
> >> guest.
> >>
> >> We'll get more coffee, and of course suggestions are appreciated (as has
> >> been all your help).
> > 
> > Hm, I don't think it's generally safe to try to pause domain vCPUs
> > from the same domain context, as you say it's likely to deadlock since
> > two vCPUs from the same domain might try to pause one another.
> > 
> > My knowledge of all this introspection logic is very vague, do you
> > really need to stop the other vCPUs while performing this reexecution?
> > 
> > What are you trying to prevent by pausing other vCPUs?
> 
> Yes, that's unfortunately very necessary.
> 
> The scenario is this: for introspection purposes, a bunch of pages are
> marked read-only in the EPT (or no-execute, but for the purposes of this
> example let's stick to read-only).
> 
> Now, we'll get vm_events whenever an instruction tries to write into
> one of those. Vm_events are expensive, so we _really_ want to get as few
> of those as possible while still keeping the guest protected. So we want
> to filter out irrelevant ones.
> 
> The main category of irrelevant ones are faults caused by walking the
> guest's page table. We only want events caused by an actual write into a
> protected page by an actual instruction running at RIP in the guest.
> 
> So, we don't want to get those vm_events where npfec.kind !=
> npfec_kind_with_gla in p2m_mem_access_check(), hence this patch:
> 
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=c5387c4d75602dbb2f0d3d961a5c4b8faf3873db
> 
> _However_, please picture an instruction that both writes into a page P1
> we're interested in, _and_ causes a write into a read-only page-walk
> related page P2. Emulating the current instruction, as the upstream
> patch does, does eliminate the vm_event caused by writing into P2, but
> with the unfortunate side-effect of losing a potentially critical event
> for the write into P1.

How could the event for P1 be lost? If the instruction writes to both
P1 and P2, you already got some kind of event since writing to P1
would trigger a fault. Then you can just discard the P2 part, forward
the P1 access and just emulate the instruction?

(I guess I'm missing something on the above)
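
A minimal sketch of the filtering described above, as a toy model rather than Xen code. All names (toy_fault, toy_filter_faults, the TOY_FAULT_* kinds) are hypothetical, and the sketch assumes that every access made by the instruction can be seen and classified individually, which may not hold for real EPT faults, since they are delivered one at a time:

```c
#include <stddef.h>

/* Toy model only; these types are illustrative and are not Xen's. */
enum toy_fault_kind {
    TOY_FAULT_IN_PAGE_WALK, /* write made while walking the page tables (P2) */
    TOY_FAULT_WITH_GLA      /* write made by the instruction itself (P1) */
};

struct toy_fault {
    unsigned long gfn;
    enum toy_fault_kind kind;
};

/*
 * Copy into 'out' only the faults that would deserve a vm_event and
 * return how many were kept. 'out' must have room for 'n' entries.
 */
size_t toy_filter_faults(const struct toy_fault *in, size_t n,
                         struct toy_fault *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (in[i].kind == TOY_FAULT_WITH_GLA)
            out[kept++] = in[i];

    return kept;
}
```

Under that assumption, an instruction touching both P2 (page walk) and P1 (real write) yields exactly one forwarded event, the one for P1.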

> What this patch attempts to do is to mark P1 rwx (so allow the write),
> then put the faulting VCPU into singlestep mode, then restore the
> restrictions after it has finished single stepping. By now it's obvious
> why all the other VCPUs need to be paused: one of them might do a
> malicious write into P1 that silently succeeds (since the EPT is shared
> among all VCPUs - putting altp2m aside for a moment). We don't want that.

Can't you just change the p2m of a single vCPU? Either using altp2m or
some other mechanism.
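
A toy model may help make the per-vCPU idea concrete. The sketch below is not Xen code: every name in it (toy_domain, toy_begin_step, and so on) is hypothetical, and it only models permissions as a per-view table, loosely in the spirit of altp2m views. It illustrates that relaxing P1 in a view private to the faulting vCPU leaves P1 write-protected for every other vCPU:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in Xen. */
#define TOY_NR_VCPUS 2
#define TOY_NR_PAGES 4
/* View 0 is shared by all vCPUs; view 1 + v is private to vCPU v. */
#define TOY_NR_VIEWS (1 + TOY_NR_VCPUS)

enum { TOY_R = 1, TOY_W = 2, TOY_X = 4 };

struct toy_domain {
    unsigned int perms[TOY_NR_VIEWS][TOY_NR_PAGES];
    unsigned int active_view[TOY_NR_VCPUS];
};

/* Start with every page rwx in every view and all vCPUs on view 0. */
void toy_init(struct toy_domain *d)
{
    for (unsigned int v = 0; v < TOY_NR_VIEWS; v++)
        for (unsigned int p = 0; p < TOY_NR_PAGES; p++)
            d->perms[v][p] = TOY_R | TOY_W | TOY_X;
    for (unsigned int c = 0; c < TOY_NR_VCPUS; c++)
        d->active_view[c] = 0;
}

/* Mark a page read-only in every view. */
void toy_protect(struct toy_domain *d, unsigned int gfn)
{
    assert(gfn < TOY_NR_PAGES);
    for (unsigned int v = 0; v < TOY_NR_VIEWS; v++)
        d->perms[v][gfn] &= ~(unsigned int)TOY_W;
}

/* Relax one page in the vCPU's private view and switch the vCPU to it. */
void toy_begin_step(struct toy_domain *d, unsigned int vcpu, unsigned int gfn)
{
    assert(vcpu < TOY_NR_VCPUS && gfn < TOY_NR_PAGES);
    d->perms[1 + vcpu][gfn] = TOY_R | TOY_W | TOY_X;
    d->active_view[vcpu] = 1 + vcpu;
}

/* Restore the restriction and move the vCPU back to the shared view. */
void toy_end_step(struct toy_domain *d, unsigned int vcpu, unsigned int gfn)
{
    assert(vcpu < TOY_NR_VCPUS && gfn < TOY_NR_PAGES);
    d->perms[1 + vcpu][gfn] = d->perms[0][gfn];
    d->active_view[vcpu] = 0;
}

/* Would a write by this vCPU to this page succeed without a fault? */
bool toy_write_allowed(const struct toy_domain *d, unsigned int vcpu,
                       unsigned int gfn)
{
    assert(vcpu < TOY_NR_VCPUS && gfn < TOY_NR_PAGES);
    return (d->perms[d->active_view[vcpu]][gfn] & TOY_W) != 0;
}
```

In this model, after toy_protect() on P1 and toy_begin_step() for vCPU 0, toy_write_allowed() is true for vCPU 0 and still false for vCPU 1, so no other vCPU has to be paused. Whether switching views per trapped instruction is cheap enough in practice is a separate question the model does not address.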

Also keep in mind that this pause approach might work for guests with
a relatively small number of vCPUs, but I'm unsure it is going to
work for guests with a high number of vCPUs: pausing all vCPUs for each
trapped instruction is likely going to stall the guest.

Thanks, Roger.
