
Re: [PATCH v4 for-4.14] x86/monitor: revert default behavior when monitoring register write events



On Mon, Jun 8, 2020 at 3:16 PM Razvan Cojocaru
<rcojocaru@xxxxxxxxxxxxxxx> wrote:
>
> On 6/8/20 11:44 PM, Tamas K Lengyel wrote:
> > On Mon, Jun 8, 2020 at 2:14 PM Razvan Cojocaru
> > <rcojocaru@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 6/8/20 10:54 PM, Tamas K Lengyel wrote:
> >>> On Mon, Jun 8, 2020 at 12:58 PM Razvan Cojocaru
> >>>> And last but not least, the proper sequence is: 1. unsubscribe from
> >>>> register write events, 2. process all events "still in the chamber"
> >>>> (keep checking the ring buffer for a while), 3. detach from the guest
> >>>> (disable the vm_event subsystem). Not ideal perhaps (in that it's not
> >>>> guaranteed that a VCPU won't resume after a longer period than our
> >>>> timeout), but if the sequence is followed there should be no guest hangs
> >>>> or crashes (at least none that we or our clients have observed so far).
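A minimal sketch of that teardown sequence, assuming a libxc consumer in
the style of tools/tests/xen-access.c (the unsubscribe/drain helpers and
GRACE_PASSES below are illustrative placeholders, not actual libxc API;
only xc_monitor_disable() is a real call):

    /* 1. Unsubscribe from register write events; illustrative wrapper
     *    around the relevant xc_monitor_*() disable calls. */
    unsubscribe_register_writes(xch, domain_id);

    /* 2. Keep checking the ring for a while, replying to any events
     *    "still in the chamber" so their vCPUs can be unpaused. */
    for ( int pass = 0; pass < GRACE_PASSES; pass++ )
    {
        drain_and_ack_events(&vm_event); /* illustrative helper */
        usleep(100000); /* no guarantee every vCPU reschedules in time */
    }

    /* 3. Only now tear down the vm_event subsystem. */
    xc_monitor_disable(xch, domain_id);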
> >>>
> >>> Incorrect. That's not enough. You also have to wait for all the vCPUs
> >>> to get scheduled before disabling vm_event, otherwise the emulation
> >>> is skipped entirely. Please read the patch message. If the user
> >>> decides to disable vm_event after getting a CR3 event delivered, the
> >>> CR3 never gets updated, resulting in the guest crashing in
> >>> unpredictable ways. The same happens with all the other registers.
> >>
> >> I did read the patch message. As I've previously stated ("it's not
> >> guaranteed that a VCPU won't resume after a longer period than our
> >> timeout"), it's not ideal, and I've made no claim that the problem does
> >> not exist or that it shouldn't be fixed - but really if you've got a
> >> VCPU that will wait more than a couple of seconds to get scheduled then
> >> you've got a bigger problem with the VM.
> >
> > Sorry, I missed the part where you say you knew about this issue. We
> > didn't, and it was absolutely not documented anywhere and certainly
> > not mentioned in the original patch that added the feature. It
> > literally took us years to figure out what the hell was going on.
> > While, as you say, it can be a couple of seconds before it's safe to
> > disable, it can also be a lot longer depending on the system state,
> > how many VMs are running, and how many vCPUs are active on the VM.
> > There are absolutely necessary use-cases where you want to keep the
> > VM paused after a CR3 event is received and exit vm_event afterwards.
> > This bug totally blocks those use-cases. Not to mention that it's a
> > total mess having an interface where the user has to guess when it's
> > safe to do something. If this had been pointed out when the patch was
> > submitted, I would have objected to it being merged.
>
> No, we didn't know about the issue. It's apparently not my most eloquent
> evening. I was merely saying that I did understand what the issue was
> from your description, and offering an explanation of why we've never
> seen this in QA or production. Of course the issue would at the very
> least have been mentioned (and ideally not have existed to begin with)
> had it been known.
>
> We do take several passes through the ring buffer (and as a side-effect
> reduce the probability of a race occurring to almost zero), but we do
> that to make sure we've cleaned up all EPT vm_events; the fact that it
> has helped with _this_ issue is a coincidence.
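For reference, a single drain pass in a xen-access.c-style consumer looks
roughly like this (get_request()/put_response() follow the standard ring
macro pattern from that example; fill_default_response() is an
illustrative helper, not actual libxc API):

    /* Consume every pending request and post a response so the
     * paused vCPU can be resumed. */
    while ( RING_HAS_UNCONSUMED_REQUESTS(&vm_event.back_ring) )
    {
        vm_event_request_t req;
        vm_event_response_t rsp;

        get_request(&vm_event, &req);      /* copy req, advance req_cons */
        fill_default_response(&req, &rsp); /* illustrative: echo vcpu_id,
                                            * version, reason, PAUSED flag */
        put_response(&vm_event, &rsp);     /* copy rsp, advance rsp_prod */
    }
    xenevtchn_notify(xce, port);           /* kick Xen to process them */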
>
> IIRC Windows, at least, will be upset if a VCPU is stuck for too long.
>
> As for how the vm_event system behaves:
>
> 1. A security application that is unable to _prevent_ things being done
> to a system is not doing a very good job, since in that case you can
> only collect stats and not veto anything. I would argue that the default
> for such a monitoring application should be the current one (all events
> should be pre-action).

Not all security applications require this, though. Malware analysis
where stealth is required would absolutely not want this side-effect to
be visible to the guest, where malware could use it to determine that
it's being monitored. So I don't buy into this argument.

>
> 2. This is further supported by the fact that that's exactly how the EPT
> vm_events work: you get a "I want to write to this page" event _before_
> the write occurs. If register writes behave differently, you have _two_
> different models: one where you get an event post-action, and one where
> you get one pre-action.

Whether you get an event before or after the effects of the event have
been applied to the system state shouldn't matter, as long as you can
revert that action. I wouldn't care either way which one is the default.
But having a default that breaks other use-cases is unacceptable.
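For what it's worth, reverting is already expressible in the interface:
a consumer that wants pre-action semantics can veto a CR/MSR write by
setting VM_EVENT_FLAG_DENY in its response. A sketch, again assuming the
xen-access.c-style ring helpers (write_is_unwanted() is an illustrative
policy hook):

    vm_event_response_t rsp = {
        .version = VM_EVENT_INTERFACE_VERSION,
        .vcpu_id = req.vcpu_id,
        .reason  = req.reason,
        /* preserve the paused flag so Xen knows to unpause the vCPU */
        .flags   = req.flags & VM_EVENT_FLAG_VCPU_PAUSED,
    };

    if ( req.reason == VM_EVENT_REASON_WRITE_CTRLREG &&
         write_is_unwanted(&req) )       /* illustrative policy hook */
        rsp.flags |= VM_EVENT_FLAG_DENY; /* veto the register write */

    put_response(&vm_event, &rsp);
    xenevtchn_notify(xce, port);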

Tamas
