Xen project Mailing List

Re: [Xen-devel] xc_hvm_inject_trap() races

To: Jan Beulich <JBeulich@xxxxxxxx>, "rcojocaru@xxxxxxxxxxxxxxx" <rcojocaru@xxxxxxxxxxxxxxx>

From: Andrei Vlad LUTAS <vlutas@xxxxxxxxxxxxxxx>

Date: Wed, 2 Nov 2016 09:13:01 +0000

Cc: "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "tamas@xxxxxxxxxxxxx" <tamas@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 02 Nov 2016 09:13:06 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 2 November, 2016 10:50
> To: rcojocaru@xxxxxxxxxxxxxxx; Andrei Vlad LUTAS <vlutas@xxxxxxxxxxxxxxx>
> Cc: andrew.cooper3@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> tamas@xxxxxxxxxxxxx
> Subject: RE: RE: [Xen-devel] xc_hvm_inject_trap() races
>
> >>> On 01.11.16 at 23:17, <vlutas@xxxxxxxxxxxxxxx> wrote:
> > From: Jan Beulich [mailto:jbeulich@xxxxxxxx]
> > Sent: 1 November, 2016 18:40
> >>>> Andrei Vlad LUTAS <vlutas@xxxxxxxxxxxxxxx> 11/01/16 5:13 PM >>>
> >>>First of all, to answer your original question: the injection
> >>>decision is made when the introspection logic needs to inspect a page
> >>>that is not present in the physical memory. We don't really care if
> >>>the current instruction triggers multiple faults or not (and here I'm
> >>>not sure what you mean by that - multiple exceptions, or multiple EPT
> >>>violations - but the answer is still the same), and removing the page
> >>>restrictions after the #PF injection is introspection specific logic
> >>>- the address for which we inject the #PF doesn't have to be related
> >>>in any way to the current instruction.
> >
> >>Ah, that's this no-architectural behavior again.
> >
> > I don't think the HVI #PF injection internals or how the #PF is
> > handled by the OS are relevant here. We are using an existing API that
> > seems to not work quite correct under certain circumstances and we
> > were curious if any of you can shed some light in this regard, and
> > maybe point us to the right direction for cooking up a fix.
> >
> >>What if the OS doesn't fully carry out the page-in, relying on the #PF
> >>to retrigger once the insn for which it got reported has been
> >>restarted?
> >
> > Can you be more specific?
>
> Well, perhaps with the answer you gave further down that's not that
> relevant anymore, but consider a #PF handler which handles just the top
> most not-present page table level each time it gets invoked. I.e.
> for a not-present L4 entry it would take 4 re-invocations of the same
> original instruction to resolve all 4 levels.

I see what you're referring to. As I explained to Andrew in a previous
mail - the #PF injection logic is indeed OS specific, and in our
particular case (since VM introspection already has to handle a lot of
OS specific stuff), we don't have to deal with such a behavior on the
supported operating systems. Anyway, the example you provided would
involve significant added performance penalty and I don't see why an OS
would do that (nor have I heard of any doing it), but I understand your
concern.

>
> >> Or what if the page gets paged out again before the insn actually
> >> gets to execute (e.g. because a re-schedule happened inside the
> >> guest on the way out of the #PF handler)? All of this suggests that
> >> you really can't lift any restrictions _before_ seeing what you need
> >> to see.
> >
> > We don't really care when and how the #PF is handled. We don't care if
> > the page is paged out at some random point. What we do know is that at
> > a certain point in the future, the page will be swapped in; how do we
> > know when? The OS will write the guest page tables, at which point we
> > can inspect the physical page itself (so you can see here why we don't
> > care about the page being swapped out sometime in the future). So we
> > really _can_ lift any restriction we want at that point.
>
> Hmm, I'm having difficulty seeing the supposedly broken flow of events
> here: Earlier it was said that #PF injection would be a result of EPT
> event processing. Here you say that the lifting of the restrictions
> would be a result of seeing the guest modify its page tables (which
> would in turn be a result of the #PF actually having arrived in the
> guest). So if (with this, and as you say above) you don't care when the
> #PF gets handled, where's the original problem?

That's not what I wanted to say, sorry if it was unclear.
What I'm trying to say is that the decision to inject a #PF can be made
when handling an EPT violation - the accessed page needs not be related
in any way with the page for which we decide to inject the #PF. For
example, we intercept writes in a list that describes the loaded module.
Whenever a new module is loaded, an entry would be inserted into that
list, and that would generate an EPT write violation. Now, the
introspection logic will be able to analyze what module was loaded and
where, and it may find out that the module headers (which are needed by
the protection logic) are not present in memory - therefore, it would
inject a #PF in order to force the OS to swap in said headers. On the
other hand, the HVI logic may also decide that it doesn't need to watch
for modules loading anymore (for example, all the interesting modules
were loaded), so it will remove the write hook from the list of loaded
modules. These two (injection of the #PF and the removal of the EPT
write protection) would be done in the same event handler, so we can't
rely on the event being re-generated in this case. Hopefully this
example makes it more clear.

>
> >>>Assuming that we wouldn't remove the restrictions and we would rely
> >>>on re-generating the event - that is not acceptable: first of all
> >>>because the instruction would normally be emulated anyway before
> >>>re-entering the guest,
> >
> >>How would that be a problem?
> >
> > I thought it was obvious without further clarification: how can we
> > expect the exact same event to be generated, if the instruction that
> > triggered it in the first place was emulated or single stepped?
>
> Neither emulation nor single stepping should result in architectural
> events (exceptions) to be missed (or else there's a bug somewhere).
> Non-architectural #PF like you're using of course can't (currently) be
> guaranteed to arrive at any particular point in time.
>
> The fact that {vmx,svm}_inject_trap() combine the new exception with an
> already injected one (and blindly discard events other than hw
> exceptions), otoh, looks like indeed wants to be controllable by the
> caller: When the event comes from the outside (the hypercall), it would
> clearly seem better to simply tell the caller that no injection
> happened and the event needs to be kept pending. The main question then
> is how to make certain injection gets retried at the right point in
> time (read: once the other interrupt handler IRETs back to original
> context).

Yes, this is basically our problem. Right now, the #PF would overwrite
other interrupts, which is very bad. On the other hand, it can't return
an error (if I understand the code correctly), since it can't know if
another event will be scheduled for injection. As I told Andrew, at
least returning an error that would indicate the #PF cannot be injected
may help us a lot here (I'm sure making the injected trap take
precedence over other events would not be acceptable).

>
> Jan

Best regards,
Andrei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
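
The difference between the two injection behaviours discussed in the
message above (overwriting a pending interrupt versus telling the caller
that no injection happened) can be illustrated with a small toy model.
This is only a sketch: it is not Xen code, and ToyVcpu, Event,
inject_overwriting and inject_or_refuse are hypothetical names invented
for illustration. It also assumes the occupied slot is already visible
when the injection is attempted, which sidesteps the concern raised
above that another event may only be scheduled later.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    EXTERNAL_INTERRUPT = "external interrupt"
    HW_EXCEPTION = "hardware exception"


@dataclass(frozen=True)
class Event:
    type: EventType
    vector: int


# #PF is exception vector 14 on x86.
PAGE_FAULT = Event(EventType.HW_EXCEPTION, 14)


class ToyVcpu:
    """Toy model (hypothetical): one slot for the event delivered on entry."""

    def __init__(self) -> None:
        self.pending: Optional[Event] = None

    def inject_overwriting(self, event: Event) -> None:
        # Models the complaint in the thread: whatever was pending is lost.
        self.pending = event

    def inject_or_refuse(self, event: Event) -> bool:
        # Models the suggested alternative: report failure so the caller
        # can keep the event pending and retry later.
        if self.pending is not None:
            return False
        self.pending = event
        return True


if __name__ == "__main__":
    timer = Event(EventType.EXTERNAL_INTERRUPT, 0x30)  # arbitrary vector

    lossy = ToyVcpu()
    lossy.pending = timer
    lossy.inject_overwriting(PAGE_FAULT)
    print("overwriting variant now holds:", lossy.pending)

    careful = ToyVcpu()
    careful.pending = timer
    print("refusing variant accepted #PF:", careful.inject_or_refuse(PAGE_FAULT))
    print("refusing variant still holds:", careful.pending)
```

With the refusing variant the caller learns that the #PF was not queued
and has to retry; when exactly to retry is the open question raised in
the message above and is not modelled here.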
