Xen project Mailing List

Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>

Date: Wed, 29 May 2019 08:21:12 -0700

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Mathieu Tarral <mathieu.tarral@xxxxxxxxxxxxxx>

Delivery-date: Wed, 29 May 2019 15:22:24 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

> > There are three views being used: the default (hostp2m); the
> > execute-view which is active by default and has the remapped
> > shadow-copies of the pages with breakpoints injected at the very end
> > of the guests' physmap; and the read-only view that is only used when
> > someone is trying to read the actual address of a shadow-copy at the
> > end of the physmap (ie. not via the remapped gfn).
>
> Perhaps the terminology could be improved then, seeing as the main view
> isn't really "execute restricted". It is only X-- for the few remapped
> gfns, and XWR for the vast majority.

Correct and most certainly the terminology is not great :) The
altp2m_idx view actually starts out mainly empty with only the
remapped pages being populated in it - the rest gets lazily copied
from the hostp2m as the VM runs (and copies the default XWR permission
over).

> > The read-only view has all shadow-copy gfn's remapped to that one page
> > full of 1's, because if you read random large gfn's in the guests'
> > memory space that's what Xen's emulator returns. It is called
> > zero_page because I originally guessed that those pages should be all
> > 0 but it turned out I was wrong. Just haven't change the name of it
> > yet. This page is there because we want to avoid someone being able to
> > find out that there are shadow pages present. It would be quite
> > obvious something is "odd" when you can find copies of the Windows
> > kernel memory pages at the end of the memory space. So the shadow
> > pages' real GFN mem_access is restricted in the execute view, which
> > allows us to switch to the read-only view with MTF enabled and then
> > back afterwards. That way the shadow pages are not visible to the
> > guest, if someone tries to read them they return the same value and
> > behave the same as all other unmapped gfn's in that memory region.
>
> Ahh ok. Yes - write-discard/read ~0 is a staple of "nothing present",
> both in IO and MMIO space on x86.
> In which case a better name would be sink_page or similar.

Indeed.

> Also, I see now that that is what Mathieu's code is doing (even though
> this view isn't used at all, so far as I can tell), so consider the
> question answered, and we're back to square 1 on the BSOD.
>
> As identified before, it needs to be only ever mapped read-only, because
> sinking real writes into it would be a BadThing(tm). We do actually
> have a p2m_type_ro which would hopefully cause emulated instructions to
> DTRT as well, which should be faster than sending a write event all the
> way to the VMI agent.

Any write-violation to the sink page gets emulated with
VM_EVENT_FLAG_EMULATE_NOWRITE. Speed isn't really an issue because
this is an extreme corner case that never happens unless someone is
doing something very peculiar, in which case it's good to be able to
log it ;)

> > So since the read-only view with all the 1's is rarely used, let's
> > talk about why patchguard can't notice changes to the kernel:
>
> Well... In this case it really is. We can certainly talk about why
> patchguard *shouldn’t* notice :)
>
> > the execute-view has all pages that were breakpointed remapped and marked
> > execute-only. When patchguard tries to read these pages, the view is
> > swapped back to the hostp2m with MTF enabled. Then in the MTF callback
> > the view is swapped back to the execute-view. This means that
> > patchguard only ever reads the original page from the hostp2m. If the
> > page is being written to, the same dance happens with the addition of
> > the whole page being re-copied to the shadow location and the
> > breakpoints being reapplied on the shadow copy. This copy happens
> > while the whole domain is paused to avoid race-condition.
> >
> > I hope this makes sense.
>
> The dance with the read-only view doesn't happen in the simplified case,
> but as both you and I have noticed, there looks to be issues with the
> page permissions which are probably confounding the problems.
>

Correct, the read-only view is not something that would be used during
normal execution of Windows. Mathieu's implementation is also
incomplete as he is not applying the memory permissions. I
double-checked in DRAKVUF and the memory permissions are definitely
set, so I really don't think patchguard catching modifications to the
kernel pages is what's behind the BSOD. If that were the case, then
simply letting it run for a while would trigger it. But it doesn't: I
can run DRAKVUF for several hours with no BSOD. To me it seems to be
the repeated stop/start that has something to do with it.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
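
For readers following the thread, the view-switching "dance" described in the message above can be sketched as a small toy model. This is not Xen or DRAKVUF code and it calls no real API: the class `ToyDomain`, its methods, and the gfn values are all hypothetical, chosen only to illustrate the behaviour as described (execute-only remapped pages, reads served from the hostp2m for one step, writes followed by a re-copy of the shadow page, and a sink page that reads as all 1's and discards writes). Lazy population of the altp2m view and domain pausing are deliberately left out.

```python
from enum import Enum

PAGE_SIZE = 4096
INT3 = 0xCC        # x86 breakpoint opcode injected into shadow copies
SINK_BYTE = 0xFF   # the sink page is "full of 1's"


class View(Enum):
    HOST = "hostp2m"
    EXECUTE = "execute view"
    SINK = "read-only view"


class ToyDomain:
    """Toy model of the three views; every name here is hypothetical."""

    def __init__(self):
        self.pages = {}        # gfn -> bytearray (what the hostp2m maps)
        self.shadow_of = {}    # original gfn -> shadow gfn at end of physmap
        self.breakpoints = {}  # original gfn -> set of byte offsets
        self.view = View.EXECUTE
        self.switches = []     # history of view switches, for inspection

    def add_breakpoint(self, gfn, offset, shadow_gfn):
        self.shadow_of.setdefault(gfn, shadow_gfn)
        self.breakpoints.setdefault(gfn, set()).add(offset)
        self._refresh_shadow(gfn)

    def _refresh_shadow(self, gfn):
        # Re-copy the whole original page, then reapply the breakpoints.
        shadow = bytearray(self.pages[gfn])
        for offset in self.breakpoints[gfn]:
            shadow[offset] = INT3
        self.pages[self.shadow_of[gfn]] = shadow

    def _single_step_in(self, view, action):
        # Models: switch view with MTF enabled, perform the one access,
        # then switch back to the execute view as the MTF callback would.
        self.view = view
        self.switches.append(view)
        try:
            return action()
        finally:
            self.view = View.EXECUTE
            self.switches.append(View.EXECUTE)

    def fetch(self, gfn, offset):
        # Instruction fetches in the execute view follow the remapping.
        return self.pages[self.shadow_of.get(gfn, gfn)][offset]

    def read(self, gfn, offset):
        if gfn in self.shadow_of:
            # Execute-only page: the read is served from the hostp2m.
            return self._single_step_in(
                View.HOST, lambda: self.pages[gfn][offset])
        if gfn in self.shadow_of.values():
            # Real address of a shadow copy: served from the sink page.
            return self._single_step_in(View.SINK, lambda: SINK_BYTE)
        return self.pages[gfn][offset]

    def write(self, gfn, offset, value):
        if gfn in self.shadow_of.values():
            return  # write to the sink page is discarded
        if gfn in self.shadow_of:
            def do_write():
                self.pages[gfn][offset] = value
                self._refresh_shadow(gfn)
            self._single_step_in(View.HOST, do_write)
        else:
            self.pages[gfn][offset] = value
```

In this model an instruction fetch from a breakpointed page sees the injected `0xCC`, while a data read of the same address sees the original byte, which is the property the thread relies on to keep patchguard from noticing the breakpoints.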
