Xen project Mailing List

Re: [Xen-devel] Question about VPID during MOV-TO-CR3

To: "Tamas K Lengyel" <tamas.lengyel@xxxxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 27 Sep 2016 07:49:51 -0600

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>

Delivery-date: Tue, 27 Sep 2016 13:50:13 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 26.09.16 at 18:12, <tamas.lengyel@xxxxxxxxxxxx> wrote:
> On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>> On 23.09.16 at 22:45, <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel
>>> <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>>> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>> On 23.09.16 at 17:26, <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>>>>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>>>> On 22.09.16 at 19:18, <tamas.lengyel@xxxxxxxxxxxx> wrote:
>>>>>>>> So I verified that when CPU-based load exiting is enabled, the TLB
>>>>>>>> flush here is critical. Without it the guest kernel crashes at random
>>>>>>>> points during boot. OTOH why does Xen trap every guest CR3 update
>>>>>>>> unconditionally? While we have features such as the vm_event/monitor
>>>>>>>> that may choose to subscribe to that event, Xen traps it even when
>>>>>>>> that is not in use. Is that trapping necessary for something else?
>>>>>>>
>>>>>>> Where do you see this being unconditional? construct_vmcs()
>>>>>>> clearly avoids setting these intercepts when using EPT. Are you
>>>>>>> perhaps suffering from
>>>>>>>
>>>>>>>     /* Trap CR3 updates if CR3 memory events are enabled. */
>>>>>>>     if ( v->domain->arch.monitor.write_ctrlreg_enabled &
>>>>>>>          monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) )
>>>>>>>         v->arch.hvm_vmx.exec_control |=
>>>>>>>             CPU_BASED_CR3_LOAD_EXITING;
>>>>>>>
>>>>>>> in vmx_update_guest_cr()? That'll be rather something for you
>>>>>>> or Razvan to explain. Outside of nested VMX I don't see any
>>>>>>> other enabling of that intercept (didn't check AMD code on the
>>>>>>> assumption that you're working on Intel hardware).
>>>>>>
>>>>>> So there seem to be two separate paths that lead to the TLB flushing.
>>>>>> One is indeed the above case you cited when we enable CR3 monitoring
>>>>>> through the monitor interface.
>>>>>> However, during domain boot I also see
>>>>>> this path being called that is not related to
>>>>>> CPU_BASED_CR3_LOAD_EXITING:
>>>>>>
>>>>>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3
>>>>>> (XEN) hap.c:701:d1v0 HAP update cr3 called
>>>>>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called
>>>>>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000
>>>>>>
>>>>>> This path seems to deactivate once the domain is fully booted.
>>>>>
>>>>> This late? According to the CR0 handling in
>>>>> vmx_update_guest_cr() I would understand it to be enabled only
>>>>> while the guest is still in real mode (and even then only on old
>>>>> hardware, i.e. without the Unrestricted Guest functionality).
>>>>>
>>>>
>>>> Right, with unrestricted guest support I would assume none of this
>>>> would get called - but it does, and quite frequently during domain
>>>> boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430.
>>>>
>>>
>>> So I experimented with selectively disabling the flushing such that
>>> it's done only when coming from a path other than CPU-based CR3 load
>>> exiting. I've added a bool to struct vcpu that gets set to 0 every
>>> time vmx_vmexit_handler is called, and only gets set to 1 when
>>> vmx_cr_access reports a MOV-TO-CR3. Then in vmx_update_guest_cr
>>> the flush only happens as such:
>>>
>>>     if ( !v->movtocr3 )
>>>         hvm_asid_flush_vcpu(v);
>>>
>>> In the guest I run a test application that allocates a page at a fixed
>>> VA, writes a magic value to it, and then keeps spinning on reading the
>>> magic value back from the page, checking if it's the same as
>>> originally supplied. I launch this application twice with different
>>> magic values, so that if the TLB invalidation is an issue one of the
>>> test applications would read back the wrong magic value from the VA
>>> using a stale TLB entry.
>>> I've verified that the same VA in the two
>>> applications points to different pages and that those PTEs are not
>>> marked global and no PCID is used.
>>>
>>> [ 724] test (struct addr:ffff88003730f330). PGD: 0x3731f000
>>> VADDR 0x5000000 -> PADDR 0x73e35000. Global page: 0
>>> [ 727] test (struct addr:ffff88003681ea20). PGD: 0x777a6000
>>> VADDR 0x5000000 -> PADDR 0x75043000. Global page: 0
>>
>> I'm surprised. As said before - a mov-to-CR3 cannot be emulated
>> without a minimal amount of flushing. No experiments whatsoever
>> are suitable to prove the contrary.
>
> That's a pretty strong statement - can you tell me where in the SDM
> it says that exactly? I've gone through it a couple of times already
> and I can't find anything that explicitly says that the flushing has
> to be performed by the VMM when mov-to-CR3 trapping is enabled.

I thought I had pointed you there already: Section "Instructions that
cause VM exits". There's nothing said about flushes, but that's also
not necessary: "... the instruction causing the VM exit does not
execute and no processor state is updated by the instruction." Plus
everything the sub-section "Relative Priority of Faults and VM Exits"
says.

> The closest thing I found was indicating the contrary. Furthermore, if
> the flushing is necessary, then how would you explain that there were
> no TLB mixups in the above experiment?

No idea. Perhaps there is some further flushing going on due to other
reasons?

>>> Both applications work as expected without the VPID flushing taking
>>> place. So at least for CPU-based CR3 load exiting it seems that this
>>> flush is not necessary. As for why this path gets called during domain
>>> boot when the CPU supports Unrestricted Guest mode and Xen properly
>>> detects it at boot, I'm not sure. However, as we use CPU-based
>>> CR3 load exiting quite often when doing VMI, I would prefer to disable
>>> this flushing at least for this case. Any thoughts?
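The change proposed in the quoted experiment can be modeled roughly as follows. This is a minimal, self-contained sketch under stated assumptions, not Xen code: `struct vcpu_model`, `enum exit_kind`, `flush_vcpu_stub()`, `update_guest_cr3()` and `handle_vmexit()` are hypothetical stand-ins for `struct vcpu`, `hvm_asid_flush_vcpu()`, `vmx_update_guest_cr()` and the interplay of `vmx_vmexit_handler` and `vmx_cr_access` described above, and the flush is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of Xen's struct vcpu:
 * the experimental flag plus a counter that replaces the real flush. */
struct vcpu_model {
    bool movtocr3;          /* true only while handling a trapped MOV-to-CR3 */
    unsigned int flushes;   /* number of times the flush stub ran */
};

/* Kinds of VM exit that reach the guest CR3 update in this model. */
enum exit_kind {
    EXIT_MOV_TO_CR3,        /* CR-access exit reporting MOV-to-CR3 */
    EXIT_OTHER_CR3_UPDATE,  /* any other path, e.g. a paging-mode update */
};

/* Stub standing in for hvm_asid_flush_vcpu(); it only counts calls. */
static void flush_vcpu_stub(struct vcpu_model *v)
{
    v->flushes++;
}

/* CR3 branch of the update routine with the experiment's condition:
 * skip the flush when the update came from a trapped MOV-to-CR3. */
static void update_guest_cr3(struct vcpu_model *v)
{
    if ( !v->movtocr3 )
        flush_vcpu_stub(v);
}

/* One VM exit: the flag is cleared on every exit and set only when the
 * CR-access handler reports a MOV-to-CR3. */
static void handle_vmexit(struct vcpu_model *v, enum exit_kind kind)
{
    v->movtocr3 = false;
    if ( kind == EXIT_MOV_TO_CR3 )
        v->movtocr3 = true;
    update_guest_cr3(v);
}
```

In this model a trapped MOV-to-CR3 never flushes, while every other CR3 update still does. Per Jan's replies in this thread, skipping the flush this way drops invalidation that an emulated MOV-to-CR3 architecturally owes the guest, even though the test application did not expose a stale translation.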
>>
>> As said before - you'd better direct this question to the VMX
>> maintainers, and even better would be to first understand why
>> the intercept remains enabled in the first place. After all it's
>> quite obvious that most improvement can be expected from not
>> enabling it at all, whenever possible. Only if it needs to stay
>> enabled over extended periods of a guest's lifetime it would then
>> become interesting to see whether the emulation path can be
>> improved.
>>
>
> To clarify - mov-to-CR3 trapping is _not_ enabled by default on a
> domain. I assumed it was the only path to vmx_update_guest_cr, but I
> have now further verified that vmx_cr_access does not get called for a
> mov-to-CR3 when the domain boots; it only gets called when we enable
> it through the monitor system. There is another path that leads to a
> call to vmx_update_guest_cr for updating CR3 when the domain boots,
> which seems to require this flushing to happen. That other path I
> don't care about - although it's rather odd in itself as well. Now
> when the mov-to-CR3 path gets activated the flushing does not seem to
> be necessary, as my experiment shows, and it actually actively breaks
> architectural features (global pages and PCID).

Once again - it does not break anything. Performance aspects are not
architectural features. All you can say is that it makes these
extended features useless.

Jan
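Jan's suggestion is to first understand why the intercept stays enabled. The conditions named in this thread (EPT in use, paging not yet enabled on hardware without Unrestricted Guest, and a CR3 subscription through the monitor interface) can be collected into one illustrative function. It is a sketch under those assumptions, not the Xen source: `struct cr3_intercept_state` and `cr3_intercepts()` are hypothetical, the shadow-paging branch is an assumption inferred from the remark about `construct_vmcs()`, and only the two bit positions (bits 15 and 16 of the primary processor-based VM-execution controls) come from the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Primary processor-based VM-execution controls (Intel SDM). */
#define CR3_LOAD_EXITING  (UINT32_C(1) << 15)
#define CR3_STORE_EXITING (UINT32_C(1) << 16)

/* Hypothetical summary of the vCPU/domain state discussed in the thread. */
struct cr3_intercept_state {
    bool hap;                 /* EPT in use */
    bool paging_enabled;      /* guest CR0.PG set */
    bool unrestricted_guest;  /* hardware has Unrestricted Guest */
    bool cr3_monitor;         /* CR3 writes subscribed via the monitor interface */
};

/* Returns exec with the CR3 intercept bits recomputed from s. */
static uint32_t cr3_intercepts(uint32_t exec, const struct cr3_intercept_state *s)
{
    exec &= ~(CR3_LOAD_EXITING | CR3_STORE_EXITING);

    /* Assumption: without EPT (shadow paging) both intercepts stay on. */
    if ( !s->hap )
        return exec | CR3_LOAD_EXITING | CR3_STORE_EXITING;

    /* Paging not yet enabled and no Unrestricted Guest: intercept CR3. */
    if ( !s->paging_enabled && !s->unrestricted_guest )
        exec |= CR3_LOAD_EXITING | CR3_STORE_EXITING;

    /* A monitor subscription adds load exiting only. */
    if ( s->cr3_monitor )
        exec |= CR3_LOAD_EXITING;

    return exec;
}
```

With EPT and Unrestricted Guest, this model sets no CR3 intercept even during early boot. That is consistent with the later finding above that `vmx_cr_access` is not reached for a mov-to-CR3 during boot, so the boot-time `vmx_update_guest_cr` calls must come from the `hap_update_paging_modes` path instead.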

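Jan's argument is that a trapped MOV-to-CR3 does not execute, so its emulation must still produce the instruction's architectural TLB effects, while Tamas's objection is that a full VPID flush discards more than that. The toy model below sketches the difference, assuming a software-only TLB and covering TLB entries only (paging-structure caches are not modeled). The invalidation rules summarize the Intel SDM's description of MOV to CR3; `struct tlb_entry`, `mov_to_cr3_invalidate()`, `flush_all()` and `count_valid()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR3_PCID_MASK UINT64_C(0xfff)      /* CR3[11:0]: PCID when CR4.PCIDE=1 */
#define CR3_NOFLUSH   (UINT64_C(1) << 63)  /* source bit 63: skip invalidation */

/* One cached translation in a toy, software-only TLB. */
struct tlb_entry {
    bool valid;
    bool global;     /* translation came from a PTE with G=1 */
    uint16_t pcid;   /* always 0 while CR4.PCIDE=0 */
    uint64_t vpn;    /* virtual page number */
};

/* Minimum TLB invalidation the architecture requires for MOV to CR3. */
static void mov_to_cr3_invalidate(struct tlb_entry *tlb, size_t n,
                                  uint64_t src, bool pcide)
{
    uint16_t pcid = 0;

    if ( pcide )
    {
        if ( src & CR3_NOFLUSH )
            return;                     /* no invalidation required */
        pcid = (uint16_t)(src & CR3_PCID_MASK);
    }

    /* Drop non-global entries tagged with the selected PCID only. */
    for ( size_t i = 0; i < n; i++ )
        if ( tlb[i].valid && !tlb[i].global && tlb[i].pcid == pcid )
            tlb[i].valid = false;
}

/* Drop every entry, like invalidating all that is tagged with the vCPU's VPID. */
static void flush_all(struct tlb_entry *tlb, size_t n)
{
    for ( size_t i = 0; i < n; i++ )
        tlb[i].valid = false;
}

static size_t count_valid(const struct tlb_entry *tlb, size_t n)
{
    size_t c = 0;

    for ( size_t i = 0; i < n; i++ )
        c += tlb[i].valid;
    return c;
}
```

In the model, the architectural minimum keeps global translations and entries of other PCIDs, whereas the full flush drops them too. Matching Jan's final reply, that extra loss costs refills (performance) rather than correctness.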