[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] vmx/hap: optimize CR4 trapping



On 02/16/2018 01:25 PM, Roger Pau Monné wrote:
> On Thu, Feb 15, 2018 at 09:32:00PM +0200, Razvan Cojocaru wrote:
>> On 02/15/2018 08:57 PM, Andrew Cooper wrote:
>>> On 15/02/18 16:25, Roger Pau Monne wrote:
>>>> There a bunch of bits in CR4 that should be allowed to be set directly
>>>> by the guest without requiring Xen intervention, currently this is
>>>> already done by passing through guest writes into the CR4 used when
>>>> running in non-root mode, but taking an expensive vmexit in order to
>>>> do so.
>>>>
>>>> xenalyze reports the following when running a PV guest in shim mode:
>>>>
>>>>  CR_ACCESS             3885950  6.41s 17.04%  3957 cyc { 2361| 3378| 7920}
>>>>    cr4  3885940  6.41s 17.04%  3957 cyc { 2361| 3378| 7920}
>>>>    cr3        1  0.00s  0.00%  3480 cyc { 3480| 3480| 3480}
>>>>      *[  0]        1  0.00s  0.00%  3480 cyc { 3480| 3480| 3480}
>>>>    cr0        7  0.00s  0.00%  7112 cyc { 3248| 5960|17480}
>>>>    clts        2  0.00s  0.00%  4588 cyc { 3456| 5720| 5720}
>>>>
>>>> After this change this turns into:
>>>>
>>>>  CR_ACCESS                  12  0.00s  0.00%  9972 cyc { 3680|11024|24032}
>>>>    cr4        2  0.00s  0.00% 17528 cyc {11024|24032|24032}
>>>>    cr3        1  0.00s  0.00%  3680 cyc { 3680| 3680| 3680}
>>>>      *[  0]        1  0.00s  0.00%  3680 cyc { 3680| 3680| 3680}
>>>>    cr0        7  0.00s  0.00%  9209 cyc { 4184| 7848|17488}
>>>>    clts        2  0.00s  0.00%  8232 cyc { 5352|11112|11112}
>>>>
>>>> Note that this optimized trapping is currently only applied to guests
>>>> running with HAP on Intel hardware. If using shadow paging more CR4
>>>> bits need to be unconditionally trapped, which makes this approach
>>>> unlikely to yield any important performance improvements.
>>>>
>>>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>> ---
>>>> Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>
>>>> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
>>>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>> Cc: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
>>>> Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
>>>> ---
>>>>  xen/arch/x86/hvm/vmx/vmx.c  | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>  xen/arch/x86/hvm/vmx/vvmx.c |  5 ++++-
>>>>  xen/arch/x86/monitor.c      |  5 +++--
>>>>  3 files changed, 48 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>>> index d35cf55982..9747b2a398 100644
>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>> @@ -1684,6 +1684,35 @@ static void vmx_update_guest_cr(struct vcpu *v, 
>>>> unsigned int cr)
>>>>          }
>>>>  
>>>>          __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
>>>> +
>>>> +        if ( (v->domain->arch.monitor.write_ctrlreg_enabled &
>>>> +              monitor_ctrlreg_bitmask(VM_EVENT_X86_CR4)) ||
>>>> +             !paging_mode_hap(v->domain) )
>>>> +            /*
>>>> +             * If requested by introspection or running in shadow mode 
>>>> trap all
>>>> +             * accesses to CR4.
>>>
>>> The monitor write_ctrlreg_onchangeonly feature was purposefully
>>> introduced to avoid sending PGE updates to the introspection agent.  It
>>> would be ideal to include that in the mask calculation so introspection
>>> cases don't vmexit for PGE changes.
>>>
>>> Also, AMD has similar capabilities, and (as of today) has gained CR
>>> monitoring.
>>
>> I believe the patch Andrew is referring to is this one:
>>
>> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=59aad28cfac09640e2272f1e87951406233c3192
>>
>> We added that specifically so that no PGE-only exits end up needing
>> (pointless) processing by the introspection agent.
> 
> I've been looking at that change and I think the logic is wrong, the
> following chunk:
> 
>      if ( (ad->monitor.write_ctrlreg_enabled & ctrlreg_bitmask) &&
>           (!(ad->monitor.write_ctrlreg_onchangeonly & ctrlreg_bitmask) ||
> -          value != old) )
> +          value != old) &&
> +         (!((value ^ old) & ad->monitor.write_ctrlreg_mask[index])) )
>      {
>          bool_t sync = !!(ad->monitor.write_ctrlreg_sync & ctrlreg_bitmask);
> 
> Seems wrong. Imagine for example the case where (value ^ old) ==
> PGE|PSE, and mask == PGE:
> 
> !((PGE|PSE) & PGE) will yield false, and the monitor won't be notified.
> 
> I think what you want is:
> 
> ((value ^ old) & ~ad->monitor.write_ctrlreg_mask[index])
> 
> But maybe I'm just confused.

No, I think you're quite right. Thanks for pointing that out!
We'll submit a fix patch shortly.


Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.