[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3] x86: do not enable global pages when virtualized on AMD hardware



On Mon, Dec 09, 2019 at 04:39:58PM +0100, Jan Beulich wrote:
> On 09.12.2019 16:36, Roger Pau Monné wrote:
> > On Mon, Dec 09, 2019 at 04:04:51PM +0100, Jan Beulich wrote:
> >> On 09.12.2019 15:46, Roger Pau Monné wrote:
> >>> On Mon, Dec 09, 2019 at 03:21:28PM +0100, Jan Beulich wrote:
> >>>> On 09.12.2019 11:20, Roger Pau Monné wrote:
> >>>>> On Wed, Dec 04, 2019 at 06:05:11PM +0100, Jan Beulich wrote:
> >>>>>> On 04.12.2019 17:18, Roger Pau Monné wrote:
> >>>>>>> On Wed, Dec 04, 2019 at 05:11:42PM +0100, Jan Beulich wrote:
> >>>>>>>> On 04.12.2019 16:12, Roger Pau Monne wrote:
> >>>>>>>>> @@ -130,7 +143,7 @@ unsigned long pv_make_cr4(const struct vcpu *v)
> >>>>>>>>>       */
> >>>>>>>>>      if ( d->arch.pv.pcid )
> >>>>>>>>>          cr4 |= X86_CR4_PCIDE;
> >>>>>>>>> -    else if ( !d->arch.pv.xpti )
> >>>>>>>>> +    else if ( !d->arch.pv.xpti && opt_global_pages )
> >>>>>>>>>          cr4 |= X86_CR4_PGE;
> >>>>>>>>
> >>>>>>>> I'm sorry for noticing this only now, but what about XEN_MINIMAL_CR4,
> >>>>>>>> which includes X86_CR4_PGE?
> >>>>>>>
> >>>>>>> I've tried removing PGE from XEN_MINIMAL_CR4 but it made no noticeable
> >>>>>>> performance difference, so I left it as-is.
> >>>>>>
> >>>>>> My concern isn't about performance, but correctness. I admit I
> >>>>>> forgot for a moment that we now always write CR4 (unless the
> >>>>>> cached value matches the intended new one). Yet
> >>>>>> mmu_cr4_features (starting out as XEN_MINIMAL_CR4) is still of
> >>>>>> concern.
> >>>>>>
> >>>>>> I think this at least requires extending the description to
> >>>>>> discuss the correctness.
> >>>>>
> >>>>> Would you be fine with adding the following at the end of the commit
> >>>>> message.
> >>>>>
> >>>>> "Note that XEN_MINIMAL_CR4 is not modified, and thus global pages are
> >>>>> left enabled for the hypervisor. This is not an issue because the code
> >>>>> to switch the control registers (cr3 and cr4) already takes into
> >>>>> account such situation and performs the necessary flushes. The same
> >>>>> already happens when using XPTI or PCIDE, as the guest cr4 doesn't
> >>>>> have global pages enabled in that case either."
> >>>>
> >>>> Yes, this is good for XEN_MINIMAL_CR4. But I think mmu_cr4_features
> >>>> needs discussing (or at least mentioning, if the same arguments
> >>>> apply) as well.
> >>>
> >>> The same applies to mmu_cr4_features, it's fine for the hypervisor to
> >>> use a different set of cr4 features (especially PGE) than PV guests:
> >>> this is already the case when using XPTI or PCIDE when Xen cr4 will
> >>> have PGE and guests cr4 won't, and switch_cr3_cr4 DTRT.
> >>>
> >>> So I would s/XEN_MINIMAL_CR4/XEN_MINIMAL_CR4 and mmu_cr4_features/ in
> >>> the above proposed paragraph.
> >>
> >> I'm afraid it's more complicated, up to and including you making a
> >> possible pre-existing bug worse: The S3 resume path loads CR4 from
> >> mmu_cr4_features, but doesn't update the in-memory cache of the
> >> register.
> > 
> > All domains are paused and the scheduler is disabled when doing S3
> > suspend/resume, and the actual suspend and resume code is run by a
> > tasklet which is executed in the idle vCPU context (since all domains
> > are paused), and hence the write of CR4 with mmu_cr4_features is fine
> > as entering guest context after resume is going to involve a call to
> > switch_cr3_cr4 in order to switch out of the idle vCPU.
> 
> And switch_cr3_cr4() has
> 
>     if ( old_cr4 != cr4 )
>         write_cr4(cr4);
> 
> with old_cr4 read from the cache.

That read from the cache is fine. The idle vCPU cr4 is
mmu_cr4_features (see write_ptbase), and hence the write done in
__ret_point should match what's in the cache.

> > It might be clearer to just save cr4 in do_suspend_lowlevel like it's
> > done with the rest of the control registers.
> 
> Not just more clear, but also more reliable.

Let me prepare a patch to improve this, but I think the current patch
at hand is correct and shouldn't be blocked by this suspend/resume
improvement.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.