[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/PV: conditionally avoid raising #GP for early guest MSR accesses



On 03.11.2020 18:31, Andrew Cooper wrote:
> On 03/11/2020 17:06, Jan Beulich wrote:
>> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
>> handler early enough to cover for example the rdmsrl_safe() of
>> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
>> of MSR_K7_HWCR later in the same function). The respective change
>> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
>> backported to 4.14, but no further - presumably since it wasn't really
>> easy because of other dependencies.
>>
>> Therefore, to prevent our change in the handling of guest MSR accesses
>> to render PV Linux 4.13 and older unusable on at least AMD systems, make
>> the raising of #GP on these paths conditional upon the guest having
>> installed a handler. Producing zero for reads and discarding writes
>> isn't necessarily correct and may trip code trying to detect presence of
>> MSRs early, but since such detection logic won't work without a #GP
>> handler anyway, this ought to be a fair workaround.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> I appreciate that we probably have to do something, but I don't think
> this is a wise move.

I wouldn't call it wise either, but I'm afraid something along
these lines is necessary.

> Linux is fundamentally buggy.  It is deliberately looking for a
> potential #GP fault given its use of rdmsrl_safe().  The reason this bug
> stayed hidden for so long was as a consequence of Xen's inappropriate
> MSR handling for guests, and the reasons for changing Xen's behaviour
> still stand.

I agree.

> This change, in particular, does not apply to any explicitly handled
> MSRs, and therefore is not a comprehensive fix.

But it's intentional that this deals with the situation in a
generic way, not on a per-MSR basis. If we did as you suggest
further down, we'd have to audit all Linux versions up to 4.14
for similar issues with other MSRs. I don't think this would
be a practical thing to do, and I also don't think that leaving
things as they are until we have concrete reports of problems
is a viable option either.

Adding explicit handling for the two offending MSRs (and any
possible further ones we discover) would imo only be to avoid
issuing the respective log messages.

>  Nor is it robust to
> someone adding code to explicitly handling the impacted MSRs at a later
> date (which are are likely to need to do for HWCR), and which would
> reintroduce this failure to boot.

I'm afraid I don't understand. Looking at the two functions
the patch alters, only X86EMUL_OKAY is used in return statements
other than the final one. If this model is to be followed by
future additions (which I think it ought to be; perhaps we
should add comments to this effect), the code introduced here
will take care of the situation nevertheless.

> We should have the impacted MSRs handled explicitly, with a note stating
> this was a bug in Linux 4.14 and older.  We already have workaround for
> similar bugs in Windows, and it also gives us a timeline to eventually
> removing support for obsolete workarounds, rather than having a "now and
> in the future, we'll explicitly tolerate broken PV behaviour for one bug
> back in ancient linux".

Comparing with Windows isn't very helpful; the patch here is
specifically about PV, and would help other OSes as well in
case they would have missed setting up exceptions early in
just the PV-on-Xen case. For the HVM case I'd indeed rather
see us go the route we've gone for Windows, if need be.

There's one adjustment to the logic here that I've been
considering, but I'm still undecided due to the downsides:
Without exposing the value, we could make the decision to zap
the #GP dependent upon us being able to read the MSR.

The other possible adjustment would be to avoid issuing two
log messages for the same operation (affecting debug builds
only). But the code structure (which isn't really consistent
about when the pre-existing message would get issued)
doesn't directly lend itself to such an adjustment without
altering the behavior for some of the MSRs explicitly
handled.

As a tangent, while discussing this situation, please let's
not forget about this code in Linux:

static u64 xen_read_msr(unsigned int msr)
{
        /*
         * This will silently swallow a #GP from RDMSR.  It may be worth
         * changing that.
         */
        int err;

        return xen_read_msr_safe(msr, &err);
}

static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
{
        /*
         * This will silently swallow a #GP from WRMSR.  It may be worth
         * changing that.
         */
        xen_write_msr_safe(msr, low, high);
}

Imo this "silent swallowing" has always been the wrong thing
to do, and hence ought to be dropped. Of course right now it
saves the kernel from dying on the HWCR read.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.