Re: [Xen-devel] VPMU interrupt unreliability

To: Kyle Huey <me@xxxxxxxxxxxx>, Meng Xu <xumengpanda@xxxxxxxxx>
From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
Date: Thu, 19 Oct 2017 19:38:04 +0100
Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Robert O'Callahan <robert@xxxxxxxxxxxxx>
Delivery-date: Thu, 19 Oct 2017 18:38:19 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 19/10/17 19:24, Kyle Huey wrote:
> On Thu, Oct 19, 2017 at 11:20 AM, Meng Xu <xumengpanda@xxxxxxxxx> wrote:
>> On Thu, Oct 19, 2017 at 11:40 AM, Andrew Cooper
>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 19/10/17 16:09, Kyle Huey wrote:
>>>> On Wed, Oct 11, 2017 at 7:09 AM, Boris Ostrovsky
>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>> On 10/10/2017 12:54 PM, Kyle Huey wrote:
>>>>>> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
>>>>>>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
>>>>>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>>>>>> One thing I noticed is that the workaround doesn't appear to be
>>>>>>>>>> complete: it is only checking PMC0 status and not other counters
>>>>>>>>>> (fixed or architectural). Of course, without knowing what the actual
>>>>>>>>>> problem was it's hard to say whether this was intentional.
>>>>>>>>> handle_pmc_quirk appears to loop through all the counters ...
>>>>>>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
>>>>>>>> value one by one and so it is looking at all bits.
>>>>>>>>
>>>>>>>>>>> 2. Intercepting MSR loads for counters that have the workaround
>>>>>>>>>>> applied and giving the guest the correct counter value.
>>>>>>>>>> We'd have to keep track of whether the counter has been reset (by the
>>>>>>>>>> quirk) since the last MSR write.
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on
>>>>>>>>>>> that counter until the guest acks via GLOBAL_OVF_CTRL, assuming
>>>>>>>>>>> that works on the relevant hardware.
>>>>>>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
>>>>>>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
>>>>>>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
>>>>>>>>> GLOBAL_OVF_CTRL.
>>>>>>>> Wouldn't it be better to wait until the counter is reloaded?
>>>>>>> Maybe!  I haven't thought through it a lot.  It's still not clear to
>>>>>>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
>>>>>>> interrupt in any way or whether it just resets the bits in
>>>>>>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
>>>>>>> all that's required to reenable it.
>>>>>>>
>>>>>>> - Kyle
>>>>>> I wonder if it would be reasonable to just remove the workaround
>>>>>> entirely at some point.  The set of people using 1) several year old
>>>>>> hardware, 2) an up to date Xen, and 3) the off-by-default performance
>>>>>> counters is probably rather small.
>>>>> We'd probably want to only enable this for affected processors, not
>>>>> remove it outright. But the problem is that we still don't know for sure
>>>>> whether this issue affects NHM only, do we?
>>>>>
>>>>> (https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
>>>>> is the original message)
>>>> Yes, the basic problem is that we don't know where to draw the line.
>>> vPMU is disabled by default for security reasons,
>>
>> Is there any document about the possible attack via the vPMU? The
>> document I found (such as [1] and XSA-163) just briefly say that the
>> vPMU should be disabled due to security concern.
>>
>>
>> [1] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
> Cross-guest information leaks, presumably.

Plenty of "not context switching things properly".

Off the top of my head, there was also a straight DoS by blindly passing
guest values into an unchecked wrmsr(), and privilege escalation via
letting the guest choose where ds_store dumped its data.

~Andrew
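
To illustrate the "unchecked wrmsr()" class of bug mentioned above, here is a
minimal sketch of validating a guest-supplied value before it reaches the
hardware write. This is not the Xen code: GUEST_WRITABLE_MASK, fake_wrmsr()
and guest_msr_write() are hypothetical names, the mask value is arbitrary, and
the privileged write is replaced by a stub so the example can be compiled and
run in user space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask of bits the guest may set; the real set of
 * writable bits depends on the MSR and the CPU model. */
#define GUEST_WRITABLE_MASK 0x00000000ffffffffULL

/* Stand-in for the privileged MSR write: it only records the value. */
static uint64_t fake_msr;

static void fake_wrmsr(uint64_t val)
{
    fake_msr = val;
}

/* Reject any guest value with bits outside the writable mask instead
 * of passing it through blindly. Returns true if the write happened. */
static bool guest_msr_write(uint64_t guest_val)
{
    if (guest_val & ~GUEST_WRITABLE_MASK)
        return false;   /* reserved bits set: refuse the write */

    fake_wrmsr(guest_val);
    return true;
}
```

In a real hypervisor, a rejected write would presumably be turned into a fault
injected into the guest rather than being silently dropped; that part is left
out of the sketch.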
