[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: x86/vmx: Don't spuriously crash the domain when INIT is received


  • To: "Beulich, Jan" <JBeulich@xxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Thu, 17 Mar 2022 05:57:57 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=smKKZUagFrMqDo1NtoNicuK9dofLgfAbLKYfHZGANkU=; b=On8s5Ep7tKtDTwOYab80Hu1hy6hW2/1ltWtM+nPE2+cqNmu4FETgBQslRNBuaukOzsBZH3ZNEan/vn5gvIHbEZCK/nGJVcUDP6LsDL7QsaqgNsZIDKqo4E3ZnsVeg+wBzhVATKVWyvk9ZqxgbY0KNV7MKudaISMve0MtjiVceCjkMMgmzYBZ+jgUdL4YXFFs81IVCPhJhhhnfGzSatI3RtrPAqBeB+iOHK/Tr3qOoH0sUwha//6Dvgl7J9bYvtALKiC16afb6LfaHjeC0o2q9PAcjULXeDbVnS1tA4bhgxoMqy2Hpqz1ZPuxZFyvks8DRJTo32jbjUKDV5/uS2oJOQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NjbiFmJPW/w3a4VE5zy69IExqH5Pul4QqYtObGxaWUZ98yNdnEHkmqUoFmNW108V7/WyC8Z26wI23yOZL9pNSwkXCzexNw4Y4tStmQApzhy6ccnGS77DIvKj43zz+V31Wm417ntQRXT/PcQx6t1FWeVuUyDE6DwP8ges/0UCiByDt6/GfUe7eyVmjXTxI29eXHF3llF8o5OSFPeL0Cx0V6UcOtJret+LBKPKZcdH8k4TmSXy54qu63pd2Q2E9Syvs2wGmjOEiLGGcDwdEuNnz+FB0WM3ztRUCcHh+v9qkftZMqMHA0RzDBF7ZhEdsSTmxkOWN6zYcHG72cU2uvXlSA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com;
  • Cc: Pau Monné, Roger <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "Nakajima, Jun" <jun.nakajima@xxxxxxxxx>, Thiner Logoer <logoerthiner1@xxxxxxx>, "Marczykowski, Marek" <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Cooper, Andrew" <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Thu, 17 Mar 2022 05:58:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYKbevPQACrsDTpUmflGdlWuTEmayj9FgAgAA+nwCAAA5IgIAAQNmAgAQWIgCAFe3v8IAAFMoAgASY4xA=
  • Thread-topic: x86/vmx: Don't spuriously crash the domain when INIT is received

> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: Monday, March 14, 2022 3:43 PM
> 
> On 14.03.2022 07:35, Tian, Kevin wrote:
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: Monday, February 28, 2022 3:36 PM
> >>
> >> On 25.02.2022 18:11, Andrew Cooper wrote:
> >>> On 25/02/2022 13:19, Jan Beulich wrote:
> >>>> On 25.02.2022 13:28, Andrew Cooper wrote:
> >>>>> On 25/02/2022 08:44, Jan Beulich wrote:
> >>>>>> On 24.02.2022 20:48, Andrew Cooper wrote:
> >>>>>>> In VMX operation, the handling of INIT IPIs is changed.
> >> EXIT_REASON_INIT has
> >>>>>>> nothing to do with the guest in question, simply signals that an INIT
> >> was
> >>>>>>> received.
> >>>>>>>
> >>>>>>> Ignoring the INIT is probably the wrong thing to do, but is helpful 
> >>>>>>> for
> >>>>>>> debugging.  Crashing the domain which happens to be in context is
> >> definitely
> >>>>>>> wrong.  Print an error message and continue.
> >>>>>>>
> >>>>>>> Discovered as collateral damage from when an AP triple faults on S3
> >> resume on
> >>>>>>> Intel TigerLake platforms.
> >>>>>> I'm afraid I don't follow the scenario, which was (only) outlined in
> >>>>>> patch 1: Why would the BSP receive INIT in this case?
> >>>>> SHUTDOWN is a signal emitted by a core when it can't continue.  Triple
> >>>>> fault is one cause, but other sources include a double #MC, etc.
> >>>>>
> >>>>> Some external component, in the PCH I expect, needs to turn this into
> a
> >>>>> platform reset, because one malfunctioning core can't.  It is why a
> >>>>> triple fault on any logical processor brings the whole system down.
> >>>> I'm afraid this doesn't answer my question. Clearly the system didn't
> >>>> shut down.
> >>>
> >>> Indeed, *because* Xen caught and ignored the INIT which was otherwise
> >>> supposed to do it.
> >>>
> >>>>  Hence I still don't see why the BSP would see INIT in the
> >>>> first place.
> >>>>
> >>>>>> And it also cannot be that the INIT was received by the vCPU while
> >> running on
> >>>>>> another CPU:
> >>>>> It's nothing (really) to do with the vCPU.  INIT is a external signal to
> >>>>> the (real) APIC, just like NMI/etc.
> >>>>>
> >>>>> It is the next VMEntry on a CPU which received INIT that suffers a
> >>>>> VMEntry failure, and the VMEntry failure has nothing to do with the
> >>>>> contents of the VMCS.
> >>>>>
> >>>>> Importantly for Xen however, this isn't applicable for scheduling PV
> >>>>> vCPUs, which is why dom0 wasn't the one that crashed.  This actually
> >>>>> meant that dom0 was alive an usable, albeit it sharing all vCPUs on a
> >>>>> single CPU.
> >>>>>
> >>>>>
> >>>>> The change in INIT behaviour exists for TXT, where is it critical that
> >>>>> software can clear secrets from RAM before resetting.  I'm not wanting
> >>>>> to get into any of that because it's far more complicated than I have
> >>>>> time to fix.
> >>>> I guess there's something hidden behind what you say here, like INIT
> >>>> only being latched, but this latched state then causing the VM entry
> >>>> failure. Which would mean that really the INIT was a signal for the
> >>>> system to shut down / shutting down.
> >>>
> >>> Yes.
> >
> > why is INIT latched in root mode (take effect until vmentry) instead of
> > directly causing the CPU to reset?
> >
> >>>
> >>>> In which case arranging to
> >>>> continue by ignoring the event in VMX looks wrong. Simply crashing
> >>>> the guest would then be wrong as well, of course. We should shut
> >>>> down instead.
> >>>
> >>> It is software's discretion what to do when an INIT is caught, even if
> >>> the expectation is to honour it fairly promptly.
> >>>
> >>>> But I don't think I see the full picture here yet, unless your
> >>>> mentioning of TXT was actually implying that TXT was active at the
> >>>> point of the crash (which I don't think was said anywhere).
> >>>
> >>> This did cause confusion during debugging.  As far as we can tell, TXT
> >>> is not active, but the observed behaviour certainly looks like TXT is
> >>> active.
> >>>
> >>> Then again, reset is a platform behaviour, not architectural.  Also,
> >>> it's my understanding that Intel does not support S3 on TigerLake
> >>> (opting to only support S0ix instead), so I'm guessing that "Linux S3"
> >>> as it's called in the menu is something retrofitted by the OEM.
> >>>
> >>> But overall, the point isn't really about what triggered the INIT.  We
> >>> also shouldn't nuke an innocent VM if an INIT IPI slips through
> >>> interrupt remapping.
> >>
> >> But we also shouldn't continue in such a case as if nothing had happened
> >> at all, should we?
> >>
> >
> > Now there are two problems:
> >
> > 1) An innocent VM is killed;
> > 2) The system continues as if nothing had happened;
> >
> > Andrew's patch fixes 1) which imo is welcomed anyway.
> >
> > 2) certainly needs more work but could come after 1).
> 
> That's one way to look at things, sure, and if you agree it makes sense
> to address 1), I won't go as far as trying to block such a change. But
> it feels wrong to me - 2) working properly really includes 1) plus the
> killing of all other innocent VMs that were running at the time.
> 

I feel that 2) will be done in a way that the admin can choose the
policy. It could be killing all VMs or in a mode where further 
diagnose is allowed. Given that part is unclear at this point, I'm
inclined to ack 1) first:

Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>

Thanks
Kevin

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.