
Re: [PATCH] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Alejandro Vallejo <alejandro.garciavallejo@xxxxxxx>
  • Date: Wed, 10 Sep 2025 19:01:28 +0200
  • Cc: Anthony PERARD <anthony.perard@xxxxxxxxxx>, Grygorii Strashko <grygorii_strashko@xxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 10 Sep 2025 17:02:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed Sep 10, 2025 at 5:31 PM CEST, Jan Beulich wrote:
> On 10.09.2025 17:16, Alejandro Vallejo wrote:
>> On Wed Sep 10, 2025 at 5:02 PM CEST, Jan Beulich wrote:
>>> On 10.09.2025 16:49, Alejandro Vallejo wrote:
>>>> CPU hotplug relies on the guest having access to the legacy online CPU
>>>> bitmap that QEMU provides at PIO 0xAF00. But PVH guests have no DM, so
>>>> this causes the MADT to get corrupted due to spurious modifications of
>>>> the "online" flag in MADT entries and the table checksum during the
>>>> initial acpica passes.
>>>
>>> I don't understand this MADT corruption aspect, which - aiui - is why
>>> there's a Fixes: tag here. The code change itself looks plausible.
>> 
>> When there's no DM to provide a real and honest online CPU bitmap on PIO
>> 0xAF00 then we get all 1s (because there's no IOREQ server). Which confuses
>> the GPE handler.
>> 
>> Somehow, the GPE handler is being triggered. Whether this is due to a real
>> SCI or just it being spuriously executed as part of the initial acpica pass,
>> I don't know.
>> 
>> Both statements combined means the checksum and online flags in the MADT get
>> changed after initial parsing making it appear as-if all 128 CPUs were
>> plugged.
>
> I can follow this part (the online flags one, that is).
>
>> This patch makes the checksums be correct after acpica init.
>
> I'm still in trouble with this one. If MADT is modified in the process,
> there's only one of two possible options:
> 1) It's expected for the checksum to no longer be correct.
> 2) The checksum is being fixed up in the process.
> That's independent of being HVM or PVH and independent of guest boot or later.
> (Of course there's a sub-variant of 2, where the adjusting of the checksum
> would be broken, but that wouldn't be covered by your change.)
>
> Jan

I see what you mean now. The checksum correction code LOOKS correct. But I
wonder about the table length... We report a table only as big as it needs to
be, but the checksum update is done irrespective of whether FLG is inside the
valid range of the MADT. If a guest with 2 vCPUs (in max_vcpus) sees vCPU127
being signalled, that'd set the (unseen) online flag and adjust the checksum,
except the checksum must not be adjusted in that case, because the flag lies
outside the reported table.
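
As a rough illustration of that failure mode, here is a toy model in Python.
It is not the AML in libacpi: the layout is simplified to an ACPI header plus
Local APIC entries only, and every constant and helper name below is an
assumption made up for the sketch. It only relies on the ACPI rule that all
bytes within the reported table length must sum to 0 modulo 256.

```python
# Toy model, NOT the libacpi AML. Layout and names are assumptions.
HEADER_LEN = 44   # 36-byte ACPI header + 8 bytes of MADT-specific fields
ENTRY_LEN = 8     # one Processor Local APIC entry
LEN_OFF = 4       # 32-bit little-endian table length in the ACPI header
CSUM_OFF = 9      # checksum byte in the ACPI header
MAX_CPUS = 128    # slots the hotplug code believes exist


def flags_offset(cpu: int) -> int:
    """Offset of the low flags byte (bit 0 = enabled) of a CPU's entry."""
    return HEADER_LEN + ENTRY_LEN * cpu + 4


def checksum_ok(mem: bytearray, table_len: int) -> bool:
    """Valid when the bytes inside the reported length sum to 0 mod 256."""
    return sum(mem[:table_len]) % 256 == 0


def build_madt(nr_vcpus: int):
    """Memory has room for MAX_CPUS entries; the table reports nr_vcpus."""
    mem = bytearray(HEADER_LEN + ENTRY_LEN * MAX_CPUS)
    table_len = HEADER_LEN + ENTRY_LEN * nr_vcpus
    mem[LEN_OFF:LEN_OFF + 4] = table_len.to_bytes(4, "little")
    for cpu in range(nr_vcpus):
        base = HEADER_LEN + ENTRY_LEN * cpu
        mem[base] = 0              # entry type 0: Processor Local APIC
        mem[base + 1] = ENTRY_LEN  # entry length
        mem[base + 2] = cpu        # processor UID
        mem[base + 3] = 2 * cpu    # APIC ID (arbitrary for the sketch)
    mem[CSUM_OFF] = (-sum(mem[:table_len])) % 256
    return mem, table_len


def mark_online(mem: bytearray, cpu: int) -> None:
    """Set the flag and compensate the checksum, with no range check."""
    off = flags_offset(cpu)
    if mem[off] & 1:
        return
    mem[off] |= 1
    mem[CSUM_OFF] = (mem[CSUM_OFF] - 1) % 256


if __name__ == "__main__":
    mem, table_len = build_madt(nr_vcpus=2)
    print("initial:", checksum_ok(mem, table_len))
    mark_online(mem, 1)      # flag inside the table: the fixup is right
    print("after vCPU1:", checksum_ok(mem, table_len))
    mark_online(mem, 127)    # flag outside the table: the fixup is wrong
    print("after vCPU127:", checksum_ok(mem, table_len))
```

In this model the in-range update keeps the checksum valid, while the
out-of-range one changes the checksum byte without changing any other byte
inside the reported length, so the table stops validating.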

I could add even more AML to cover that, but that'd be QEMU misbehaving (or
being absent). This patch covers the latter case, but it might be good to
change the commit message to reflect the real problem.

Cheers,
Alejandro



 

