
Re: Linux PV/PVH domU crash on (guest) resume from suspend



On 19.02.21 14:37, Jan Beulich wrote:
On 19.02.2021 14:18, Jürgen Groß wrote:
On 19.02.21 14:10, Jan Beulich wrote:
On 19.02.2021 13:48, Jürgen Groß wrote:
On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
Hi,

I'm observing a Linux PV/PVH guest crash when I resume it from sleep. I do
this with:

        virsh -c xen dompmsuspend <vmname> mem
        virsh -c xen dompmwakeup <vmname>

But it's possible to trigger it with plain xl too:

        xl save -c <vmname> <some-file>

The same procedure works fine on HVM.

This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
relevant here. I can reliably reproduce it.

This is already on my list of issues to look at.

The problem seems to be related to the XSA-332 patches. You could try
the patches I've sent out recently addressing other fallout from XSA-332
which _might_ fix this issue, too:

https://patchew.org/Xen/20210211101616.13788-1-jgross@xxxxxxxx/

Thanks for the patches. Sadly it doesn't change anything - I get exactly
the same crash. I applied that on top of 5.11-rc7 (that's what I had
handy). If you think there may be a difference with the final 5.11 or
another branch, please let me know.


Some more tests reveal that this seems to be a hypervisor regression.
I can reproduce the very same problem with a 4.12 kernel from 2019.

It seems as if the EVTCHNOP_init_control hypercall is returning
-EINVAL when the domain is continuing to run after the suspend
hypercall (in contrast to the case where a new domain has been created
when doing a "xl restore").

But when you resume the same domain, the kernel isn't supposed to
call EVTCHNOP_init_control, as that's a one time operation (per
vCPU, and unless EVTCHNOP_reset was called of course). In the
hypervisor, map_control_block() has (and always had) as its first step

      if ( v->evtchn_fifo->control_block )
          return -EINVAL;

Re-setup is needed only when resuming in a new domain.
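To illustrate the rule Jan describes (one init_control per vCPU, repeatable only after a reset), here is a toy C model of the check quoted above. It is a sketch, not Xen's code: the struct and function names (model_fifo_vcpu, model_vcpu, model_map_control_block, model_evtchn_reset) are hypothetical, the real map_control_block() takes a guest frame number and offset rather than a pointer, and the reset is reduced to clearing one field.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of Xen's per-vCPU FIFO event
 * channel state; not Xen's actual structures or signatures. */
struct model_fifo_vcpu {
    void *control_block;        /* NULL until init_control succeeds */
};

struct model_vcpu {
    struct model_fifo_vcpu *evtchn_fifo;
};

/* Mirrors the first step of map_control_block() quoted in the thread:
 * a vCPU that already has a control block refuses a second one. */
static int model_map_control_block(struct model_vcpu *v, void *page)
{
    if ( v->evtchn_fifo->control_block )
        return -EINVAL;
    v->evtchn_fifo->control_block = page;   /* register the guest page */
    return 0;
}

/* Simplified stand-in for the effect of EVTCHNOP_reset on this state:
 * the control block is forgotten, so init_control may be issued again. */
static void model_evtchn_reset(struct model_vcpu *v)
{
    v->evtchn_fifo->control_block = NULL;
}
```

In this model, a first model_map_control_block() call on a fresh vCPU returns 0, a second call on the same vCPU returns -EINVAL, and after model_evtchn_reset() a new call succeeds again. This is the same -EINVAL the guest hits if it repeats init_control in a domain that kept running.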

But the same guest will not crash when doing the same on a 4.12
hypervisor.

Is the kernel perhaps no longer given the bit of information it needs
to tell the two resume modes apart?

Ah, yes, this might be the problem.

EVTCHNOP_init_control is indeed used only if the suspend hypercall
returned 0.
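A toy C model of the guest-side decision may make the failure mode concrete. It assumes, following the statement above, that a suspend-hypercall return value of 0 means "resumed in a new domain" and any non-zero value means the suspend was cancelled and the same domain continues. All names (model_domain, model_init_control, model_guest_resume) are hypothetical; this is not the Linux resume code, and the "wrong return value" case is only the hypothesis raised in this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: whether the hypervisor still holds a FIFO control
 * block for the (single) vCPU. A domain freshly created by "xl restore"
 * starts with none. */
struct model_domain {
    bool control_block_mapped;
};

/* Stand-in for EVTCHNOP_init_control: like map_control_block(), it
 * refuses to register a second control block. */
static int model_init_control(struct model_domain *d)
{
    if (d->control_block_mapped)
        return -EINVAL;
    d->control_block_mapped = true;
    return 0;
}

/* Guest resume path under the stated rule: re-register the control
 * block only when the suspend hypercall returned 0 (new domain). A
 * non-zero value means the old registration is still live. */
static int model_guest_resume(struct model_domain *d, int suspend_ret)
{
    if (suspend_ret != 0)
        return 0;               /* same domain: nothing to redo */
    return model_init_control(d);
}
```

The model covers three cases. With "xl restore", the domain is new and the hypercall returns 0, so init_control succeeds. With "xl save -c" reported correctly, the same domain continues with a non-zero return and nothing is redone. If the same domain continues but 0 is returned anyway, the guest repeats init_control and gets -EINVAL, which matches the symptom reported at the start of the thread.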


Juergen


