Re: Linux PV/PVH domU crash on (guest) resume from suspend

On 19.02.2021 13:48, Jürgen Groß wrote:
> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>> Hi,
>>>> I'm observing a Linux PV/PVH guest crash when I resume it from sleep. I do
>>>> this with:
>>>>       virsh -c xen dompmsuspend <vmname> mem
>>>>       virsh -c xen dompmwakeup <vmname>
>>>> But it's possible to trigger it with plain xl too:
>>>>       xl save -c <vmname> <some-file>
>>>> The same on HVM works fine.
>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>>> relevant here. I can reliably reproduce it.
>>> This is already on my list of issues to look at.
>>> The problem seems to be related to the XSA-332 patches. You could try
>>> the patches I've sent out recently addressing other fallout from XSA-332
>>> which _might_ fix this issue, too:
>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@xxxxxxxx/
>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>> handy). If you think there may be a difference with the final 5.11 or
>> another branch, please let me know.
> Some more tests reveal that this seems to be a hypervisor regression.
> I can reproduce the very same problem with a 4.12 kernel from 2019.
> It seems as if the EVTCHNOP_init_control hypercall is returning
> -EINVAL when the domain is continuing to run after the suspend
> hypercall (in contrast to the case where a new domain has been created
> when doing a "xl restore").

But when you resume the same domain, the kernel isn't supposed to
call EVTCHNOP_init_control, as that's a one-time operation (per
vCPU, and unless EVTCHNOP_reset was called, of course). In the
hypervisor, map_control_block() has (and has always had) as its first step

    if ( v->evtchn_fifo->control_block )
        return -EINVAL;

Re-setup is needed only when resuming in a new domain.
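
To illustrate the expected behaviour, here is a rough standalone sketch.
It is only a toy model, not actual Xen or Linux code: struct toy_vcpu,
toy_init_control() and toy_guest_resume() are made-up names, and the
new_domain flag stands in for however the guest learns whether the
suspend ended in the same domain or in a newly created one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-vCPU FIFO state; not the real Xen structure. */
struct toy_vcpu {
    void *control_block;   /* non-NULL once the control block is set up */
};

/*
 * Models the guard quoted above: setting up the control block a second
 * time on the same vCPU is rejected with -EINVAL.
 */
static int toy_init_control(struct toy_vcpu *v, void *page)
{
    if (v->control_block)
        return -EINVAL;

    v->control_block = page;
    return 0;
}

/*
 * Hypothetical guest-side resume hook. Only a resume in a new domain
 * starts from fresh hypervisor state and hence needs the setup redone;
 * when the same domain simply continues, the earlier setup is still
 * in place and must not be repeated.
 */
static int toy_guest_resume(struct toy_vcpu *v, void *page, bool new_domain)
{
    if (!new_domain)
        return 0;

    return toy_init_control(v, page);
}
```

In this model, calling toy_init_control() twice on the same toy_vcpu
yields -EINVAL the second time, which matches the symptom described
above, whereas toy_guest_resume() with new_domain == false leaves the
existing setup alone and returns 0.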




