
Re: Linux PV/PVH domU crash on (guest) resume from suspend



On 19.02.2021 14:18, Jürgen Groß wrote:
> On 19.02.21 14:10, Jan Beulich wrote:
>> On 19.02.2021 13:48, Jürgen Groß wrote:
>>> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>>>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm observing a Linux PV/PVH guest crash when I resume it from sleep. I do
>>>>>> this with:
>>>>>>
>>>>>>        virsh -c xen dompmsuspend <vmname> mem
>>>>>>        virsh -c xen dompmwakeup <vmname>
>>>>>>
>>>>>> But it's possible to trigger it with plain xl too:
>>>>>>
>>>>>>        xl save -c <vmname> <some-file>
>>>>>>
>>>>>> The same on HVM works fine.
>>>>>>
>>>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>>>> with 5.4.98. The dom0 kernel is the same, but I'm not sure if that's
>>>>>> relevant here. I can reliably reproduce it.
>>>>>
>>>>> This is already on my list of issues to look at.
>>>>>
>>>>> The problem seems to be related to the XSA-332 patches. You could try
>>>>> the patches I've sent out recently addressing other fallout from XSA-332
>>>>> which _might_ fix this issue, too:
>>>>>
>>>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@xxxxxxxx/
>>>>
>>>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>>>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>>>> handy). If you think there may be a difference with the final 5.11 or
>>>> another branch, please let me know.
>>>>
>>>
>>> Some more tests reveal that this seems to be a hypervisor regression.
>>> I can reproduce the very same problem with a 4.12 kernel from 2019.
>>>
>>> It seems as if the EVTCHNOP_init_control hypercall is returning
>>> -EINVAL when the domain is continuing to run after the suspend
>>> hypercall (in contrast to the case where a new domain has been created
>>> when doing a "xl restore").
>>
>> But when you resume the same domain, the kernel isn't supposed to
>> call EVTCHNOP_init_control, as that's a one time operation (per
>> vCPU, and unless EVTCHNOP_reset was called of course). In the
>> hypervisor map_control_block() has (always had) as its first step
>>
>>      if ( v->evtchn_fifo->control_block )
>>          return -EINVAL;
>>
>> Re-setup is needed only when resuming in a new domain.
> 
> But the same guest will not crash when doing the same on a 4.12
> hypervisor.

Is the kernel perhaps not given the bit of information anymore that
it needs to tell apart the two resume modes?
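The two resume modes discussed above can be illustrated with a toy model in C. This is a minimal sketch, not Xen or Linux code. Every name in it (`model_init_control`, `model_reset`, `model_guest_resume`, the `cancelled` flag, `MODEL_NR_VCPUS`) is hypothetical. Only the "refuse if a control block is already set" check mirrors the `map_control_block()` snippet quoted above. The model assumes the guest learns which resume mode applies from a single flag. How a real guest obtains that bit is exactly what is in question here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: none of these names exist in Xen or Linux. */

#define MODEL_NR_VCPUS 4

/* Hypervisor side: per-vCPU FIFO event channel state. */
struct model_vcpu {
    void *control_block; /* non-NULL once the init_control step succeeded */
};

struct model_domain {
    struct model_vcpu vcpu[MODEL_NR_VCPUS];
};

static char model_guest_page; /* stands in for the guest frame being mapped */

/* Mirrors the quoted check: a second setup on the same vCPU is refused. */
static int model_init_control(struct model_domain *d, unsigned int vcpu)
{
    assert(vcpu < MODEL_NR_VCPUS); /* precondition of this toy model */
    if (d->vcpu[vcpu].control_block)
        return -EINVAL;
    d->vcpu[vcpu].control_block = &model_guest_page;
    return 0;
}

/* EVTCHNOP_reset-like step: drops all control blocks, allowing re-setup. */
static void model_reset(struct model_domain *d)
{
    for (unsigned int i = 0; i < MODEL_NR_VCPUS; i++)
        d->vcpu[i].control_block = NULL;
}

/*
 * Guest side. 'cancelled' is the hypothetical bit the guest needs:
 * true  -> the same domain keeps running (e.g. after "xl save -c"),
 * false -> the guest runs in a freshly created domain (e.g. "xl restore").
 */
static int model_guest_resume(struct model_domain *d, bool cancelled)
{
    if (cancelled)
        return 0; /* control blocks are still registered: nothing to redo */

    for (unsigned int i = 0; i < MODEL_NR_VCPUS; i++) {
        int rc = model_init_control(d, i);
        if (rc)
            return rc;
    }
    return 0;
}
```

In this model, calling `model_guest_resume(&d, false)` on a domain whose control blocks are already registered returns `-EINVAL`. That matches the reported symptom if the guest takes the "new domain" path although the domain simply continued running.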

Jan




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.