Re: Linux PV/PVH domU crash on (guest) resume from suspend
On 19.02.2021 14:18, Jürgen Groß wrote:
> On 19.02.21 14:10, Jan Beulich wrote:
>> On 19.02.2021 13:48, Jürgen Groß wrote:
>>> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>>>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm observing a Linux PV/PVH guest crash when I resume it from sleep.
>>>>>> I do this with:
>>>>>>
>>>>>>     virsh -c xen dompmsuspend <vmname> mem
>>>>>>     virsh -c xen dompmwakeup <vmname>
>>>>>>
>>>>>> But it's possible to trigger it with plain xl too:
>>>>>>
>>>>>>     xl save -c <vmname> <some-file>
>>>>>>
>>>>>> The same on HVM works fine.
>>>>>>
>>>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same happens
>>>>>> with 5.4.98. Dom0 kernel is the same, but I'm not sure if that's
>>>>>> relevant here. I can reliably reproduce it.
>>>>>
>>>>> This is already on my list of issues to look at.
>>>>>
>>>>> The problem seems to be related to the XSA-332 patches. You could try
>>>>> the patches I've sent out recently addressing other fallout from XSA-332
>>>>> which _might_ fix this issue, too:
>>>>>
>>>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@xxxxxxxx/
>>>>
>>>> Thanks for the patches. Sadly it doesn't change anything - I get exactly
>>>> the same crash. I applied that on top of 5.11-rc7 (that's what I had
>>>> handy). If you think there may be a difference with the final 5.11 or
>>>> another branch, please let me know.
>>>>
>>>
>>> Some more tests reveal that this seems to be a hypervisor regression.
>>> I can reproduce the very same problem with a 4.12 kernel from 2019.
>>>
>>> It seems as if the EVTCHNOP_init_control hypercall is returning
>>> -EINVAL when the domain is continuing to run after the suspend
>>> hypercall (in contrast to the case where a new domain has been created
>>> when doing a "xl restore").
>>
>> But when you resume the same domain, the kernel isn't supposed to
>> call EVTCHNOP_init_control, as that's a one-time operation (per
>> vCPU, and unless EVTCHNOP_reset was called, of course). In the
>> hypervisor, map_control_block() has (always had) as its first step
>>
>>     if ( v->evtchn_fifo->control_block )
>>         return -EINVAL;
>>
>> Re-setup is needed only when resuming in a new domain.
>
> But the same guest will not crash when doing the same on a 4.12
> hypervisor.

Is the kernel perhaps no longer given the bit of information it needs
to tell the two resume modes apart?

Jan
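Jan's argument can be modeled in a few lines of C. This is a hypothetical, heavily simplified sketch, not Xen or Linux code: struct evtchn_fifo_vcpu, evtchn_reset() and guest_resume() are invented for illustration, and only the -EINVAL check mirrors the map_control_block() excerpt quoted in the thread. The `cancelled` flag is an assumed stand-in for "the bit of information" the guest needs to tell a resume in the same domain from a resume in a new domain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU FIFO event channel state (not the real Xen struct). */
struct evtchn_fifo_vcpu {
    void *control_block;   /* non-NULL once the guest has registered one */
};

/* Models the first step of map_control_block(): registration is a
 * one-time operation, so a second attempt on the same vCPU fails. */
static int map_control_block(struct evtchn_fifo_vcpu *v, void *page)
{
    assert(page != NULL);           /* the guest must supply a page */
    if (v->control_block)
        return -EINVAL;
    v->control_block = page;
    return 0;
}

/* Models EVTCHNOP_reset: drops the registration so it can be redone. */
static void evtchn_reset(struct evtchn_fifo_vcpu *v)
{
    v->control_block = NULL;
}

/* Hypothetical guest resume path.
 *   cancelled == true:  the same domain keeps running after the suspend
 *                       call, so the existing registration is still valid.
 *   cancelled == false: the guest believes it is running in a freshly
 *                       created domain and must register again. */
static int guest_resume(struct evtchn_fifo_vcpu *v, void *page, bool cancelled)
{
    if (cancelled)
        return 0;                   /* nothing to redo */
    return map_control_block(v, page);
}
```

Under these assumptions, a guest that is told "not cancelled" while it is actually still in the same domain takes the map_control_block() path a second time and gets -EINVAL, which would match the symptom Jürgen describes. A vCPU belonging to a newly created domain starts with no control block, so registration succeeds there, and an explicit EVTCHNOP_reset (modeled by evtchn_reset()) also makes re-registration legal.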