Re: Linux PV/PVH domU crash on (guest) resume from suspend
On 19.02.21 14:37, Jan Beulich wrote:
> On 19.02.2021 14:18, Jürgen Groß wrote:
>> On 19.02.21 14:10, Jan Beulich wrote:
>>> On 19.02.2021 13:48, Jürgen Groß wrote:
>>>> On 17.02.21 14:48, Marek Marczykowski-Górecki wrote:
>>>>> On Wed, Feb 17, 2021 at 07:51:42AM +0100, Jürgen Groß wrote:
>>>>>> On 17.02.21 06:12, Marek Marczykowski-Górecki wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm observing a Linux PV/PVH guest crash when I resume it from
>>>>>>> sleep. I do this with:
>>>>>>>
>>>>>>>     virsh -c xen dompmsuspend <vmname> mem
>>>>>>>     virsh -c xen dompmwakeup <vmname>
>>>>>>>
>>>>>>> But it's possible to trigger it with plain xl too:
>>>>>>>
>>>>>>>     xl save -c <vmname> <some-file>
>>>>>>>
>>>>>>> The same on HVM works fine.
>>>>>>>
>>>>>>> This is on Xen 4.14.1, and with guest kernel 5.4.90, the same
>>>>>>> happens with 5.4.98. Dom0 kernel is the same, but I'm not sure
>>>>>>> if that's relevant here. I can reliably reproduce it.
>>>>>>
>>>>>> This is already on my list of issues to look at.
>>>>>>
>>>>>> The problem seems to be related to the XSA-332 patches. You could
>>>>>> try the patches I've sent out recently addressing other fallout
>>>>>> from XSA-332 which _might_ fix this issue, too:
>>>>>>
>>>>>> https://patchew.org/Xen/20210211101616.13788-1-jgross@xxxxxxxx/
>>>>>
>>>>> Thanks for the patches. Sadly it doesn't change anything - I get
>>>>> exactly the same crash. I applied that on top of 5.11-rc7 (that's
>>>>> what I had handy). If you think there may be a difference with the
>>>>> final 5.11 or another branch, please let me know.
>>>>
>>>> Some more tests reveal that this seems to be a hypervisor
>>>> regression. I can reproduce the very same problem with a 4.12
>>>> kernel from 2019.
>>>>
>>>> It seems as if the EVTCHNOP_init_control hypercall is returning
>>>> -EINVAL when the domain is continuing to run after the suspend
>>>> hypercall (in contrast to the case where a new domain has been
>>>> created when doing a "xl restore").
>>>
>>> But when you resume the same domain, the kernel isn't supposed to
>>> call EVTCHNOP_init_control, as that's a one time operation (per
>>> vCPU, and unless EVTCHNOP_reset was called of course). In the
>>> hypervisor map_control_block() has (always had) as its first step
>>>
>>>     if ( v->evtchn_fifo->control_block )
>>>         return -EINVAL;
>>>
>>> Re-setup is needed only when resuming in a new domain.
>>
>> But the same guest will not crash when doing the same on a 4.12
>> hypervisor.
>
> Is the kernel perhaps not given the bit of information anymore that
> it needs to tell apart the two resume modes?

Ah, yes, this might be the problem. EVTCHNOP_init_control is indeed
used only if the suspend hypercall did return 0.

Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Attachment: OpenPGP_signature
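
For readers following the thread, the interaction discussed above can be sketched as a small toy model in C. This is not Xen or Linux code: struct toy_vcpu, toy_init_control() and toy_guest_resume() are hypothetical names made up for illustration. The only parts taken from the thread are the quoted "already mapped, so return -EINVAL" check and the statement that EVTCHNOP_init_control is used only if the suspend hypercall returned 0. Treating every nonzero return value as "the same domain keeps running" is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vCPU FIFO event channel state. */
struct toy_vcpu {
    bool control_block_mapped;
};

/*
 * Models the check quoted from map_control_block(): once a control
 * block is mapped for a vCPU, a second init attempt yields -EINVAL.
 */
static int toy_init_control(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (v->control_block_mapped)
        return -EINVAL;
    v->control_block_mapped = true;
    return 0;
}

/*
 * Models the guest side after the suspend hypercall returns.
 * Assumption of this sketch: 0 means "resumed in a new domain",
 * anything else means "the same domain continues".
 */
static int toy_guest_resume(struct toy_vcpu *v, int suspend_ret)
{
    if (suspend_ret != 0)
        return 0; /* same domain: existing control block is kept */
    return toy_init_control(v); /* new domain: re-setup needed */
}
```

In this model, a fresh vCPU resumed with a return value of 0 is set up successfully, and a vCPU that is already set up and resumed with a nonzero return value is left alone. If the guest sees 0 although the same domain continues, toy_guest_resume() returns -EINVAL, which would correspond to the suspected case where the kernel can no longer tell the two resume modes apart.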