Re: [Xen-devel] backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 16 February 2017 11:00
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> completion handling" ?
>
> >>> On 16.02.17 at 11:53, <Paul.Durrant@xxxxxxxxxx> wrote:
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: 16 February 2017 10:46
> >> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> >> Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> >> completion handling" ?
> >>
> >> >>> On 16.02.17 at 11:36, <Paul.Durrant@xxxxxxxxxx> wrote:
> >> >> -----Original Message-----
> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> Sent: 16 February 2017 10:23
> >> >> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> >> >> Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >> >> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> >> >> completion handling" ?
> >> >>
> >> >> >>> On 16.02.17 at 11:13, <Paul.Durrant@xxxxxxxxxx> wrote:
> >> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> >> Sent: 16 February 2017 09:21
> >> >> >>
> >> >> >> as it looks to be quite non-trivial an operation, did you happen to
> >> >> >> have to backport commit 480b83162a to 4.4 or older, without
> >> >> >> backporting all the ioreq server stuff at the same time? It looks to
> >> >> >> me as if the issue predates the addition of ioreq servers, and us
> >> >> >> having had customer reports here would seem to make this a
> >> >> >> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
> >> >> >> ioreq-server event channel vulnerability"] also backported, which
> >> >> >> also appears to address a pre-existing issue).
> >> >> >
> >> >> > Sorry, no I don't have a back-port. Agreed that the issue existed
> >> >> > prior to ioreq servers, but the checking was probably sufficiently
> >> >> > lax that it never resulted in a domain_crash(), just bad data
> >> >> > coming back from an emulation request.
> >> >>
> >> >> Well, according to the reports we've got, maybe it was less likely
> >> >> to trigger, but it looks like it wasn't lax enough. Albeit I'm yet to
> >> >> get confirmation that the issue was only seen during domain
> >> >> shutdown, which aiui was (leaving aside a guest fiddling with the
> >> >> shared structure, in which case it deserves being crashed) the
> >> >> only condition triggering that domain_crash().
> >> >
> >> > If it is only on shutdown then that's probably just a toolstack race
> >> > (since QEMU should really be dying cleanly when the guest goes to S5),
> >> > unless we're talking about a forced shutdown.
> >>
> >> Then I may have misunderstood the original mail thread: Under
> >> what other conditions did this trigger for the original reporters
> >> (Sander and Roger)?
> >
> > Now you're asking... I'll have to see if I can find the original mail
> > threads. It's possible it was stubdom related... but I could be thinking
> > of something else.
>
> https://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05210.html
>

Thanks. So, looking at my message
https://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05506.html,
the problem with the emulator/toolstack was never diagnosed.
I wonder whether the problem is running a PV-aware guest in an HVM container, where using a PV shutdown mechanism causes the toolstack to kill the emulator rather than letting it shut down gracefully?

Prior to the ioreq server series going in, the shared ioreq pages were never removed from the P2M, and so there was no zeroing of them before re-insertion (which is what makes the ioreq state appear to jump straight to 'none' rather than 'resp ready'). Hence, even if the emulator were killed, you wouldn't hit the same sort of crash; more likely you'd end up with a stuck emulation and a wedged vcpu.

Roger's repro was with FreeBSD, which is quite PV-aware AFAIK, whereas all my prior testing was done with Windows. So maybe this points at a problem with libxl's behaviour when a guest is shutting down?

  Paul

> Jan
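For readers unfamiliar with the mechanism under discussion: the emulator (QEMU) and Xen communicate through an ioreq structure held in a page shared between them, and the completion handling being talked about here keyed off the state field in that shared page. The following is a minimal, self-contained toy model of that pattern, not Xen source: the STATE_* names follow Xen's public ioreq interface, but the struct layout, wait_for_io() and the domain_crash() stand-in are invented for illustration. It shows how a shared state that has been zeroed (e.g. a re-inserted page) reads back as 'none' and silently falls through, whereas an unrecognised value hits the path that crashes the domain.

/*
 * Toy model (NOT Xen source) of completion handling that trusts the
 * state field in the shared ioreq page.  The STATE_* names follow
 * Xen's public ioreq interface; everything else is illustrative.
 */
#include <stdint.h>
#include <stdio.h>

#define STATE_IOREQ_NONE       0  /* no request outstanding              */
#define STATE_IOREQ_READY      1  /* request written, not yet picked up  */
#define STATE_IOREQ_INPROCESS  2  /* emulator is handling the request    */
#define STATE_IORESP_READY     3  /* response written, ready to consume  */

struct ioreq {
    volatile uint8_t state;  /* stands in for the field living in the shared page */
    uint64_t data;           /* response payload */
};

/* Stand-in for Xen's domain_crash(): the failure mode being reported. */
static void domain_crash(const char *why)
{
    printf("domain_crash: %s\n", why);
}

/*
 * Completion loop keyed entirely off the shared state field.  If the
 * shared page is zeroed and re-inserted while a request is in flight,
 * the state reads back as STATE_IOREQ_NONE and the loop falls straight
 * through as if the request had completed, leaving the caller with
 * stale/zero data (the "bad data" / wedged-vcpu case).  Any value the
 * switch does not recognise hits the default arm, which is where the
 * domain_crash() discussed above fires.
 */
static int wait_for_io(struct ioreq *p)
{
    while (p->state != STATE_IOREQ_NONE) {
        switch (p->state) {
        case STATE_IORESP_READY:
            printf("consuming response data %#llx\n",
                   (unsigned long long)p->data);
            p->state = STATE_IOREQ_NONE;
            break;
        case STATE_IOREQ_READY:
        case STATE_IOREQ_INPROCESS:
            /* The real loop blocks on an event channel until the emulator
             * advances the state; the toy model just gives up here. */
            printf("request still in flight; would block\n");
            return 1;
        default:
            domain_crash("unexpected ioreq state");
            return 0;
        }
    }
    return 1;
}

int main(void)
{
    struct ioreq req = { .state = STATE_IORESP_READY, .data = 0xdeadbeef };
    wait_for_io(&req);           /* normal completion */

    req.state = 4;               /* corrupted/garbage shared state */
    wait_for_io(&req);           /* exercises the crash path */
    return 0;
}

As its title suggests, commit 480b83162a ("x86/hvm: don't rely on shared ioreq state for completion handling") avoids this class of problem by tracking the in-flight state on the Xen side rather than trusting the emulator-writable shared copy for completion decisions.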