Re: [Xen-devel] [PATCH for-4.12] libxl: When restricted, start QEMU paused



On Wed, Jan 30, 2019 at 03:09:45PM +0000, Ian Jackson wrote:
> Anthony PERARD writes ("[PATCH for-4.12] libxl: When restricted, start QEMU paused"):
> > [stuff]
> 
> Thanks for this.  I think the code looks right but to make it easier
> to understand what was going on I have taken the liberty of trying to
> reword your commit message for English grammar.
> 
> Can you please check that this is true ?
> 
>   libxl runs the command "cont" later during guest creation; i.e. it
>   is expecting that QEMU would not do any emulation.  Use the "-S"
>   command option to achieve this.
> 
>   Unfortunately, when QEMU is started with "-S", it won't write QEMU's
>   readiness into xenstore. So only activate this option when we have a
>   QEMU startup notification via QMP available, i.e. when dm_restrict
>   is activated.
> 
>   The -S option has the side-effect of suppressing the startup
>   notification via xenstore: libxl will only get the notification via
>   QMP.
> 
>   It is important to rely only on QMP for notification when we have
>   QMP available, as (due to a qemu bug) not waiting for that QMP
>   notification may result in the QMP socket becoming blocked, so that
>   QEMU stops responding to new connections even if no existing ones
>   are active.
> 
>   The QEMU bug that this patch works around is as follows:
>   - libxl connects and hand-checks [xxx???] with QEMU, then sends the
>     cmd "query-status".
>   - QEMU prepares and maybe tries to send the response,
>     while also writing "running" into xenstore.
>   - libxl sees via xenstore that QEMU is running and disconnects from the
>     QMP socket before receiving the response from the cmd.
>   => The QMP socket (monitor) is thereby blocked and will never reply
>     to commands on new connections.
> 
>   This is due to QEMU only responding to one command at a time, and
>   suspending its monitor (QMP) until the command has been processed and
>   sent. Disconnecting from the socket doesn't unsuspend the monitor. The
>   race described here is very likely to happen with QEMU 3.1.50 (during
>   3.2 development), but can be reproduced with QEMU 3.1.
> 
>   [xxx??? So, require, therefore, that when we get the QMP readiness
>   notification, the qemu state is xenstore.]

Sorry for this being vague. I'm not 100% sure what QEMU does and when.
The listing of QEMU's actions is probably not accurate.

QEMU probably writes the "running" state to xenstore before starting to
handle the QMP connection (i.e. respond to commands). I'm unsure when
QEMU handles QMP, but that's probably in main_loop(), which is started
after the piece of code that writes to xenstore.

Or maybe the issue here is just s/hand-checks/handshake/. I was trying to
describe the context in as few words as possible before getting to the
point where things fail.

But here is a little more detail on libxl's actions when the bug happens
(i.e. before that patch is applied):
- connect(qmp_socket)
. wait (for an event, such as something to read or a xenstore update)
- receive QMP greeting, then send 'qmp_capabilities' command
. wait
- receive response to 'qmp_capabilities', then send 'query-status'
. wait
- receive update from xenstore: qemu's state=running
  then close the connection to qmp_socket and keep going with domain
  creation
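
As a rough illustration only (this is not libxl or QEMU code), the same
sequence can be sketched in Python against a stand-in QMP peer. The names
fake_qmp_server, qmp_handshake and demo are hypothetical, and the fake
server answers every command immediately, so it does not reproduce the
blocked monitor. The only point shown is the last step: the client reads
the 'query-status' reply before closing, instead of disconnecting as soon
as some other notification arrives.

```python
import json
import socket
import threading


def fake_qmp_server(conn):
    """Stand-in for QEMU's monitor (hypothetical, for illustration only)."""
    with conn, conn.makefile("rw") as f:
        # QMP sends a greeting before it accepts any command.
        f.write(json.dumps({"QMP": {"version": {}, "capabilities": []}}) + "\n")
        f.flush()
        for line in f:
            cmd = json.loads(line)["execute"]
            if cmd == "qmp_capabilities":
                reply = {"return": {}}
            elif cmd == "query-status":
                reply = {"return": {"status": "running", "running": True}}
            else:
                reply = {"error": {"class": "CommandNotFound", "desc": cmd}}
            f.write(json.dumps(reply) + "\n")
            f.flush()


def qmp_handshake(sock):
    """Follow the steps listed above, but read every reply before closing."""
    seen = []
    with sock, sock.makefile("rw") as f:
        def recv():
            msg = json.loads(f.readline())
            seen.append(msg)
            return msg

        def send(command):
            f.write(json.dumps({"execute": command}) + "\n")
            f.flush()

        recv()                     # QMP greeting
        send("qmp_capabilities")
        recv()                     # response to qmp_capabilities
        send("query-status")
        status = recv()["return"]  # wait for the reply before disconnecting
    return status, seen


def demo():
    # A connected socket pair plays the role of the QMP Unix socket.
    client, server = socket.socketpair()
    worker = threading.Thread(target=fake_qmp_server, args=(server,))
    worker.start()
    status, seen = qmp_handshake(client)
    worker.join()
    return status, seen
```

Calling demo() runs the whole exchange in-process; no QEMU or xenstore is
involved.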

Does that make the context easier to understand? How much of this would
actually be useful for the patch description?

> I added a couple of xxx where I was particularly unsure.

Thanks,

-- 
Anthony PERARD
