
Re: Introduction of stable interface between Xenstore and hypervisor

To: Jan Beulich <jbeulich@xxxxxxxx>
From: Juergen Gross <jgross@xxxxxxxx>
Date: Mon, 13 Sep 2021 10:05:37 +0200
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Mon, 13 Sep 2021 08:05:47 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13.09.21 09:39, Jan Beulich wrote:

On 10.09.2021 15:46, Juergen Gross wrote:

On 10.09.21 15:22, Jan Beulich wrote:

On 09.09.2021 08:27, Juergen Gross wrote:

Today Xenstore is using one unstable interface to retrieve the state
of a domain (existing, dying).

In order to decouple Xenstore further from the hypervisor, a stable
interface should be established. This new interface should cover the
following functionality:

- get the needed state of a domain (existing, dying)
- get a unique identifier per domain telling Xenstore that a domid
     has been reused (today Xenstore could in theory miss this)
- provide a way to avoid having to get the state of all domains when
     a state change of a domain has been signaled (there is only an event
     "some domain has changed state" today: VIRQ_DOM_EXC)

I'm suggesting the following hypervisor modifications:

- Addition of a global 64 bit domain sequence counter. This counter
     will be incremented each time a new domain is created.
- Addition of a sequence count to struct domain. It will be set to
     the value of the new global domain sequence counter when the domain
     is being created.


This looks like an "extended domain ID", which I think we want to
avoid unless exposure is extremely limited. Since we can't easily
predict what future usage this may have, I would suggest seeding the
global with a 15-bit (or wider) random value. I'd further suggest
not maintaining it by simple incrementing, to make it very clear to
consumers that the actual value has no other meaning. This could be
done by multiplying by a random odd number larger than 1 (determined
at build or boot time). All we want to guarantee is that numbers
don't repeat (earlier than a 64-bit value would wrap when linearly
incremented).


Fine with me.
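
One possible reading of the seeding and multiplication suggestion above,
as a minimal sketch only. All names here (domid_seq, domid_seq_init,
domid_seq_new) are hypothetical and not taken from any Xen source or
patch. The sketch keeps a linear 64-bit counter internally and hands out
counter * multiplier modulo 2^64; since multiplication by an odd number
is a bijection modulo 2^64, values repeat only once the counter itself
has wrapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not taken from any Xen source or patch. */
struct domid_seq {
    uint64_t next;       /* linear counter, seeded with a random value */
    uint64_t multiplier; /* random odd number > 1, fixed at build or boot */
};

/* Seed the counter and force the multiplier to be odd and larger than 1. */
static void domid_seq_init(struct domid_seq *s, uint64_t seed, uint64_t mult)
{
    s->next = seed;
    s->multiplier = mult | 1;
    if (s->multiplier == 1)
        s->multiplier = 3;
    assert((s->multiplier & 1) && s->multiplier > 1);
}

/*
 * Return the identifier for a newly created domain.  Unsigned overflow
 * wraps modulo 2^64, and multiplying by an odd number modulo 2^64 is a
 * bijection, so two different counter values never yield the same result.
 */
static uint64_t domid_seq_new(struct domid_seq *s)
{
    return s->next++ * s->multiplier;
}
```

Consumers would only ever compare such values for equality; the sketch
deliberately gives them no ordering they could rely on.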

- Addition of a global bitmap with one bit for each possible domid
     (resulting in a 4 kB page). Each time the state of a domain is changed
     (domain creation, shutdown, deletion) the related bit is set.
     When the VIRQ_DOM_EXC event is being registered the bitmap is reset to
     have the bits set for all existing domains.


This page, as I understand from the subsequent item, is not intended
to be exposed to the domain running xenstored (or any other one)?


Correct.

- Addition of a new stable hypercall ("get domain state") returning the
     following information:
     + domid of a domain having the bit set in above bitmap
     + state of that domain (existing, dying)
     + sequence count of that domain
     The related bit is reset in the bitmap as a side effect of the call.
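
The bitmap and the "get domain state" call described above can be
modelled in plain C roughly as follows. This is a user-space sketch
under stated assumptions, not hypervisor code: the names dom_set_state,
virq_dom_exc_registered and get_domain_state, the DOM_NONE marker for a
deleted domain, and the "lowest set bit first" order are all invented
for illustration, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_DOMID  32768u   /* 32768 bits == one 4 kB page */
#define WORD_BITS  64u

/* DOM_NONE (no such domain) is an assumption of this sketch. */
enum dom_state { DOM_NONE, DOM_EXISTING, DOM_DYING };

struct dom_info {
    enum dom_state state;
    uint64_t seq;           /* sequence count assigned at creation */
};

static uint64_t changed[MAX_DOMID / WORD_BITS];  /* one bit per domid */
static struct dom_info domains[MAX_DOMID];

/* Record a state change (creation, shutdown, deletion) of a domain. */
static void dom_set_state(uint16_t domid, enum dom_state st, uint64_t seq)
{
    assert(domid < MAX_DOMID);
    domains[domid].state = st;
    domains[domid].seq = seq;
    changed[domid / WORD_BITS] |= UINT64_C(1) << (domid % WORD_BITS);
}

/* On VIRQ_DOM_EXC registration: flag every domain that currently exists. */
static void virq_dom_exc_registered(void)
{
    memset(changed, 0, sizeof(changed));
    for (unsigned int d = 0; d < MAX_DOMID; d++)
        if (domains[d].state != DOM_NONE)
            changed[d / WORD_BITS] |= UINT64_C(1) << (d % WORD_BITS);
}

/*
 * Model of "get domain state": report one flagged domain and clear its
 * bit as a side effect.  Returns 0 on success, -1 if no bit is set.
 */
static int get_domain_state(uint16_t *domid, struct dom_info *info)
{
    for (unsigned int w = 0; w < MAX_DOMID / WORD_BITS; w++) {
        unsigned int bit = 0;

        if (!changed[w])
            continue;
        while (!((changed[w] >> bit) & 1))
            bit++;
        changed[w] &= ~(UINT64_C(1) << bit);
        *domid = (uint16_t)(w * WORD_BITS + bit);
        *info = domains[*domid];
        return 0;
    }
    return -1;
}
```

A caller would invoke get_domain_state() repeatedly after each
VIRQ_DOM_EXC event until it returns -1.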


What I'd like us to consider up front is whether xenstored is going
to remain the only entity interested in this kind of information. The
entire design looks to leverage that there's only a single consumer
in the system.


Right. I'm just writing some RFC patches, and I have coded this
interface to be usable only for the domain having VIRQ_DOM_EXC
registered.

The alternative (IMO) would have been to expose the domain-state
bitmap to Xenstore (and/or other interested parties).


Right, but then a simple bitmap wouldn't suffice, would it? What
would be the criteria to clear a bit there? All interested parties
would have to signal that they've observed the respective state
change.


For this kind of interface to work with multiple consumers the state
information of each domain would need to be contained in this memory
area, so this would probably require at least one byte for each domain.
Each consumer would need to keep a shadow copy of the last read
information in order to be able to detect any state modifications.

But this data would again introduce today's problem: a fast shutdown
of a domain and creation of another domain with the same domid could
get lost. So the memory area would even need to include the unique
domain identifier, increasing the size of the data per domain to at
least 8 bytes.
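
To illustrate the shadow-copy scheme for multiple consumers, here is a
rough sketch. The entry layout and the names dom_entry and scan_changes
are assumptions made for this example only; in particular the entry
carries a separate state byte next to the 64-bit identifier, so it is
larger than the 8-byte minimum mentioned above. Comparing the unique
identifier as well as the state is what lets a consumer notice a domid
that was quickly reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum dom_state { DOM_NONE, DOM_EXISTING, DOM_DYING };

/* Hypothetical per-domain entry in the shared memory area. */
struct dom_entry {
    uint64_t unique_id;  /* differs whenever the domid has been reused */
    uint8_t  state;      /* one of enum dom_state */
};

/*
 * Compare the shared area with one consumer's private shadow copy.
 * Changed domids are written to 'out' (at most 'max' of them) and the
 * shadow is updated for each reported entry.  Returns the number of
 * changes reported.
 */
static size_t scan_changes(const struct dom_entry *shared,
                           struct dom_entry *shadow, size_t ndoms,
                           uint16_t *out, size_t max)
{
    size_t n = 0;

    assert(ndoms <= 32768);
    for (size_t d = 0; d < ndoms && n < max; d++) {
        if (shared[d].unique_id == shadow[d].unique_id &&
            shared[d].state == shadow[d].state)
            continue;
        shadow[d] = shared[d];
        out[n++] = (uint16_t)d;
    }
    return n;
}
```

Note that every consumer has to scan all entries on each event, which is
the cost the single-consumer bitmap design avoids.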

Further, while - like you - I'd prefer to avoid sharing the bitmap,
the question remains whether such further interested parties are
conceivable.


A good question. Today's design doesn't support multiple interested
parties at the hypervisor interface level. There can only be one
consumer of VIRQ_DOM_EXC.

I understand the caller would iterate over this hypercall. Is there
concern about this iteration never finishing, if e.g. a guest gets
rebooted quickly enough?


No. As a reboot will always include Xenstore activity, there is no
chance for that to happen.


Is this really the case? I thought that was an implementation
aspect of the tool stack(s). Take an XTF test: For it to be run
(no drivers, no qemu) is it really necessary to fiddle with
Xenstore? If from an abstract pov it isn't, then the hypervisor
should not become dependent upon such. IMO, that is.


In case you want to run without Xenstore then don't start it. In case
Xenstore is running it will receive the VIRQ_DOM_EXC events and it
will react accordingly.

The unbounded loop can happen only if you are running Xenstore but are
creating new guests without using the normal Xen tools (those will
interact with Xenstore for all guests, even those without any external
references). And even in this case all you would get is quite a busy
Xenstore.

Apart from this - how would Xenstore activity prevent this loop
from becoming unbounded? Is this because you expect other
operations to be serialized with running this loop? If so, how
do you prevent starvation from this loop taking long?


Xenstore is single threaded and it is processing any external events
from a main loop. So either it is in this hypercall loop, or it is
reacting to requests, e.g. from the tools needed for domain creation.
The activity in this loop isn't very time consuming (with the new
interface it is even much less time consuming than today), so the
starvation case would look more like a hiccup.


Juergen

