[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 5/5] tools/xenstore: add migration stream extensions for new features



On 04.08.22 21:28, Julien Grall wrote:
Hi,

On 03/08/2022 12:59, Juergen Gross wrote:
Extend the definition of the Xenstore migration stream to cover new
features:

- per domain features
- extended watches (watch depth)
- per domain quota

Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
---
V3:
- new patch
---
  docs/designs/xenstore-migration.md | 85 ++++++++++++++++++++++++++++--
  1 file changed, 82 insertions(+), 3 deletions(-)

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
index efa526f420..b2b1d3d5c7 100644
--- a/docs/designs/xenstore-migration.md
+++ b/docs/designs/xenstore-migration.md
@@ -43,7 +43,13 @@ the setting of the endianness bit.
  |-----------|---------------------------------------------------|
  | `ident`   | 0x78656e73746f7265 ('xenstore' in ASCII)          |
  |           |                                                   |
-| `version` | 0x00000001 (the version of the specification)     |
+| `version` | The version of the specification, defined values: |
+|           | 0x00000001: all fields without any explicitly     |
+|           |             mentioned version dependency are      |
+|           |             valid.                                |
+|           | 0x00000002: all fields valid for version 1 plus   |
+|           |             fields explicitly stated to be        |
+|           |             supported in version 2 are valid.     |

I am a bit concerned with the bump of the versions. It means, it will be necessary for Xenstored to know whether the new Xenstored speaks v1 or v2. This is less an issue when Live-Migration (although there is a fleet management problem) but it will be one for Live-Update if we need to rollback.

So I am wondering if we can avoid to bump the version and use other means to detect the difference.

In the end this is exactly what the version was meant to be used for.

I think it would make much more sense to think about the way to handle a
bump of the version in a compatible way.

My idea was to add a xenstored command line parameter for limiting the
migration stream version to be used to a specified version, causing new
features probably to not be available, though.

I don't see how e.g. a rollback would be doable in case a domain already
started to use a new feature like the third parameter when setting a watch.
Even if we'd drop the depth information when rolling back a watch set
afterwards with an additional depth added would be rejected by the older
xenstored, which would be unexpected failure for the guest.

It might make sense to try to use the V1 stream when doing a live update,
e.g. covering the case when the SET_FEATURE command was used for each
active guest to limit the features to V1 compatible ones. A force parameter
might be used to use the V1 stream even if guests are using V2 features,
risking breakage of those guests.


  |           |                                                   |
  | `flags`   | 0 (LSB): Endianness: 0 = little, 1 = big          |
  |           |                                                   |
@@ -117,7 +123,17 @@ xenstored state that needs to be restored.
  | rw-socket-fd                  |
  +-------------------------------+
  | evtchn-fd                     |
++---------------+---------------+
+| n-dom-quota   | n-glob-quota  |
++---------------+---------------+
+| quota-val 1                   |
++-------------------------------+
+...
  +-------------------------------+
+| quota-val N                   |
++-------------------------------+
+| quota-names
+...
  ```
@@ -128,6 +144,22 @@ xenstored state that needs to be restored.
  |                |                                              |
  | `evtchn-fd`    | The file descriptor used to communicate with |
  |                | the event channel driver                     |
+|                |                                              |
+| `n-dom-quota`  | Number of quota values which apply per       |
+|                | domain. Valid only for version 2 and later.  |
+|                |                                              |

I think we can avoid extending the structure here by creating a separate record for the quota.

With that, the new Xenstored don't need specific code to deal with rollback because, as AFAICT, unimplemented records are ignored by existing Xenstored.

For quota this would be possible regarding the functionality, but there are
use cases which might suffer in case quota are not being accepted by the
new instance (e.g. driver domains needing higher quota).

OTOH this is no guest visible interface, so I'm on the edge to add this
data to V1. This would even be possible using the global record of V1, as
the length information of the record allows to add new fields without
having to bump the version.


+| `n-glob-quota` | Number of quota values which apply globally  |
+|                | only. Valid only for version 2 and later.    |
+|                |                                              |
+| `quota-val`    | Quota values, first the ones applying per    |
+|                | domain, then the ones applying globally. A   |
+|                | value of 0 has the semantics of "unlimited". |
+|                | Valid only for version 2 and later.          |
+|                |                                              |
+| `quota-names`  | 0 delimited strings of the quota names in    |
+|                | the same sequence as the `quota-val` values. | > +|                | Valid only for version 2 and later.          |

From my understanding, both version of Xenstored needs to agree on the quota names. So it means the name have to be defined as part of the spec. At which point, I think it would be better to use ID.

I don't think so. For one the minimal set of quota names has been defined
already in patch 3. And even with using an ID you'd have the same problem
again, but without having the possibility to add variant specific quota
(remember that there already has been a statement that doing a live update
from C to OCAML or vice versa would probably break users due to some
deviations in behavior).

Also, can you clarify what would happen if the stream contains a quota not supported by the new Xenstored?

Yes, I'll add a sentence that those should be ignored.


+
  xenstored will resume in the original process context. Hence `rw-socket-fd`
  simply specifies the file descriptor of the socket. Sockets are not always
@@ -145,7 +177,7 @@ the domain being migrated.
  ```
      0       1       2       3       4       5       6       7    octet
  +-------+-------+-------+-------+-------+-------+-------+-------+
-| conn-id                       | conn-type     |               |
+| conn-id                       | conn-type     | n-quota       |
  +-------------------------------+---------------+---------------+
  | conn-spec
  ...
@@ -154,6 +186,17 @@ the domain being migrated.
  +---------------+---------------+-------------------------------+
  | data
  ...
++-------------------------------+
+| features                      |
++-------------------------------+
+| quota-val 1                   |
++-------------------------------+
+...
++-------------------------------+
+| quota-val N                   |
++-------------------------------+
+| quota-names
+...
  ```
@@ -167,6 +210,10 @@ the domain being migrated.
  |                | 0x0001: socket                               |
  |                | 0x0002 - 0xFFFF: reserved for future use     |
  |                |                                              |
+| `n-quota`      | Number of quota values.                      |
+|                | Only for `conn-type` 0 (shared ring).        |
+|                | Only valid for version 2 and later.          |
+|                |                                              |
  | `conn-spec`    | See below                                    |
  |                |                                              |
  | `in-data-len`  | The length (in octets) of any data read      |
@@ -182,6 +229,22 @@ the domain being migrated.
  | `data`         | Pending data: first in-data-len octets of    |
  |                | read data, then out-data-len octets of       |
  |                | written data (any of both may be empty)      |
+|                |                                              |
+| `features`     | Value of the feature field visible by the    |
+|                | guest at offset 2064 of the ring page.       |
+|                | Aligned to the next 4 octet boundary.        |
+|                | Only for `conn-type` 0 (shared ring).        |

For the purpose of the stream, I would consider to make it available for the socket connection. This could potentially be used in the future to allow each application to have a different behavior when socket is used.

This would break the use of xenstore-stubdom for such a setup.

I can't make my mind yet if we can avoid bumping the version for this field. What would happen if we need to rollback?

I think an active usage of the new features and a rollback are mutually
exclusive. See above.


+|                | Only valid for version 2 and later.          | > +|                |                                              |
+| `quota-val`    | Quota values, a value of 0 has the semantics |
+|                | "unlimited".                                 |
+|                | Only for `conn-type` 0 (shared ring).        |
+|                | Only valid for version 2 and later.          |

I would suggest to be very obvious and say the value will override the value (if any)

Okay.


+|                |                                              |
+| `quota-names`  | 0 delimited strings of the quota names in    |
+|                | the same sequence as the `quota-val` values. |
+|                | Only for `conn-type` 0 (shared ring).        |
+|                | Only valid for version 2 and later.          |

As for the "global" quotas, I would move the quotas in a separate record. In this case, this would also be useful to avoid having may dynamic length field within the same record.

I like having the data together more.

But I'm not really feeling strong about that.


  In case of live update the connection record for the connection via which
  the live update command was issued will contain the response for the live
@@ -247,7 +310,7 @@ by a connection for which there is `CONNECTION_DATA` record previously present.
  ```
      0       1       2       3    octet
-+-------+-------+-------+-------+
++---------------+---------------+
  | conn-id                       |
  +---------------+---------------+
  | wpath-len     | token-len     |
@@ -256,6 +319,9 @@ by a connection for which there is `CONNECTION_DATA` record previously present.
  ...
  | token
  ...
++---------------+---------------+
+| depth         |               |
++---------------+---------------+
  ```
@@ -275,6 +341,13 @@ by a connection for which there is `CONNECTION_DATA` record previously present.
  |             |                                                 |
  | `token`     | The watch identifier token, as specified in the |
  |             | `WATCH` operation                               |
+|             |                                                 |
+| `depth`     | The number of directory levels below the        |
+|             | watched path to consider for a match. This      |
+|             | field is aligned to the next 4 octet boundary.  |
+|             | A value of 0xffff is used for unlimited depth.  |
+|             | This field is valid only for version 2 and      |
+|             | higher.                                         |

If we are going to bump the stream version, then I think we should move the field before token/path.

I thought about that, but liked it better to be able to keep a common struct
layout for the record with the V2 fields being at the end.

Main reason is the ability to avoid duplication of code for being able to
handle both versions.


  \pagebreak
@@ -406,6 +479,12 @@ A node permission specifier has the following format:
  Note that perm1 defines the domain owning the node. See [4] for more
  explanation of node permissions.
+\pagebreak
+
+### DOMAIN_DATA
+
+
+

What this section is for?

Oh that is a stale result of a previous variant considering an own record
type for quota and feature information. :-)


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.