On Wed, 2021-05-19 at 11:09 +0200, Juergen Gross wrote:
On 18.05.21 20:11, Julien Grall wrote:
Hi Juergen,
I have started to look at preserving transactions across Live-update in
C Xenstored. So far, I have managed to transfer transactions that read/write
existing nodes.
Now, I am having trouble transferring new/deleted nodes within a
transaction with the existing migration format.
C Xenstored will keep track of nodes accessed during the transaction but
not the children (AFAICT for performance reasons).
Not performance reasons, but because there isn't any need for that:
The children are either unchanged (so the non-transaction node records
apply), or they will be among the tracked nodes (transaction node
records apply). So in both cases all children should be known.
In case a child has been deleted in the transaction, the stream should
contain a node record for that child with the transaction-id and the
number of permissions being zero: see docs/designs/xenstore-migration.md
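
A rough sketch of that rule, in Python for brevity. NodeRecord and transaction_records are hypothetical names, the example paths are made up, and the record is a simplified stand-in rather than the layout defined in docs/designs/xenstore-migration.md. It only illustrates that a node deleted inside a transaction is still emitted, with the transaction id and zero permissions:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class NodeRecord:
    # Hypothetical, simplified record; NOT the real wire layout.
    tx_id: int
    path: str
    value: bytes
    perms: Tuple[str, ...]  # empty tuple models "number of permissions is zero"


def transaction_records(
    tx_id: int,
    tracked: Dict[str, Optional[Tuple[bytes, Sequence[str]]]],
) -> List[NodeRecord]:
    """Build one record per node tracked by a transaction.

    `tracked` maps path -> (value, perms), or None if the node was
    deleted inside the transaction (an assumption of this sketch).
    """
    records = []
    for path in sorted(tracked):
        entry = tracked[path]
        if entry is None:
            # Deleted in the transaction: keep the record, zero permissions.
            records.append(NodeRecord(tx_id, path, b"", ()))
        else:
            value, perms = entry
            records.append(NodeRecord(tx_id, path, value, tuple(perms)))
    return records
```

A reader of such a stream could then treat "transaction id set, zero permissions" as the marker for a node that the transaction removed.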
The problem for oxenstored is that you might have taken a snapshot in the past and your root has since moved on, but your snapshot still contains a lot of nodes that have been deleted in the latest root.
A brute-force approach might be to diff the transaction's state against the latest root state and dump the delta entries into the migration stream as node additions/deletions.
This could lead to dumping a lot of duplicate state and an explosion in file size: e.g. if you run 1000 domains (the current maximum supported limit) and each has one tiny transaction from the past,
this will lead to a 1000x amplification of the xenstore size in the dump. (In memory this is fine, because OCaml will share common tree nodes that are unchanged.)
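
To make the brute-force idea concrete, here is a minimal sketch in Python (not oxenstored code). It assumes both the transaction's snapshot and the latest root can be flattened into a path -> value mapping; diff_against_root is a hypothetical helper name:

```python
from typing import Dict, List, Tuple


def diff_against_root(
    snapshot: Dict[str, bytes],
    root: Dict[str, bytes],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Return (writes, deletes) that would turn `root` into `snapshot`.

    Both arguments are flattened path -> value views of a tree, which is
    an assumption made for this sketch only.
    """
    # Nodes the snapshot has that the root lacks, or has with another value.
    writes = {
        path: value
        for path, value in snapshot.items()
        if path not in root or root[path] != value
    }
    # Nodes that exist in the root but not in the snapshot.
    deletes = sorted(path for path in root if path not in snapshot)
    return writes, deletes
```

With an old snapshot, nearly everything the root has deleted since then ends up in `writes`, once per transaction, which is where the duplicated state in the dump comes from.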
This should restore the content correctly, but it has a bad effect on conflict semantics: your migrated transactions will then all likely conflict at the root, or near the root, and fail anyway.
Without a live-update, by contrast, as long as you do not modify any of the old state you would get the conflict marker further down the tree and would be able to avoid conflicts most of the time.
(Ignore the awful indentation: that code has been rebased with ignore_all_space so many times between different branches of Xen that whitespace correctness has been lost.)
I've got a fuzzer/unit test for live-update (see xen-devel), but it currently has transactions turned off because I couldn't get it to work reliably: it always found examples where the transaction conflict state was not identical pre/post update.
If we abort all transactions after migration, as discussed previously, then it might be possible to get this to work, provided we accept the size explosion as a possibility and dump transaction state to /var/tmp rather than /tmp (which might be a tmpfs that gives you ENOSPC).
Live updates are a fairly niche use case and I'd like to see the current variant without transactions proven to work on an actual XSA (likely the next oxenstored XSA about queue limits if we find a solution to that),
and only after that deploy live-update support with transactions.
We also completely lack any unit tests for transactions (aside from the fuzzer that I started writing, which does just some very minimal state comparisons), and we do not have a formal model of how transactions
and transaction conflicts should be handled against which to check whether transactions behave correctly. A fairly good approximation, though, is: run two oxenstored instances, one with and one without live-update, and check that they produce equivalent
(not necessarily identical, since the txid can change) answers. That works as long as we do not have to change the transaction semantics or code in any way to support live update.
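
One way such an "equivalent but not identical" comparison could look, sketched in Python rather than as part of the actual fuzzer. It assumes each answer can be reduced to an (operation, txid, payload) tuple and that txid 0 means "no transaction"; normalise_txids and equivalent are hypothetical names:

```python
from typing import Dict, List, Tuple

Answer = Tuple[str, int, bytes]  # (operation, txid, payload): assumed shape


def normalise_txids(trace: List[Answer]) -> List[Answer]:
    """Renumber transaction ids by order of first appearance.

    Assumption: txid 0 means "not in a transaction" and is left alone.
    """
    mapping: Dict[int, int] = {}
    out = []
    for op, txid, payload in trace:
        if txid != 0 and txid not in mapping:
            mapping[txid] = len(mapping) + 1
        out.append((op, mapping.get(txid, 0), payload))
    return out


def equivalent(a: List[Answer], b: List[Answer]) -> bool:
    """True if the traces differ at most in the concrete txid values."""
    return normalise_txids(a) == normalise_txids(b)
```

Two runs that hand out different txids but otherwise give the same answers compare equal, while a run that splits one transaction's answers across two txids does not.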
Best regards,
--Edwin
Juergen