
Re: xenstore - Suggestion of batching watch events



On Tue, Jun 24, 2025 at 4:01 PM Andriy Sultanov
<sultanovandriy@xxxxxxxxx> wrote:
>
> On 6/24/25 3:51 PM, Andriy Sultanov wrote:
>
> > Suggestion:
> > WATCH_EVENT's req_id and tx_id are currently 0. Could it be possible, for
> > example, to modify this such that watch events coming as a result of a
> > transaction commit (a "batch") have tx_id of the corresponding
> > transaction
> > and req_id of, say, 2 if it's the last such watch event of a batch and 1
> > otherwise? Old clients would still ignore these values, but it would
> > allow
> > some others to detect if an update is part of a logical batch that
> > doesn't end
> > until its last event.
>
> Come to think of it, since clients could watch arbitrary parts of
> what the transaction touched, this wouldn't be as simple, xenstored
> would have to issue the "batch ended" packet per token, tracking
> that somehow internally... Perhaps transaction_start and transaction_end
> could produce WATCH_EVENT (or some other similar packet) as well so
> that this tracking could be done client-side? (standard WATCH_EVENT
> would still need their tx_id to be modified)
>
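
For reference, the wire header the suggestion would repurpose is the
existing struct xsd_sockmsg from xs_wire.h; a client that wanted to spot
batch boundaries under such a scheme might do something like the sketch
below (the two req_id markers are of course hypothetical, not part of
the protocol today):

#include <stdint.h>

/* Existing xenstore wire header (xen/include/public/io/xs_wire.h). */
struct xsd_sockmsg {
    uint32_t type;     /* XS_WATCH_EVENT for watch firings            */
    uint32_t req_id;   /* currently always 0 in watch events          */
    uint32_t tx_id;    /* currently always 0 in watch events          */
    uint32_t len;      /* length of the "<path>\0<token>\0" payload   */
};

/* Hypothetical markers under the proposed scheme. */
#define WATCH_BATCH_MEMBER 1   /* more events from this commit follow */
#define WATCH_BATCH_LAST   2   /* last event of this commit's batch   */

/* Sketch: decide whether a client can stop buffering updates. */
static int batch_complete(const struct xsd_sockmsg *hdr)
{
    if (hdr->tx_id == 0)
        return 1;                        /* not part of a transaction */
    return hdr->req_id == WATCH_BATCH_LAST;
}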

The Windows PV drivers also write a huge amount of data to xenstore.
E.g. I see 74 entries under /data, grouped by 1st level:
/data/volumes: 39
/data/scsi: 7
/data/vif: 3
/data/xd: 2
/data/cpus: 2
/data/...: 19

This is probably why the /data/updated optimization used to be beneficial.

/attr/os/hotfixes is also large (28 entries on this VM, but it used to
be in the hundreds for older versions of Windows).

In some situations xenopsd could set up more granular watches, but
then you run into scalability issues due to having thousands of
watches per domain (although you've fixed the largest problem there:
connecting/disconnecting xenstore clients causing slowdowns due to
watch trie walks).

There is also the problem that watch events don't contain enough
information, so they only act as signals to xenopsd, which then goes on
and fetches the entire xenstore subtree to figure out what actually
changed. This is the cause of some of the O(N^2) performance issues we
still have.
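
To make the round-trips concrete, here is a minimal sketch in C against
libxenstore (the watched path and token below are made up) of that
pattern: the event only hands the client a path and a token, so it has
to walk the subtree again to find out what changed.

#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

/* Re-read an entire subtree after a watch fires; this walk is the
 * expensive part described above. */
static void dump_subtree(struct xs_handle *xsh, const char *path)
{
    unsigned int n, len, i;
    char **entries = xs_directory(xsh, XBT_NULL, path, &n);

    if (!entries)
        return;
    for (i = 0; i < n; i++) {
        char child[512];
        snprintf(child, sizeof(child), "%s/%s", path, entries[i]);
        char *val = xs_read(xsh, XBT_NULL, child, &len);
        if (val) {
            printf("%s = %.*s\n", child, (int)len, val);
            free(val);
        }
        dump_subtree(xsh, child);
    }
    free(entries);
}

int main(void)
{
    struct xs_handle *xsh = xs_open(0);
    unsigned int num;
    char **ev;

    if (!xsh)
        return 1;
    /* Path and token are purely illustrative. */
    xs_watch(xsh, "/local/domain/1/data", "data-token");
    ev = xs_read_watch(xsh, &num);      /* blocks until a watch fires */
    if (ev) {
        /* ev[XS_WATCH_PATH] is all the event tells us: no value, no
         * diff, so we walk the whole subtree again. */
        dump_subtree(xsh, ev[XS_WATCH_PATH]);
        free(ev);
    }
    xs_close(xsh);
    return 0;
}
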
We used to have a prototype xenstore cache which avoided actually
making those fetches from oxenstored; once something got into its
cache, it kept track of updates by setting up a watch on the key.
However, a cold start then took ~30 minutes (worse than not having a
cache at all). A compromise could be to cache on-the-fly (instead of
precaching everything you see): e.g. I don't think we actually care
about the values under /data/volumes and /attr/os/hotfixes, other than
for debugging purposes, so if xenopsd never fetches them, the cache
shouldn't either.
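
A very rough sketch of that cache-on-first-use idea (a toy fixed-size
table just for illustration; a real cache would use a proper map and
would also consume watch events to refresh or invalidate entries, which
is not shown here):

#include <stdlib.h>
#include <string.h>
#include <xenstore.h>

#define CACHE_SLOTS 64

static struct { char *path; char *value; } cache[CACHE_SLOTS];
static unsigned int cache_used;

/* Read through a tiny cache: a key is only cached -- and only gets a
 * watch -- the first time somebody actually asks for it, so keys that
 * are never fetched (e.g. debug-only ones) cost nothing. */
static char *cached_read(struct xs_handle *xsh, const char *path)
{
    unsigned int i, len;

    for (i = 0; i < cache_used; i++)
        if (!strcmp(cache[i].path, path))
            return cache[i].value;   /* hot path: no xenstore round-trip */

    char *val = xs_read(xsh, XBT_NULL, path, &len);
    if (!val || cache_used == CACHE_SLOTS)
        return val;

    /* First miss: remember the value and watch the key so that a later
     * watch event can refresh or invalidate this slot. */
    xs_watch(xsh, path, path /* token */);
    cache[cache_used].path = strdup(path);
    cache[cache_used].value = val;
    cache_used++;
    return val;
}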

To avoid a lot of round-trips, a new kind of watch event that tells you
the value(s) in addition to the keys might be useful. This new kind of
watch event could also be emitted once per transaction (I think the
events are already emitted at transaction commit time, and not sooner).
If filtering watch events based on tree depth would be useful in some
situations, then the new watch event could also try to do that.
But then one such "batched" watch event could become too big, larger
than what would fit into the xenstore ring (and for historical reasons
we don't support sending >4k packets through xenstore).
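
Just to illustrate the size concern, something along these lines
(entirely hypothetical -- nothing like it exists in xs_wire.h today)
would have to fit, together with all of its values, under the existing
payload limit:

#include <stdint.h>

#define XENSTORE_PAYLOAD_MAX 4096   /* existing limit from xs_wire.h */

/* Hypothetical payload for a value-carrying, per-commit watch event. */
struct batched_watch_event {
    uint32_t tx_id;        /* the commit this batch belongs to          */
    uint32_t nr_updates;   /* number of path/value records that follow  */
    /* Followed by nr_updates records of:
     *   "<path>\0" + uint32_t value_len + value bytes
     * so the client gets keys *and* values without extra reads.  A big
     * transaction could easily blow past XENSTORE_PAYLOAD_MAX and would
     * have to be split or fall back to today's path-only events. */
};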

Best regards,
--Edwin



 

