[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [xen staging] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
commit beb105af19cc3e58e2ed49224cfe190a736e5fa0 Author: David Woodhouse <dwmw@xxxxxxxxxxxx> AuthorDate: Thu Mar 19 20:40:24 2020 +0000 Commit: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> CommitDate: Fri Aug 14 16:14:21 2020 +0100 tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating The do_ls() function has somewhat inconsistent handling of errors. If reading the node's contents with xs_read() fails, then do_ls() will just quietly not display the contents. If reading the node's permissions with xs_get_permissions() fails, then do_ls() will print a warning, continue, and ultimately won't exit with an error code (unless another error happens). If recursing into the node with xs_directory() fails, then do_ls() will abort immediately, not printing any further nodes. For persistent failure modes â?? such as ENOENT because a node has been removed, or EACCES because it has had its permisions changed since the xs_directory() on the parent directory returned its name â?? it's obviously quite likely that if either of the first two errors occur for a given node, then so will the third and thus xenstore-ls will abort. The ENOENT one is actually a fairly common case, and has caused tools to fail to clean up a network device because it *apparently* already doesn't exist in xenstore. There is a school of thought that says, "Well, xenstore-ls returned an error. So the tools should not trust its output." The natural corollary of this would surely be that the tools must re-run xenstore-ls as many times as is necessary until its manages to exit without hitting the race condition. I am not keen on that conclusion. For the specific case of ENOENT it seems reasonable to declare that, but for the timing, we might as well just not have seen that node at all when calling xs_directory() for the parent. By ignoring the error, we give acceptable output. The issue can be reproduced as follows: (dom0) # for a in `seq 1 1000` ; do xenstore-write /local/domain/2/foo/$a $a ; done Now simultaneously: (dom0) # for a in `seq 1 999` ; do xenstore-rm /local/domain/2/foo/$a ; done (dom2) # while true ; do ./xenstore-ls -p /local/domain/2/foo | grep -c 1000 ; done We should expect to see node 1000 in the output, every time. Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx> Reviewed-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> --- tools/xenstore/xenstore_client.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c index 3afc630ab8..ae7ed3eb9e 100644 --- a/tools/xenstore/xenstore_client.c +++ b/tools/xenstore/xenstore_client.c @@ -148,14 +148,20 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms int i; unsigned int num, len; + e = xs_directory(h, XBT_NULL, path, &num); + if (e == NULL) { + if (cur_depth && errno == ENOENT) { + /* If a node disappears while recursing, silently move on. */ + return; + } + + err(1, "xs_directory (%s)", path); + } + newpath = malloc(STRING_MAX); if (!newpath) err(1, "malloc in do_ls"); - e = xs_directory(h, XBT_NULL, path, &num); - if (e == NULL) - err(1, "xs_directory (%s)", path); - for (i = 0; i<num; i++) { char buf[MAX_STRLEN(unsigned int)+1]; struct xs_permissions *perms; -- generated by git-patchbot for /home/xen/git/xen.git#staging
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |