[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v1] tools/hotplug: systemd: Make dependency on Xen device nodes
On Fri Aug 25, 2023 at 8:36 AM CEST, Leo Yan wrote: > On Fri, Aug 25, 2023 at 07:02:33AM +0200, Erik Schilling wrote: > > On Fri Aug 25, 2023 at 5:36 AM CEST, Leo Yan wrote: > > > When system booting up, the kernel module xen_gntdev.ko is loaded and > > > the device node '/dev/xen/gntdev' is created; later the xenstored > > > service in systemd launches daemon to open this device node. > > > > > > This flow has a race condition between creating the device node in the > > > kernel module and using the device node in the systemd service. It's > > > possible that the xenstored service fails to open the device node due > > > to the delay of creating the device node. In the end, xenbus cannot be > > > used between the Dom0 kernel and the Xen hypervisor. > > > > > > To resolve this issue, we need to synchronize between udev and systemd > > > for the device node. There have an extra change in the udev rules for > > > tagging 'systemd' for Xen device nodes, which notifies device node > > > creating to systemd; besides udev change, this patch adds dependency in > > > systemd service for waiting the device node. > > > > > > Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx> > > > --- > > > > > > The udev rules change is on github: > > > > > > https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539 > > > > Let's see what they think, but I fear systemd may not be the correct > > upstream to carry this... Skimming through the rules that they have > > there, it mostly looks like they only carry rules that are relevant for > > systemd-related services directly. > > Yeah, I share the same concern. But seems to me, upstreaming change > to the systemd is the most neat fixing. > > > > tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in > > > b/tools/hotplug/Linux/systemd/xenstored.service.in > > > index 261077dc92..6e48cdb0e7 100644 > > > --- a/tools/hotplug/Linux/systemd/xenstored.service.in > > > +++ b/tools/hotplug/Linux/systemd/xenstored.service.in > > > @@ -1,7 +1,7 @@ > > > [Unit] > > > Description=The Xen xenstore > > > -Requires=proc-xen.mount > > > -After=proc-xen.mount > > > +Requires=proc-xen.mount dev-xen-gntdev.device > > > +After=proc-xen.mount dev-xen-gntdev.device > > > > I must admit that I am a bit confused why this is enough... During our > > discussion on Slack, when you quoted from your rule it included > > `ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later > > development? Was there additional reasearch involved in dropping it? > > Yes, I dropped ENV{SYSTEMD_WANTS}="xenstored.service" and it works well > at my side. Hm. Interesting. Could you plot the activations after boot? systemd-analyze plot > /tmp/plot.svg Seeing failure vs success but also the success cases on AVA vs Telechips may be interesting. I just did some tests on my workstation where I added dependencies on a random USB device. If that one was not plugged at all, the service still happily started. Yet, it obviously seems to do something in your case... So I fear we may not fully understand the real problem yet. I must admit that I find this case a bit under-documented. While "Wants" explicitly says that failling transactions are ignored, there is no clear statement about what happens in that case with "Requires". While skimming the docs I also realized that maybe BindsTo= would maybe be more suitable compared to Requires= here. But that is unrelated to the confusion that I have about the original problem. > My purpose is to upstream the udev rules in systemd as general as > possible. As you mentioned, the "xenstored.service" is maintained in Xen > but not in systemd, for this reason, I would like avoid to add > "xenstored.service" into udev rules in systemd. > > > My understanding was that if devices do not exist when systemd builds > > its transaction model, dependencies on them will just get tossed out[1]. > > So I would have expected that there should still be a race if > > the .device does not pull in the service. > > Thanks for sharing. To be honest, I don't know this part. Doesn't > systemd parse the service script and let service to wait for specific > .device until systemd receives notification for the .device? > > > Did you hit the race frequently enough to be sure that this fixes it in > > entirety? > > I have two boards. One on board (Telechips), it's consistently > reproduce this issue, and after applying the udev rules change and > this patch, I confirmed the issue is dismissed entirely. > > > Is there a place somewhere in the Xen or kernel code where one > > could add an excessive sleep in order to trigger the race more reliably? > > On my AVA board with running Xen, I tried to add a long delay (10 > seconds) in the kernel driver 'drivers/xen/gntdev.c', but I can see > systemd will wait for the kernel modules loading, and then it launches > Xen services. Thus I cannot reproduce this issue on my AVA board. Seeing the systemd-analyze plots would maybe help to understand the differences here... Does one or the other maybe involve some initrd stage that already creates devices? Also: What was the original error that you saw? Was /dev/xen/gntdev missing entirely? Or did interaction with it fail with some error code? Can you reproduce the race while booting with systemd.log_level=debug and export the full journal of the current boot? journalctl -b0 -o export > journal.txt > So in below flow: > > - Step 1: drivers/xen/gntdev.c registers misc device; > - Step 2: udev creates device node '/dev/xen/gntdev'; > - Step 3: systemd launches xenstored.service. > > Seems to me, the race condition happens between step 2 and step 3. If > there have any delay in udev for creating device node, then the step3 > for xenstored.service will fail. Hm... If you see systemd waiting for the module, then Step 2 + 3 should not be able to race. The service does not set DefaultDependencies=no, so xenstored.service should only run after sysinit.target is reached. Since that target depends on the module loading, everything should - in theory - work out. - Erik > > Thanks for review. > > Leo > > > [1] "citation-needed": But I vaguely remember dealing with a similar > > issue and getting told this on #systemd IRC > > > > - Erik
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |