[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1] tools/hotplug: systemd: Make dependency on Xen device nodes



On Fri, Aug 25, 2023 at 07:02:33AM +0200, Erik Schilling wrote:
> On Fri Aug 25, 2023 at 5:36 AM CEST, Leo Yan wrote:
> > When system booting up, the kernel module xen_gntdev.ko is loaded and
> > the device node '/dev/xen/gntdev' is created; later the xenstored
> > service in systemd launches daemon to open this device node.
> >
> > This flow has a race condition between creating the device node in the
> > kernel module and using the device node in the systemd service.  It's
> > possible that the xenstored service fails to open the device node due
> > to the delay of creating the device node.  In the end, xenbus cannot be
> > used between the Dom0 kernel and the Xen hypervisor.
> >
> > To resolve this issue, we need to synchronize between udev and systemd
> > for the device node.  There have an extra change in the udev rules for
> > tagging 'systemd' for Xen device nodes, which notifies device node
> > creating to systemd; besides udev change, this patch adds dependency in
> > systemd service for waiting the device node.
> >
> > Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
> > ---
> >
> >  The udev rules change is on github:
> >  
> > https://github.com/systemd/systemd/pull/28962/commits/520340dfea3b6cf9fe7a24c9238313b1a5fe8539
> 
> Let's see what they think, but I fear systemd may not be the correct
> upstream to carry this... Skimming through the rules that they have
> there, it mostly looks like they only carry rules that are relevant for
> systemd-related services directly.

Yeah, I share the same concern.  But seems to me, upstreaming change
to the systemd is the most neat fixing.

> >  tools/hotplug/Linux/systemd/xenstored.service.in | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in 
> > b/tools/hotplug/Linux/systemd/xenstored.service.in
> > index 261077dc92..6e48cdb0e7 100644
> > --- a/tools/hotplug/Linux/systemd/xenstored.service.in
> > +++ b/tools/hotplug/Linux/systemd/xenstored.service.in
> > @@ -1,7 +1,7 @@
> >  [Unit]
> >  Description=The Xen xenstore
> > -Requires=proc-xen.mount
> > -After=proc-xen.mount
> > +Requires=proc-xen.mount dev-xen-gntdev.device
> > +After=proc-xen.mount dev-xen-gntdev.device
> 
> I must admit that I am a bit confused why this is enough... During our
> discussion on Slack, when you quoted from your rule it included
> `ENV{SYSTEMD_WANTS}="xenstored.service"`. Did you drop that during later
> development? Was there additional reasearch involved in dropping it?

Yes, I dropped ENV{SYSTEMD_WANTS}="xenstored.service" and it works well
at my side.

My purpose is to upstream the udev rules in systemd as general as
possible.  As you mentioned, the "xenstored.service" is maintained in Xen
but not in systemd, for this reason, I would like avoid to add
"xenstored.service" into udev rules in systemd.

> My understanding was that if devices do not exist when systemd builds
> its transaction model, dependencies on them will just get tossed out[1].
> So I would have expected that there should still be a race if
> the .device does not pull in the service.

Thanks for sharing.  To be honest, I don't know this part.  Doesn't
systemd parse the service script and let service to wait for specific
.device until systemd receives notification for the .device?

> Did you hit the race frequently enough to be sure that this fixes it in
> entirety?

I have two boards.  One on board (Telechips), it's consistently
reproduce this issue, and after applying the udev rules change and
this patch, I confirmed the issue is dismissed entirely.

> Is there a place somewhere in the Xen or kernel code where one
> could add an excessive sleep in order to trigger the race more reliably?

On my AVA board with running Xen, I tried to add a long delay (10
seconds) in the kernel driver 'drivers/xen/gntdev.c', but I can see
systemd will wait for the kernel modules loading, and then it launches
Xen services.  Thus I cannot reproduce this issue on my AVA board.

So in below flow:

- Step 1: drivers/xen/gntdev.c registers misc device;
- Step 2: udev creates device node '/dev/xen/gntdev';
- Step 3: systemd launches xenstored.service.

Seems to me, the race condition happens between step 2 and step 3.  If
there have any delay in udev for creating device node, then the step3
for xenstored.service will fail.

Thanks for review.

Leo

> [1] "citation-needed": But I vaguely remember dealing with a similar
>     issue and getting told this on #systemd IRC
> 
> - Erik



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.