
Re: request for feedback on a Xen/Linux compatibility issue



Hi,

On 07/01/2022 00:02, Stefano Stabellini wrote:
On Thu, 6 Jan 2022, Julien Grall wrote:
On 06/01/2022 14:03, Jan Beulich wrote:
On 06.01.2022 08:13, Juergen Gross wrote:
On 06.01.22 01:40, Stefano Stabellini wrote:
Hi all,

Today Xen dom0less guests are not "Xen aware": the hypervisor node
(compatible = "xen,xen") is missing from dom0less domUs device trees and
as a consequence Linux initializes as if Xen is not present. The reason
is that interfaces like grant table and xenstore (xenbus in Linux) don't
work correctly in a dom0less environment at the moment.

The good news is that I have patches for Xen to implement PV drivers
support for dom0less guests. They also add the hypervisor node to device
tree for dom0less guests so that Linux can discover the presence of Xen
and related interfaces.

When the Linux kernel is booting as a dom0less kernel, it needs to delay
the xenbus initialization until the interface becomes ready. Attempts to
initialize xenbus straight away lead to failure, which is fine because
xenbus has never worked in Linux when running as a dom0less guest up
until now. It is reasonable that a user needs a newer Linux to take
advantage of dom0less with PV drivers. So:

- old Xen + old/new Linux -> Xen not detected in Linux
- new Xen + old Linux     -> xenbus fails to initialize in Linux
- new Xen + new Linux     -> dom0less PV drivers working in Linux


The problem is that Linux until recently couldn't deal with any errors
in xenbus initialization. Instead of returning an error and continuing
without xenbus, Linux would crash at boot.

I upstreamed two patches for Linux xenbus_probe to be able to deal with
initialization errors. With those two fixes, Linux can boot as a
dom0less kernel with the hypervisor node in device tree. The two fixes
got applied to master and were already backported to all the supported
Linux stable trees, so as of today:

- dom0less with hypervisor node + Linux 5.16+           -> works
- dom0less with hypervisor node + stable Linux 5.10     -> works
- dom0less with hypervisor node + unpatched Linux 5.10  -> crashes


Is this good enough? Or for Xen/Linux compatibility do we also want to be
able to boot vanilla unpatched Linux 5.10 as a dom0less kernel? If so,
the simplest solution is to change the compatible string for the
hypervisor node, so that old Linux wouldn't recognize Xen's presence and
wouldn't try to initialize xenbus (so it wouldn't crash on failure). New
Linux can of course learn to recognize both the old and the new
compatible strings. (For instance it could be compatible = "xen,xen-v2".)
I have prototyped and tested this solution successfully but I am not
convinced it is the right way to go.
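
To make the compatible-string idea concrete, here is a minimal sketch in
plain C of how matching would behave. It is not the actual Linux or Xen
code: node_matches() and the string tables are hypothetical, and
"xen,xen-v2" is only the example name floated above. An old kernel that
knows only "xen,xen" ignores a node carrying the new string, while a new
kernel that knows both strings accepts it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if any compatible string carried by
 * the hypervisor node is one the kernel knows about.
 */
bool node_matches(const char *const *node_compat, size_t n_node,
                  const char *const *known, size_t n_known)
{
    for (size_t i = 0; i < n_node; i++)
        for (size_t j = 0; j < n_known; j++)
            if (strcmp(node_compat[i], known[j]) == 0)
                return true;
    return false;
}
```

With a node that carries only "xen,xen-v2", a kernel whose table is
{ "xen,xen" } gets false and boots as if Xen were absent, whereas a kernel
whose table is { "xen,xen", "xen,xen-v2" } gets true.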

Do you have any suggestion or feedback?

The Linux crash on xenbus initialization failure is a Linux bug, not a
Xen issue. For this reason, I am tempted to say that we shouldn't change
the compatible string to work around a Linux bug, especially given that
the Linux stable trees are already all fixed.

What about adding an option to your Xen patches to omit the hypervisor
node in the device tree? This would enable the user to have a mode
compatible with today's behavior.

While this sounds nice at first glance, this would need to be a
per-domain setting, which wouldn't be straightforward to express via a
command line option (I don't know how feasible it would be to express it
via other means).

For dom0less, domains are described in the Device-Tree. We have one node per
domain, so we could add a property to indicate whether the domain should be
started in compat mode (or not).

That said, I am not sure every user will want Linux to use
grant-table/xenstore (possibly, some users may want one but not the other).

So how about a more generic property "xen,enhanced" with an optional value
indicating whether this is disabled, enabled or the list of interfaces (e.g.
xenbus, grant-table) exposed?
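
As an illustration only, the semantics proposed here could be sketched in
plain C as below. Everything in it is an assumption made for the example:
the ENH_* feature bits, the parse_enhanced() helper, the value strings
"enabled" and "disabled", and the comma-separated encoding of the
interface list (a real device tree string list would be encoded
differently). What happens when the property is absent is left to the
caller, since that default is still being discussed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits, one per interface named in the proposal. */
enum {
    ENH_XENBUS      = 1u << 0,
    ENH_GRANT_TABLE = 1u << 1,
    ENH_ALL         = ENH_XENBUS | ENH_GRANT_TABLE,
};

/*
 * Translate a "xen,enhanced" property into a feature mask.
 *   property absent           -> absent_default (policy chosen by caller)
 *   no value, "" or "enabled" -> every interface
 *   "disabled"                -> no interface
 *   "a,b,..."                 -> only the listed interfaces
 * Returns 0 on success, -1 if an unknown interface name is found.
 */
int parse_enhanced(bool present, const char *value,
                   unsigned int absent_default, unsigned int *mask)
{
    if (!present) {
        *mask = absent_default;
        return 0;
    }
    if (value == NULL || *value == '\0' || strcmp(value, "enabled") == 0) {
        *mask = ENH_ALL;
        return 0;
    }
    if (strcmp(value, "disabled") == 0) {
        *mask = 0;
        return 0;
    }

    unsigned int m = 0;
    const char *p = value;

    for (;;) {
        /* Length of the current token, up to the next comma or the end. */
        size_t len = strcspn(p, ",");

        if (len == strlen("xenbus") && strncmp(p, "xenbus", len) == 0)
            m |= ENH_XENBUS;
        else if (len == strlen("grant-table") &&
                 strncmp(p, "grant-table", len) == 0)
            m |= ENH_GRANT_TABLE;
        else
            return -1;          /* unknown or empty interface name */

        if (p[len] == '\0')
            break;
        p += len + 1;           /* skip the comma */
    }

    *mask = m;
    return 0;
}
```

Passing 0 as absent_default models "disabled unless requested", and
passing ENH_ALL models "enabled unless disabled"; the rest of the parsing
is the same under either policy.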

Yeah, I like this idea. It would allow for maximum flexibility while not
requiring any changes to the existing Xen/Linux interface; even the
compatible string would remain unmodified.

I also find the ability to select individual features interesting,
although I don't have a concrete use-case for it yet. I should say that
I do have a concrete use-case for enabling only event-channels but they
are actually already enabled for dom0less guests because they are just
hypercalls.

I thought about mentioning the hypercalls yesterday. However, I think it would be better to use XSM as it would be a lot more flexible than a Device-Tree based approach for the hypercalls.

(Nothing disables them at present for dom0less guests so
they get them "by default".)

Let's say we go down this path, which seems nice. The remaining question
is what we want as the default when the new "xen,enhanced" option is
missing. I think it makes sense for the default to be "enabled" because
I expect most people to want the enhancements and they are generally
harmless if you don't use them (except for old unpatched Linux kernels,
which is the main reason why we need the option).

You are probably right about the future use. However, we also need to make sure that the upgrade from a Xen without this feature is painless (I view dom0less bindings as stable). Therefore, I think this should be disabled by default so there are no surprises.

Cheers,

--
Julien Grall



 

