
Re: EFI's -mapbs option may cause Linux to panic()



On 23.11.22 10:18, Jan Beulich wrote:
On 23.11.2022 08:39, Juergen Gross wrote:
On 22.11.22 10:47, Roger Pau Monné wrote:
On Mon, Nov 21, 2022 at 06:01:00PM +0100, Jan Beulich wrote:
On 21.11.2022 17:48, Roger Pau Monné wrote:
On Mon, Nov 21, 2022 at 05:27:16PM +0100, Jan Beulich wrote:
Hello,

on a system with these first two EFI memory map entries

(XEN)  0000000000000-000000009dfff type=4 attr=000000000000000f
(XEN)  000000009e000-000000009ffff type=2 attr=000000000000000f

i.e. with all space below 1M except for 2 pages being BootServicesData, the
-mapbs option has the effect of marking all that space as reserved. Then
Linux fails trying to allocate its lowmem trampoline (which really it
shouldn't need when running in PV mode), ultimately leading to

                panic("Real mode trampoline was not allocated");

in their init_real_mode().
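
To make the arithmetic behind the two entries above concrete, here is a small Python model (not Xen code). It assumes that boot services regions count as usable RAM unless -mapbs is given, in which case they are treated as reserved; the type numbers follow the UEFI EFI_MEMORY_TYPE enumeration, while the names LOW_MAP and usable_low_pages are made up for this sketch.

```python
PAGE_SIZE = 0x1000

# Subset of the UEFI EFI_MEMORY_TYPE values relevant here.
EFI_LOADER_CODE = 1
EFI_LOADER_DATA = 2
EFI_BOOT_SERVICES_CODE = 3
EFI_BOOT_SERVICES_DATA = 4
EFI_CONVENTIONAL_MEMORY = 7

# The two entries quoted above, as (first byte, last byte, type).
LOW_MAP = [
    (0x00000, 0x9DFFF, EFI_BOOT_SERVICES_DATA),
    (0x9E000, 0x9FFFF, EFI_LOADER_DATA),
]


def usable_low_pages(entries, map_bs, limit=0x100000):
    """Count pages below `limit` that the model treats as usable RAM.

    Assumption of this sketch: with map_bs set, boot services code/data
    is reserved; without it, it is usable like conventional memory.
    """
    usable = {EFI_LOADER_CODE, EFI_LOADER_DATA, EFI_CONVENTIONAL_MEMORY}
    if not map_bs:
        usable |= {EFI_BOOT_SERVICES_CODE, EFI_BOOT_SERVICES_DATA}

    pages = 0
    for start, end, mem_type in entries:
        if mem_type not in usable:
            continue
        top = min(end + 1, limit)  # clip the range at the 1M boundary
        if top > start:
            pages += (top - start) // PAGE_SIZE
    return pages


if __name__ == "__main__":
    print("usable low pages with -mapbs:   ", usable_low_pages(LOW_MAP, True))
    print("usable low pages without -mapbs:", usable_low_pages(LOW_MAP, False))
```

Under these assumptions the model leaves only the two type=2 pages at 0x9e000 usable with -mapbs, versus the whole range up to 0xa0000 without it.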

While for PV I think it is clear that the easiest is to avoid
trampoline setup in the first place, iirc PVH Dom0 also tries to
mirror the host memory map to its own address space. Does PVH Linux
require a lowmem trampoline?

Yes, it does AFAIK.  I guess those two pages won't be enough for
Linux boot trampoline requirements then.

I assume native Linux is fine with this memory map because it reclaims
the EfiBootServicesData region and that's enough.

That's my understanding as well.

While the two pages here are just enough for Xen's trampoline, I still
wonder whether we want to adjust -mapbs behavior. Since whatever we
might do leaves a risk of conflicting with true firmware (mis)use of
that space, the best I can think of right now would be another option
altering behavior (or providing altered behavior). Yet such an option
would likely need to be more fine-grained than covering all of
the low MB in one go. Which feels like both going too far and making
it awkward for people to figure out what value(s) to use ...

Thoughts anyone?

I'm unsure what to recommend.  The mapbs option is a workaround for
broken firmware, and it's not enabled by default, so we might be lucky
and never find a system with a memory map like you describe that also
requires mapbs in order to boot.

Guess how we've learned of the issue: Systems may boot fine without
-mapbs, but they may fail to reboot because of that (in)famous issue of
firmware writers not properly separating boot services code paths from
runtime services ones. And there we're dealing with a system where I
suspect this to be the case, just that - unlike in earlier similar
cases - there's no "clean" crash proving the issue (the system simply
hangs). Hence my request that they use -mapbs to try to figure out.

And yes, "reboot=acpi" helps there, but they insist on knowing what
component is to blame.

Well, if reboot=acpi fixes it then it's quite clear the EFI reboot
method is to blame?

Or they want to know the exact cause that makes EFI reboot fail,
because that's quite difficult to figure out from our end.

But I'm afraid I don't see any solution to make mapbs work with a PVH
dom0 on a system with a memory map like you provided, short of adding
some kind of bodge to not map and mark as reserved memory below 1MB
(but that kind of defeats the purpose of mapbs).

What we could do in such a case would be to inhibit suspending the
system, and to run dom0 with a single cpu only. An error message
indicating that the system should be booted without mapbs should be
issued, of course.

That's going to be awkward: Linux can't very well issue a message
suggesting to remove the use of a hypervisor option (behavior of which
is an implementation detail to some degree, and hence the message
could end up being misleading later). Xen also can't very well issue
such a message, since it doesn't know how much of lowmem is going to
be enough for whichever Dom0 OS there's going to be booted. In
principle an OS may get away with less than a single page. Hence Xen
at best could issue a "may not work" message (unless no space at all
was available at some 4k-aligned boundary), and even then it being a
false indication on some (many?) systems may lead to people not paying
attention when they should.

A kernel message could be phrased more generically, e.g. "Couldn't find a
large enough free memory region below 1MB, maybe due to hypervisor
settings and/or firmware issues."
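
As a rough illustration of such a check, the following Python sketch (not kernel code) searches a list of free ranges for an aligned block below 1MB and falls back to the generic message otherwise. The helper names find_low_region and trampoline_status, the sizes used, and the success string are hypothetical; adjacent ranges are not merged, which a real allocator would have to handle.

```python
LOW_LIMIT = 0x100000  # 1MB boundary
PAGE_SIZE = 0x1000

GENERIC_MSG = (
    "Couldn't find a large enough free memory region below 1MB, "
    "maybe due to hypervisor settings and/or firmware issues."
)


def find_low_region(free_ranges, size, limit=LOW_LIMIT, align=PAGE_SIZE):
    """Return the lowest aligned base of a free block of `size` bytes
    lying entirely below `limit`, or None if there is none.

    free_ranges: iterable of (start, end) with `end` exclusive.
    """
    for start, end in sorted(free_ranges):
        base = (start + align - 1) // align * align  # round up to alignment
        top = min(end, limit)                        # clip at the 1MB boundary
        if base + size <= top:
            return base
    return None


def trampoline_status(free_ranges, size):
    """Report where a block of `size` bytes would go, or the generic message."""
    base = find_low_region(free_ranges, size)
    if base is None:
        return GENERIC_MSG
    return f"low memory block placed at {base:#x}"


if __name__ == "__main__":
    # Only the two pages at 0x9e000 are free, as in the reported memory map.
    free = [(0x9E000, 0xA0000)]
    print(trampoline_status(free, 2 * PAGE_SIZE))
    print(trampoline_status(free, 3 * PAGE_SIZE))
```

The message deliberately names neither the hypervisor option nor a required size, since the amount of low memory needed depends on the OS.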


Juergen
