
Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs



On Fri, Aug 21, 2020 at 1:23 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 21.08.2020 09:38, Roman Shaposhnik wrote:
> > On Thu, Aug 20, 2020 at 11:47 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >> On 20.08.2020 21:31, Roman Shaposhnik wrote:
> >>> Well, default is overloaded. What I would like to see (and consider it
> >>> the voice of a small downstream/distro) is a community-agreed and
> >>> maintained way of working around these issues. Yes, I'd love to see
> >>> it working by default -- but if we can at least agree on an officially
> >>> supported knob that is less of a hammer than efi=attr=uc -- that'd
> >>> be a good first step.
> >>>
> >>> Makes sense?
> >>
> >> Sure, just that I don't see what less heavyweight alternatives
> >> to "efi=attr=uc" there are (short of supplying an option to
> >> provide per-range memory attributes, which would end up ugly
> >> to use). For the specific case here, "efi=attr=wp" could be
> >> made to work, but might not be correct for all of the range (it's
> >> an EfiMemoryMappedIO range, after all); in the majority of cases
> >> of lacking attribute information that I've seen, UC was indeed
> >> what was needed.
> >
> > I think we're talking slightly past each other here -- you seem to be
> > more after trying to figure out how to make this box look like a dozen
> > kilobucks' worth of server; I'm after figuring out which call sites
> > in Xen tickle that region.
>
> What I'm trying to do is understand what exactly is wrong in the firmware,
> as that'll likely allow determining a minimal workaround.

Fair enough. So let me start with a major update. After a bit of trial and
error it became apparent that a combination of efi=attr=uc AND
removing the call to efi_get_time as per:
    https://lists.archive.carbon60.com/xen/devel/408709
allows Xen to boot just fine and function properly on that device.
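To make the efi=attr=uc half of that concrete: as described earlier in this thread, the option only matters for runtime-services ranges whose memory descriptor carries no cacheability attribute at all. Below is a minimal C sketch of that decision. It assumes such a range is otherwise left unmapped, and it assumes a simple rule: keep the firmware's type if there is one, else apply the admin-chosen fallback, else refuse. The EFI_MEMORY_* bit values are from the UEFI specification. `enum fallback_attr` and `effective_attr()` are hypothetical names, not Xen's code, and "wp" stands in for the not-yet-existing variant Jan mentions.

```c
#include <stdint.h>

/* Cacheability attribute bits from the UEFI specification (EFI_MEMORY_*). */
#define EFI_MEMORY_UC      0x0000000000000001ULL
#define EFI_MEMORY_WC      0x0000000000000002ULL
#define EFI_MEMORY_WT      0x0000000000000004ULL
#define EFI_MEMORY_WB      0x0000000000000008ULL
#define EFI_MEMORY_UCE     0x0000000000000010ULL
#define EFI_MEMORY_WP      0x0000000000001000ULL
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

#define EFI_CACHE_MASK (EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT | \
                        EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_WP)

/* Hypothetical admin choice, modelled on "efi=attr=uc" (plus a "wp" variant). */
enum fallback_attr { FALLBACK_NONE, FALLBACK_UC, FALLBACK_WP };

/* Return the attributes to map a runtime range with, or 0 to skip mapping it. */
static uint64_t effective_attr(uint64_t attr, enum fallback_attr fb)
{
    if (attr & EFI_CACHE_MASK)
        return attr;                  /* firmware supplied a memory type */

    switch (fb) {
    case FALLBACK_UC:
        return attr | EFI_MEMORY_UC;  /* the blanket "uncached" choice */
    case FALLBACK_WP:
        return attr | EFI_MEMORY_WP;  /* may be wrong for parts of an MMIO range */
    default:
        return 0;                     /* no information and no override */
    }
}
```

The fallback here is one global choice, so it cannot be right for every range. A per-range override would be the finer-grained (and, as Jan says, uglier) alternative.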

I'm not sure if that answers your question around what's wrong with
this firmware, but perhaps it suggests that the point from that old
thread above may still be valid: perhaps avoiding GetTime() altogether
may help a lot of downstream users (especially those running on
more consumer-like h/w -- since this issue seems to come up in
QubesOS context as well).

Btw, just out of curiosity -- I poked around the GetTime() disassembly,
and while it is pretty convoluted, my hunch is that it is indeed broken
for some internal reason, not something as simple as page mapping.

So I guess the short answer to your question would be:
GetTime() seems to be broken on this firmware.
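If GetTime() itself is what misbehaves, the key property of any workaround is that the call is never made. Making the call and checking its result afterwards would come too late. A minimal C sketch of such a wallclock selection follows. It assumes the hypervisor has another source to fall back to, such as the CMOS RTC. `read_wallclock()` and the two stub sources are hypothetical stand-ins, not Xen's functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A wallclock source returns seconds since the epoch, or 0 on failure. */
typedef uint64_t (*wallclock_fn)(void);

/*
 * Pick the wallclock. With skip_firmware set, the firmware source is never
 * called, so a GetTime() that misbehaves internally is never entered.
 * The fallback is assumed to be something like the CMOS RTC.
 */
static uint64_t read_wallclock(wallclock_fn firmware, wallclock_fn fallback,
                               bool skip_firmware)
{
    if (!skip_firmware && firmware != NULL) {
        uint64_t secs = firmware();
        if (secs != 0)
            return secs;               /* firmware answered */
    }
    return fallback != NULL ? fallback() : 0;
}

/* Hypothetical stand-ins for the real sources; the counter records whether
 * the firmware path was taken. */
static int firmware_calls;

static uint64_t stub_firmware_get_time(void)
{
    firmware_calls++;
    return 1597968000;
}

static uint64_t stub_cmos_rtc(void)
{
    return 1597968001;
}
```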

> Figuring out
> the call sites is certainly also an approach, but the stack trace
> provided isn't enough for doing so, I'm afraid. Even the raw hex stack
> dump contains only two pointers into Xen's .text, and to figure out what
> they represent one would need the xen.efi that was used. Possibly even
> a deeper stack dump might be needed.

Agreed. I was mostly using it to poke around possible reasons for it
failing.
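Jan's point about the stack dump can be illustrated with a filtering step. Only words that fall inside the booted image's .text range are candidate return addresses, and they only mean something against the exact binary that was booted. Below is a minimal C sketch of that filter. The .text bounds are parameters the reader would take from that binary's symbols. `text_pointers()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the stack words that point into [text_start, text_end), the .text
 * range of the exact binary that was booted. Returns the number of hits;
 * at most out_cap of them are stored in out.
 */
static size_t text_pointers(const uint64_t *stack, size_t n,
                            uint64_t text_start, uint64_t text_end,
                            uint64_t *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        if (stack[i] < text_start || stack[i] >= text_end)
            continue;                  /* outside .text: not a candidate */
        if (found < out_cap)
            out[found] = stack[i];
        found++;
    }
    return found;
}
```

Each hit could then be resolved against a symbol file from the same build, for example with `addr2line -e`.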

> > I appreciate and respect your position, but please hear mine as well:
> > yes we're clearly into the "workaround" territory here, but clearly
> > Linux is fully capable of these workarounds and I would like to understand
> > how expensive it will be to teach Xen those tricks as well.
>
> My prime example here is their blanket avoidance of the time-related
> runtime services, despite the EFI spec saying the exact opposite.

Well, to be fair, it seems that the practical experience with various
bits of hardware suggests that in this particular case avoidance
may be the lesser of all the evils.

Or to ask a complementary question: what's the danger of making that
patch (in a cleaned up form) the default behaviour? Will there be any
instances of hardware where it may actually hurt?

> "efi=no-rs" is just a wider scope workaround of this same kind.

The problem with "efi=no-rs" is that it is actually unbounded.

IOW, compare two cases:
   1. disable a single call to GetTime()
   2. disable all calls to EFI RS
Case #1 I can reason about -- case #2 -- not so much (unless somebody
explains to me the full scope of what gets disabled with efi=no-rs).

Now, you may say (and it seems like you do ;-)) that if a small part of
the implementation can't be trusted, the entire thing shouldn't be
trusted. I don't think I buy into that policy -- but it is a policy.

> The reasoning I see behind this is that if the time related runtime
> services are problematic, how likely is it that others are fine to
> use? And how would an admin know without first having run into some
> crash? If there are fair reasons to have finer grained disabling of
> runtime services - why not? But it'll still take a command line
> option to do so, unless (as was proposed) a build-time option of
> enabling all (common?) workarounds by default gets made use of.

Well, policy (and trust issues) aside -- I think the real question
is this: it seems that quite a few downstreams agree
that avoiding GetTime() is a good idea. What options do we have
to make that possible without each downstream carrying a custom
patch (which I'm adding to EVE as we speak)?
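Jan's remarks above point to one possible shape for such an option: finer-grained disabling of runtime services, plus a build-time default. That would be a per-service disable mask whose default comes from the build and which the command line can widen or reset. Below is a minimal C sketch. "rs"/"no-rs" mirror the existing efi= sub-option. Everything else is hypothetical: "no-rs-time", the RS_* bits, `DEFAULT_RS_DISABLED`, and both functions. Only a subset of the UEFI runtime services is listed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-service bits named after a subset of UEFI runtime services. */
#define RS_GET_TIME        (1u << 0)
#define RS_SET_TIME        (1u << 1)
#define RS_GET_WAKEUP_TIME (1u << 2)
#define RS_SET_WAKEUP_TIME (1u << 3)
#define RS_GET_VARIABLE    (1u << 4)
#define RS_SET_VARIABLE    (1u << 5)
#define RS_RESET_SYSTEM    (1u << 6)

#define RS_TIME_SERVICES (RS_GET_TIME | RS_SET_TIME | \
                          RS_GET_WAKEUP_TIME | RS_SET_WAKEUP_TIME)
#define RS_ALL (RS_TIME_SERVICES | RS_GET_VARIABLE | RS_SET_VARIABLE | \
                RS_RESET_SYSTEM)

/* Hypothetical build-time default (0: use every service, as today). */
#ifndef DEFAULT_RS_DISABLED
#define DEFAULT_RS_DISABLED 0u
#endif

/* Apply one hypothetical command-line token to the current disable mask. */
static uint32_t parse_rs_option(const char *opt, uint32_t disabled)
{
    if (opt == NULL)
        return disabled;                    /* nothing given: keep default */
    if (strcmp(opt, "no-rs") == 0)
        return RS_ALL;                      /* the existing blanket switch */
    if (strcmp(opt, "rs") == 0)
        return 0;                           /* re-enable everything */
    if (strcmp(opt, "no-rs-time") == 0)
        return disabled | RS_TIME_SERVICES; /* only the time services */
    return disabled;                        /* unknown token: ignore */
}

/* True if the given service may be called under the current mask. */
static bool rs_allowed(uint32_t disabled, uint32_t service)
{
    return (disabled & service) == 0;
}
```

Building with `DEFAULT_RS_DISABLED` set to `RS_TIME_SERVICES` would give a distro the behaviour Jan attributes to Linux by default. Passing "rs" on the command line would still restore full use of runtime services.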

> > Now, whether you'd accept these tricks upstream or not is an entirely
> > orthogonal question.
>
> Well, I'd say "separate", not "orthogonal", because the nature of
> such workarounds qualifies (to me) what is or is not acceptable as
> default behavior.

Good point.

Thanks,
Roman.
