
Re: [PATCH] xen/arm: acpi: Support memory reserve configuration table



On Tue, 06 Sep 2022 08:17:14 +0100,
Leo Yan <leo.yan@xxxxxxxxxx> wrote:
> 
> Hi Marc,
> 
> On Tue, Sep 06, 2022 at 07:27:17AM +0100, Marc Zyngier wrote:
> > On Tue, 06 Sep 2022 03:52:37 +0100,
> > Leo Yan <leo.yan@xxxxxxxxxx> wrote:
> > > 
> > > On Thu, Aug 25, 2022 at 10:40:41PM +0800, Leo Yan wrote:
> > > 
> > > [...]
> > > 
> > > > > > > But here I still cannot create the concept that how GIC RD tables
> > > > > > > play roles to support the para virtualization or passthrough mode.
> > > > >
> > > > > > I am not sure what you are actually asking. The pending tables are
> > > > > > just memory you give to the GICv3 to record the state of the interrupts.
> > > >
> > > > > To be more specific, Xen has its own RD pending table, and we can use
> > > > this pending table to set state for SGI/PPI/LPI for a specific CPU
> > > > interface.  Xen works as hypervisor, it saves and restores the pending
> > > > table according to switched in VM context, right?
> > > >
> > > > On the other hand, what's the purpose for Linux kernel's GIC RD
> > > > > pending table?  Is it only used for nested virtualisation?  I mean if
> > > > Linux kernel's GIC RD pending table is not used for the drivers in
> > > > Dom0 or DomU, then it's useless to pass it from the primary kernel to
> > > > secondary kernel; as result, we don't need to reserve the persistent
> > > > memory for the pending table in this case.
> > > 
> > > > I haven't received further confirmation from Marc; anyway, I tried to cook
> > > > up a kernel patch to mute the kernel oops [1].
> > 
> > What sort of confirmation do you expect from me? None of what you
> > write above makes much sense in the face of the architecture.
> 
> Okay, I think I have two questions for you:
> 
> - The first question is if we really need to reserve persistent memory
>   for RD pending table and configuration table when Linux kernel runs
>   in Xen domain?

I have no idea, and really I don't want to know. The architecture
doesn't make it safe to reuse that memory, and the driver does the
right thing by always reserving that memory when the FW is supposed to
support it.

The "oh but it is safe on so and so" approach doesn't scale. If you
want to have such a thing, just convince people at ARM that it is
possible to implement a GICv3-compliant system without the RD tables,
get them to update the architecture to allow this scheme and advertise
it in a discoverable register. Xen could then implement it, Linux
could check this bit, and we'd all be a happy family.

Because that's really what this is: it isn't that you don't care about
the RD tables being reserved. It is that you don't care about them at
all because they are never used by Xen as the GIC implementation. Your
approach of "huh, let's not reserve it" just papers over this.

> 
> - If the first question's answer is no, so it's not necessary to reserve
>   RD pending table and configuration table for Xen, then what's the good
>   way to dismiss the kernel oops?

A warning, not an oops.

> 
> IIUC, you consider the general flow from the architecture's point of view, so
> you prefer to ask Xen to implement the EFI stub to comply with the general
> flow of the EFI booting sequence, right?

If you want to use ACPI, you use EFI, and not a vague emulation of
it. If you use DT, you can reserve the memory upfront. The various
alternatives are in this thread.

> 
> If the conclusion is to change Xen to support the EFI stub, then this
> would be fine for me and I will hold on and leave Xen developers to work
> on it.
> 
> > > [1] https://lore.kernel.org/lkml/20220906024040.503764-1-leo.yan@xxxxxxxxxx/T/#u
> > 
> > I'm totally baffled by the fact you're trying to add some extra hacks
> > to Linux just to paper over some of the Xen's own issues.
> 
> I have one last question: why does the kernel reserve the RD pending table
> and configuration table for kexec?  As we know, the primary kernel and
> the secondary kernel use separate memory regions,

No, you got it wrong. Only with *kdump* do you get separate memory
regions. kexec reuses all of the memory visible by the primary kernel.

> this means there is
> no race condition in which the secondary kernel modifies the tables whilst the
> GIC accesses them, if the secondary kernel allocates new pages for the
> RD tables.  So the only potential issue I can imagine is that the secondary
> kernel sets a new RD pending table and configuration table, which might
> be inconsistent with the rest of the RDs in the system.
> 
> Could you confirm if my understanding is correct or not?

It isn't correct.

- There is no race condition. Once the RD tables are configured, they
  cannot be changed.

- When the kdump kernel boots, none of the primary OS memory is
  reused, so it is safe to continue and use the same tables in place.

- When the kexec kernel boots, all of the memory except for the
  reserved memory is reused. If your RD tables are used for anything,
  you'll see memory corruption as the GIC writes pending bits in the
  pending table, and you'll be unable to configure interrupts
  correctly.

In conclusion, using kexec with GICv3 is completely unsafe if you
don't reserve the memory allocated to the RDs.
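
As a rough illustration of the kexec case above, here is a toy model in
plain C. It is a sketch under stated assumptions, not kernel or GIC code:
every name in it (toy_mem, toy_alloc, toy_kexec, ...) is made up for the
example, memory is a handful of small "pages", and the redistributor is
reduced to something that sets a pending bit in whichever page it was
originally given. The kdump case (a separate memory region) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NPAGES  8
#define PAGE_SZ 64

/* Toy model only: hypothetical names, not kernel or GIC interfaces. */
struct toy_mem {
    uint8_t page[NPAGES][PAGE_SZ];
    bool reserved[NPAGES]; /* carried across kexec, like a persistent reservation */
    bool in_use[NPAGES];   /* allocator state, forgotten when a new kernel boots */
};

/* First-fit page allocator; it never hands out a reserved page. */
static int toy_alloc(struct toy_mem *m)
{
    for (int i = 0; i < NPAGES; i++) {
        if (!m->reserved[i] && !m->in_use[i]) {
            m->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* The "GIC" keeps writing pending bits into the table page it was given. */
static void toy_gic_set_pending(struct toy_mem *m, int table_page, unsigned irq)
{
    m->page[table_page][irq / 8] |= (uint8_t)(1u << (irq % 8));
}

/* kexec: the new kernel reuses all memory; only reservations carry over. */
static void toy_kexec(struct toy_mem *m)
{
    memset(m->in_use, 0, sizeof(m->in_use));
}

/* Returns true if a buffer owned by the new kernel was overwritten by the GIC. */
static bool toy_corrupted_after_kexec(bool reserve_table)
{
    struct toy_mem m;
    memset(&m, 0, sizeof(m));

    /* The first kernel allocates the pending table and optionally reserves it. */
    int table = toy_alloc(&m);
    assert(table >= 0);
    m.reserved[table] = reserve_table;

    toy_kexec(&m);

    /* The new kernel allocates a page for unrelated data and zeroes it. */
    int buf = toy_alloc(&m);
    assert(buf >= 0);
    memset(m.page[buf], 0, PAGE_SZ);

    /* The GIC still points at the old table and records a pending interrupt. */
    toy_gic_set_pending(&m, table, 42);

    for (int i = 0; i < PAGE_SZ; i++)
        if (m.page[buf][i] != 0)
            return true;
    return false;
}
```

In this model, toy_corrupted_after_kexec(false) lets the new kernel's
allocator hand out the very page the GIC is still writing to, while
toy_corrupted_after_kexec(true) keeps that page out of the allocator.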

> Sorry for the noise and the many questions.  I understand this is a complex
> and difficult topic for me, and it's very likely that I lack
> sufficient knowledge of this part; this is just what I want to
> learn from the discussion and from you :-)

I suggest you read the architecture spec, which has all the details.

        M.

-- 
Without deviation from the norm, progress is not possible.



 

