[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 02/10] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area



I'm fine with all suggestions, with one exception that needs a bit more
explanation...

On Tue Oct 29, 2024 at 8:30 PM GMT, Andrew Cooper wrote:
> On 21/10/2024 4:45 pm, Alejandro Vallejo wrote:
> > This allows the initial x2APIC ID to be sent on the migration stream.
> > This allows further changes to topology and APIC ID assignment without
> > breaking existing hosts. Given the vlapic data is zero-extended on
> > restore, fix up migrations from hosts without the field by setting it to
> > the old convention if zero.
> >
> > The hardcoded mapping x2apic_id=2*vcpu_id is kept for the time being,
> > but it's meant to be overriden by toolstack on a later patch with
> > appropriate values.
> >
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>
>
> I'm going to request some changes, but I think they're only comment
> changes. [edit, no sadly, one non-comment change.]
>
> It's unfortunate that Xen uses an instance of hvm_hw_lapic for it's
> internal state, but one swamp at a time.
>
>
> In the subject, there's no such thing as the "initial" x2APIC ID. 
> There's just "the x2APIC ID" and it's not mutable state as far as the
> guest is concerned  (This is different to the xAPIC id, where there is
> an architectural concept of the initial xAPIC ID, from the days when
> OSes were permitted to edit it).  Also, it's x86/hvm, seeing as this is
> an HVM specific change you're making.
>
> Next, while it's true that this allows the value to move in the
> migration stream, the more important point is that this allows the
> toolstack to configure the x2APIC ID for each vCPU.
>
> So, for the commit message, I recommend:
>
> ---%<---
> Today, Xen hard-codes x2APIC_ID = vcpu_id * 2, but this is unwise and
> interferes with providing accurate topology information to the guest.
>
> Introduce a new x2apic_id field into hvm_hw_lapic.  This is immutable
> state from the guest's point of view, but it allows the toolstack to
> configure the value, and for the value to move on migrate.
>
> For backwards compatibility, we treat incoming zeroes as if they were
> the old hardcoded scheme.
> ---%<---
>
> > diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> > index 2a777436ee27..e2489ff8e346 100644
> > --- a/xen/arch/x86/cpuid.c
> > +++ b/xen/arch/x86/cpuid.c
> > @@ -138,10 +138,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
> >          const struct cpu_user_regs *regs;
> >  
> >      case 0x1:
> > -        /* TODO: Rework topology logic. */
> >          res->b &= 0x00ffffffu;
> >          if ( is_hvm_domain(d) )
> > -            res->b |= (v->vcpu_id * 2) << 24;
> > +            res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;
>
> There wants to be some kind of note here, especially as you're feeding
> vlapic_x2apic_id() into a field called xAPIC ID.  Perhaps
>
> /* Large systems do wrap around 255 in the xAPIC_ID field. */
>
> ?
>
>
> >  
> >          /* TODO: Rework vPMU control in terms of toolstack choices. */
> >          if ( vpmu_available(v) &&
> > @@ -310,19 +309,16 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
> >          break;
> >  
> >      case 0xb:
> > -        /*
> > -         * In principle, this leaf is Intel-only.  In practice, it is 
> > tightly
> > -         * coupled with x2apic, and we offer an x2apic-capable APIC 
> > emulation
> > -         * to guests on AMD hardware as well.
> > -         *
> > -         * TODO: Rework topology logic.
> > -         */
> >          if ( p->basic.x2apic )
> >          {
> >              *(uint8_t *)&res->c = subleaf;
> >  
> > -            /* Fix the x2APIC identifier. */
> > -            res->d = v->vcpu_id * 2;
> > +            /*
> > +             * Fix the x2APIC identifier. The PV side is nonsensical, but
> > +             * we've always shown it like this so it's kept for compat.
> > +             */
>
> In hindsight I should changed "Fix the x2APIC identifier." when I
> reworked this logic, but oh well - better late than never.
>
> /* The x2APIC_ID is per-vCPU, and fixed irrespective of the requested
> subleaf. */
>
> I'd also put a little more context in the PV side:
>
> /* Xen 4.18 and earlier leaked x2APIC into PV guests.  The value shown
> is nonsensical but kept as-was for compatibility. */
>
> > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> > index 3363926b487b..33b463925f4e 100644
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -1538,6 +1538,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> >      const struct vcpu *v = vlapic_vcpu(vlapic);
> >      uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
> >  
> > +    /*
> > +     * Loading record without hw.x2apic_id in the save stream, calculate 
> > using
> > +     * the traditional "vcpu_id * 2" relation. There's an implicit 
> > assumption
> > +     * that vCPU0 always has x2APIC0, which is true for the old relation, 
> > and
> > +     * still holds under the new x2APIC generation algorithm. While that 
> > case
> > +     * goes through the conditional it's benign because it still maps to 
> > zero.
> > +     */
>
> It's not an implicit assumption; it's very explicit.

It's implicit because it's not mentioned anywhere else and parts of the Xen
ecosystem live under the pretense that such a thing can indeed happen.

>
> /* Xen 4.19 and earlier had no x2APIC_ID in the migration stream, and
> hard-coded "vcpu_id * 2".  Default back to this if we have a
> zero-extended record.  */
>
> But, this will go malfunction if the toolstack tries to set v!0's
> x2APIC_ID to 0.

I assume you mean vcpuN with N != 0. I maintain that allowing non-monotonically
increasing APIC IDs on vCPUs is technical debt disguised as a misfeature. For
one, it would prevent hvmloader from asserting some sanity on its own reads of
APIC IDs, but it would be a mess to debug in general. I started making real
progress on the toolstack after asserting all APs had non-zero APIC IDs.

So, while...

>
> What you need to know is whether lapic_load_hidden() had to zero-extend
> the record or not (more specifically, over this field), so you want
> h->size <= offsetof(x2_apicid) as the gating condition.

... this is true and a more adequate gating condition (that I'm happy to
replace the current one with), I'd still like to keep the invariant that APIC
IDs must be monotonically increasing with the vCPU id, which has the side
effect of banning zero outside the BSP.

>
> This should be safe for the toolstack, I think.  Hypercalls prior to
> this patch will get a shorter record, and hypercalls from this patch
> onwards will get a longer record with the default x2APIC_ID = vcpu_id *
> 2 filled in.
>
> > +    if ( !vlapic->hw.x2apic_id )
> > +        vlapic->hw.x2apic_id = v->vcpu_id * 2;
> > +
> >      /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct 
> > */
> >      if ( !vlapic_x2apic_mode(vlapic) ||
> >           (vlapic->loaded.ldr == good_ldr) )
> > @@ -1606,6 +1616,13 @@ static int cf_check lapic_check_hidden(const struct 
> > domain *d,
> >           APIC_BASE_EXTD )
> >          return -EINVAL;
> >  
> > +    /*
> > +     * Fail migrations from newer versions of Xen where
> > +     * rsvd_zero is interpreted as something else.
> > +     */
>
> This comment isn't necessary.  We've got no shortage of reserved
> checks.  However ...
>
> > diff --git a/xen/include/public/arch-x86/hvm/save.h 
> > b/xen/include/public/arch-x86/hvm/save.h
> > index 7ecacadde165..1c2ec669ffc9 100644
> > --- a/xen/include/public/arch-x86/hvm/save.h
> > +++ b/xen/include/public/arch-x86/hvm/save.h
> > @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
> >      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
> >      uint32_t             timer_divisor;
> >      uint64_t             tdt_msr;
> > +    uint32_t             x2apic_id;
> > +    uint32_t             rsvd_zero;
>
> ... we do normally spell it _rsvd; to make it extra extra clear that
> people shouldn't be doing anything with it.
>
> ~Andrew




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.