Re: [Xen-devel] [PATCH v2 05/10] x86/HVM/SVM: Add AVIC initialization code
Hi Andrew,

On 1/2/17 23:37, Andrew Cooper wrote:
> On 31/12/2016 05:45, Suravee Suthikulpanit wrote:
> [...]
>> +/*
>> + * Note: Current max index allowed for physical APIC ID table is 255.
>> + */
>> +#define AVIC_PHY_APIC_ID_MAX    0xFF
>> +
>> +#define AVIC_DOORBELL           0xc001011b
>> +
>> +#define AVIC_HPA_SHIFT          12
>> +#define AVIC_HPA_MASK           (((1ULL << 40) - 1) << AVIC_HPA_SHIFT)
>
> What does this mask represent? I have spotted a truncation bug below,
> but how to fix the issue depends on the meaning of the mask.
>
> The only limit I can see in the spec is that the host physical
> addresses must be present in system RAM, which presumably means they
> should follow maxphysaddr like everything else?

Please see below.

> [...]
>> +/*
>> + * Note:
>> + * Currently, svm-avic mode is not supported with nested virtualization.
>> + * Therefore, it is not yet enabled by default. Once the support is
>> + * in place, this should be enabled by default.
>> + */
>> +bool svm_avic = 0;
>> +boolean_param("svm-avic", svm_avic);
>
> We are trying to avoid the proliferation of top-level options. Please
> could you introduce an "svm=" option which takes avic as a sub-boolean,
> similar to how iommu= currently works?

Sure, I will introduce custom_param("svm", parse_svm_param), along the
lines of the sketch below.
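Rough sketch only, modelled on the existing parse_iommu_param(); the
exact token names, and whether further sub-options are wanted, are
still open:

    /* Command line: svm=<...>, e.g. "svm=avic" or "svm=no-avic". */
    static void __init parse_svm_param(char *s)
    {
        char *ss;

        do {
            /* Split the comma-separated list into single tokens. */
            ss = strchr(s, ',');
            if ( ss )
                *ss = '\0';

            if ( !strcmp(s, "avic") )
                svm_avic = true;
            else if ( !strcmp(s, "no-avic") )
                svm_avic = false;

            s = ss ? ss + 1 : NULL;
        } while ( s );
    }
    custom_param("svm", parse_svm_param);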
> [...]
>> +{
>> +    struct avic_phy_apic_id_ent *avic_phy_apic_id_table;
>> +    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
>> +
>> +    if ( !d->avic_phy_apic_id_table_mfn )
>> +        return NULL;
>> +
>> +    /*
>> +     * Note: APIC ID = 0xff is used for broadcast.
>> +     *       APIC ID > 0xff is reserved.
>> +     */
>> +    if ( index >= 0xff )
>> +        return NULL;
>> +
>> +    avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn);
>
> This is unsafe, and will break on hosts with more than 5TB of RAM.
>
> As you allocate avic_phy_apic_id_table_mfn with alloc_domheap_page(),
> you must use {map,unmap}_domain_page() to get temporary mappings (which
> will use mfn_to_virt() in the common case), or use
> {map,unmap}_domain_page_global() to get permanent mappings.
>
> As the values in here need modifying across a schedule, I would suggest
> making a global mapping at create time, and storing the result in a
> properly-typed pointer.

Thanks for pointing this out. I'll update both the logical and physical
APIC ID tables to use {map,unmap}_domain_page_global() instead.

Hm... that means I would also need to handle the mapping in
svm_avic_dom_destroy() and svm_avic_init_vmcb(). Something along the
lines of the sketch below.
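Untested sketch: the field becomes a typed pointer instead of the
current avic_phy_apic_id_table_mfn, and most error handling is elided:

    /* At creation time (e.g. in svm_avic_dom_init()): */
    struct page_info *pg = alloc_domheap_page(d, MEMF_no_owner);

    if ( !pg )
        return -ENOMEM;
    s->avic_phy_apic_id_table = __map_domain_page_global(pg);
    clear_page(s->avic_phy_apic_id_table);

    /* The lookup helper then never touches mfn_to_virt(): */
    static struct avic_phy_apic_id_ent *
    avic_get_phy_apic_id_ent(const struct vcpu *v, unsigned int index)
    {
        struct svm_domain *s = &v->domain->arch.hvm_domain.svm;

        /* APIC ID 0xff is broadcast; IDs above it are reserved. */
        if ( !s->avic_phy_apic_id_table || index >= AVIC_PHY_APIC_ID_MAX )
            return NULL;

        return &s->avic_phy_apic_id_table[index];
    }

    /* And in svm_avic_dom_destroy(): */
    if ( s->avic_phy_apic_id_table )
    {
        unmap_domain_page_global(s->avic_phy_apic_id_table);
        s->avic_phy_apic_id_table = NULL;
    }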
> [...]
>> +
>> +    if ( !svm_avic )
>> +        return 0;
>
> || !has_vlapic(d)
>
> HVMLite domains may legitimately be configured without an LAPIC at all.

> [...]
>> +
>> +static inline u32 *
>> +avic_get_bk_page_entry(const struct vcpu *v, u32 offset)
>> +{
>> +    const struct vlapic *vlapic = vcpu_vlapic(v);
>> +    char *tmp;
>> +
>> +    if ( !vlapic || !vlapic->regs_page )
>> +        return NULL;
>> +
>> +    tmp = (char *)page_to_virt(vlapic->regs_page);
>> +    return (u32 *)(tmp + offset);
>
> This is also unsafe over 5TB. vlapic->regs is already a global mapping
> which you should use.

Good point.

> Also, you should leave a comment by the allocation of regs_page that
> AVIC depends on it being a full page allocation.

Ok, I will add a comment in arch/x86/hvm/vlapic.c: vlapic_init().

>> +}
>> +
>> +void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data)
>> +{
>> +    const struct arch_svm_struct *s = &v->arch.hvm_svm;
>> +
>> +    s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK;
>> +    s->vmcb->cleanbits.fields.avic = 0;
>
> While I can appreciate what you are trying to do here, altering the
> physical address in IA32_APICBASE has never functioned properly under
> Xen, as nothing updates the P2M. Worse, with multi-vcpu guests, the use
> of a single shared p2m makes this impossible to do properly.

Ahh.. good to know.

> I know it is non-architectural behaviour, but IMO vlapic_msr_set()
> should be updated to raise #GP[0] when attempting to change the frame,
> to make it fail obviously, rather than having interrupts go missing.
> Also, I see that the Intel side (via vmx_vlapic_msr_changed()) makes
> the same mistake.

Okay. It seems that hvm_msr_write_intercept() already handles this,
doesn't it? So I should be able to pretty much remove the handling for
the MSR here.

>> +}
>> +
>> +int svm_avic_init_vmcb(struct vcpu *v)
>> +{
>> +    paddr_t ma;
>> +    u32 *apic_id_reg;
>> +    struct arch_svm_struct *s = &v->arch.hvm_svm;
>> +    struct vmcb_struct *vmcb = s->vmcb;
>> +    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
>> +    const struct vlapic *vlapic = vcpu_vlapic(v);
>> +    struct avic_phy_apic_id_ent entry;
>> +
>> +    if ( !svm_avic )
>> +        return 0;
>> +
>> +    if ( !vlapic || !vlapic->regs_page )
>> +        return -EINVAL;
>
> Somewhere you need a bounds check against the APIC ID being within AVIC
> limits.

We are checking the bound in avic_get_phy_apic_id_ent().

>> +    if ( !s->avic_last_phy_id )
>> +        return -EINVAL;
>
> Why does a pointer into the physical ID page need storing in
> arch_svm_struct?

This is so that we can quickly access the physical APIC ID entry and
update it, without having to call avic_get_phy_apic_id_ent() each time.

>> +
>> +    vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK;
>
> This use of AVIC_HPA_MASK may truncate the address, which is definitely
> a problem.

I'm not quite sure I see the truncation you are pointing out. Could you
please elaborate?

> If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to
> pass appropriate memflags into the alloc_domheap_page() calls to get a
> suitable page.

If by "maxphysaddr" you mean the 52-bit physical address limit of PAE
mode, then I think that is what the mask relates to.

>> +
>> +    entry = *(s->avic_last_phy_id);
>> +    smp_rmb();
>> +    entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
>> +    entry.is_running = 0;
>> +    entry.valid = 1;
>> +    *(s->avic_last_phy_id) = entry;
>> +    smp_wmb();
>
> During domain creation, no guests are running, so you can edit this
> cleanly. What values are actually being excluded here? This, and other
> patches, look like you are actually just trying to do a read_atomic(),
> modify, write_atomic() update, rather than actually requiring ordering.

Besides using read_atomic(), modify, write_atomic() to update the entry,
I also want to make sure that the compiler does not shuffle the order
around, which I thought could be achieved via smp_rmb() and smp_wmb().
Or do you mean something like the sketch below?
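(Untested sketch, if I understand the suggestion correctly. The entry
is 64 bits wide, so it goes through a union here, as
read_atomic()/write_atomic() operate on scalar values.)

    union {
        struct avic_phy_apic_id_ent ent;
        uint64_t raw;
    } e;

    /* Snapshot the whole entry with a single 64-bit read. */
    e.raw = read_atomic((uint64_t *)s->avic_last_phy_id);

    e.ent.bk_pg_ptr  = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
    e.ent.is_running = 0;
    e.ent.valid      = 1;

    /* Publish the updated entry with a single 64-bit write. */
    write_atomic((uint64_t *)s->avic_last_phy_id, e.raw);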
> [...]
>> @@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>>      switch ( msr )
>>      {
>> +    case MSR_IA32_APICBASE:
>> +        if ( svm_avic_vcpu_enabled(v) )
>> +            svm_avic_update_vapic_bar(v, msr_content);
>> +        break;
>
> I think this is dead code. MSR_IA32_APICBASE is handled in
> hvm_msr_write_intercept(), and won't fall into the vendor hook. If you
> were to continue down this route, I would add another pi_ops hook, and
> replace the VMX layering violation in vlapic_msr_set().

I see now. I will remove the code added here.

Thanks,
Suravee