
Re: [Xen-devel] [PATCH v2 05/10] x86/HVM/SVM: Add AVIC initialization code



Hi Andrew

On 1/2/17 23:37, Andrew Cooper wrote:
On 31/12/2016 05:45, Suravee Suthikulpanit wrote:
[...]
+/*
+ * Note: Current max index allowed for physical APIC ID table is 255.
+ */
+#define AVIC_PHY_APIC_ID_MAX    0xFF
+
+#define AVIC_DOORBELL           0xc001011b
+
+#define AVIC_HPA_SHIFT  12
+#define AVIC_HPA_MASK           (((1ULL << 40) - 1) << AVIC_HPA_SHIFT)

What does this mask represent?  I have spotted a truncation bug below,
but how to fix the issue depends on the meaning of the mask.

The only limits I can see in the spec is that the host physical
addresses must be present in system RAM, which presumably means they
should follow maxphysaddr like everything else?

Please see below.

[...]
+/*
+ * Note:
+ * Currently, svm-avic mode is not supported with nested virtualization.
+ * Therefore, it is not yet enabled by default. Once the support
+ * is in place, this should be enabled by default.
+ */
+bool svm_avic = 0;
+boolean_param("svm-avic", svm_avic);

We are trying to avoid the proliferation of top-level options.  Please
could you introduce a "svm=" option which takes avic as a sub-boolean,
similar to how iommu= currently works?

Sure, I will introduce the custom_param("svm", parse_svm_param).
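Something along these lines, modelled on the existing iommu parser (a sketch only; the sub-option names "avic"/"no-avic" are my assumption):

    static void __init parse_svm_param(char *s)
    {
        char *ss;

        do {
            ss = strchr(s, ',');
            if ( ss )
                *ss = '\0';

            if ( !strcmp(s, "avic") )
                svm_avic = 1;
            else if ( !strcmp(s, "no-avic") )
                svm_avic = 0;

            s = ss + 1;
        } while ( ss );
    }
    custom_param("svm", parse_svm_param);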

[...]
+{
+    struct avic_phy_apic_id_ent *avic_phy_apic_id_table;
+    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
+
+    if ( !d->avic_phy_apic_id_table_mfn )
+        return NULL;
+
+    /*
+    * Note: APIC ID = 0xff is used for broadcast.
+    *       APIC ID > 0xff is reserved.
+    */
+    if ( index >= 0xff )
+        return NULL;
+
+    avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn);

This is unsafe and will break on hosts with more than 5TB of RAM.

As you allocate avic_phy_apic_id_table_mfn with alloc_domheap_page(),
you must use {map,unmap}_domain_page() to get temporary mappings (which
will use mfn_to_virt() in the common case), or use
{map,unmap}_domain_page_global() to get permanent mappings.

As the values in here need modifying across a schedule, I would suggest
making a global mapping at create time, and storing the result in a
properly-typed pointer.

Thanks for pointing this out. I'll update both the logical and physical APIC ID tables to use {map,unmap}_domain_page_global() instead.
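Roughly like this at table-creation time (a sketch; the typed-pointer field name is an assumption, replacing the bare mfn):

    struct page_info *pg = alloc_domheap_page(d, MEMF_no_owner);
    struct avic_phy_apic_id_ent *tbl;

    if ( !pg )
        return -ENOMEM;

    tbl = __map_domain_page_global(pg);
    if ( !tbl )
    {
        free_domheap_page(pg);
        return -ENOMEM;
    }
    clear_page(tbl);
    d->arch.hvm_domain.svm.avic_phy_apic_id_table = tbl;

    /* ...and the reverse at domain destruction: */
    unmap_domain_page_global(tbl);
    free_domheap_page(pg);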


[...]
+
+    if ( !svm_avic )
+        return 0;

|| !has_vlapic(d)

HVMLite domains may legitimately be configured without an LAPIC at all.

Hm... That means I would also need to check this in svm_avic_dom_destroy() and svm_avic_init_vmcb().
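A minimal sketch of the common guard (assuming a struct domain *d in scope; svm_avic_init_vmcb() would use v->domain):

    /* AVIC is irrelevant for domains without an emulated local APIC. */
    if ( !svm_avic || !has_vlapic(d) )
        return 0;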

[...]
+
+static inline u32 *
+avic_get_bk_page_entry(const struct vcpu *v, u32 offset)
+{
+    const struct vlapic *vlapic = vcpu_vlapic(v);
+    char *tmp;
+
+    if ( !vlapic || !vlapic->regs_page )
+        return NULL;
+
+    tmp = (char *)page_to_virt(vlapic->regs_page);
+    return (u32 *)(tmp + offset);

This is also unsafe over 5TB.  vlapic->regs is already a global mapping
which you should use.

Good point.
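With that, the accessor collapses to something like (sketch, reusing the permanent vlapic->regs mapping):

    static inline u32 *
    avic_get_bk_page_entry(const struct vcpu *v, u32 offset)
    {
        const struct vlapic *vlapic = vcpu_vlapic(v);

        if ( !vlapic || !vlapic->regs )
            return NULL;

        /* vlapic->regs is already a global mapping of regs_page. */
        return (u32 *)((char *)vlapic->regs + offset);
    }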

Also, you should leave a comment by the allocation of regs_page that
AVIC depends on it being a full page allocation.

Ok, I will add a comment in arch/x86/hvm/vlapic.c: vlapic_init().
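Perhaps something like this next to the allocation (comment wording illustrative only):

    /*
     * Note: Both VMX (virtual-APIC page) and SVM AVIC (backing page)
     * map this page directly into hardware structures, so it must
     * remain a full, separately allocated domheap page.
     */
    vlapic->regs_page = alloc_domheap_page(NULL, MEMF_node(vcpu_to_node(v)));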

+}
+
+void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data)
+{
+    const struct arch_svm_struct *s = &v->arch.hvm_svm;
+
+    s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK;
+    s->vmcb->cleanbits.fields.avic = 0;

While I can appreciate what you are trying to do here, altering the
physical address in IA32_APICBASE has never functioned properly under
Xen, as nothing updates the P2M.  Worse, with multi-vcpu guests, the use
of a single shared p2m makes this impossible to do properly.

Ahh... good to know.

I know it is non-architectural behaviour, but IMO vlapic_msr_set()
should be updated to raise #GP[0] when attempting to change the frame,
to make it fail obviously rather than having interrupts go missing.

Also I see that the Intel side (via vmx_vlapic_msr_changed()) makes the
same mistake.

Okay, it seems that hvm_msr_write_intercept() already handles this, doesn't it? So I should be able to simply remove the handling for the MSR here.
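For reference, Andrew's #GP[0] suggestion might look roughly like this in vlapic_msr_set() (a sketch; it assumes the function is reworked so a fault can be raised from here):

    /* Refuse relocation of the APIC MMIO frame rather than losing IRQs. */
    if ( (value ^ vlapic->hw.apic_base_msr) & MSR_IA32_APICBASE_BASE )
    {
        hvm_inject_hw_exception(TRAP_gp_fault, 0);
        return;
    }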

+}
+
+int svm_avic_init_vmcb(struct vcpu *v)
+{
+    paddr_t ma;
+    u32 *apic_id_reg;
+    struct arch_svm_struct *s = &v->arch.hvm_svm;
+    struct vmcb_struct *vmcb = s->vmcb;
+    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
+    const struct vlapic *vlapic = vcpu_vlapic(v);
+    struct avic_phy_apic_id_ent entry;
+
+    if ( !svm_avic )
+        return 0;
+
+    if ( !vlapic || !vlapic->regs_page )
+        return -EINVAL;

Somewhere you need a bounds check against the APIC ID being within AVIC
limits.

We already check the bounds in avic_get_phy_apic_id_ent().


+    if ( !s->avic_last_phy_id )
+        return -EINVAL;

Why does a pointer into the physical ID page need storing in
arch_svm_struct?


This is so that we can quickly access the physical APIC ID entry to update it, without having to call avic_get_phy_apic_id_entry() each time.

+
+    vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK;

This use of AVIC_HPA_MASK may truncate the address, which is definitely a problem.

I'm not quite sure I see the truncation you pointed out. Could you please elaborate?

If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to
pass appropriate memflags into the alloc_domheap_page() calls to get a
suitable page.


If by "maxphysaddr" you mean the 52-bit physical address limit of PAE mode, then yes, I think that's related.
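To spell out my understanding of the mask (an assumption on my part):

    /*
     * AVIC_HPA_MASK = ((1ULL << 40) - 1) << 12, i.e. bits 51:12 set.
     * A 4K-aligned host address below 2^52 passes through unchanged;
     * any bits above bit 51 would be silently truncated.
     */
    BUILD_BUG_ON(AVIC_HPA_MASK != ((1ULL << 52) - 1 - ((1ULL << 12) - 1)));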

+
+    entry = *(s->avic_last_phy_id);
+    smp_rmb();
+    entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
+    entry.is_running = 0;
+    entry.valid = 1;
+    *(s->avic_last_phy_id) = entry;
+    smp_wmb();

During domain creation, no guests are running, so you can edit this cleanly.

What values are actually being excluded here?  This, and other patches,
look like you are actually just trying to do a read_atomic(), modify,
write_atomic() update, rather than actually requiring ordering.


Besides the read_atomic(), modify, write_atomic() sequence to update the entry, I also want to make sure that the compiler won't shuffle the order around, which I thought could be achieved via smp_rmb() and smp_wmb().
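For what it's worth, read_atomic()/write_atomic() compile to volatile accesses, so the compiler cannot reorder them against each other. If that is all that is needed, the barriers could be dropped, e.g. (a sketch; the union assumes the entry is exactly 64 bits wide):

    union {
        struct avic_phy_apic_id_ent ent;
        uint64_t raw;
    } e;

    e.raw = read_atomic((uint64_t *)s->avic_last_phy_id);
    e.ent.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
    e.ent.is_running = 0;
    e.ent.valid = 1;
    write_atomic((uint64_t *)s->avic_last_phy_id, e.raw);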

[...]
@@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)

     switch ( msr )
     {
+    case MSR_IA32_APICBASE:
+        if ( svm_avic_vcpu_enabled(v) )
+            svm_avic_update_vapic_bar(v, msr_content);
+        break;

I think this is dead code.  MSR_IA32_APICBASE is handled in
hvm_msr_write_intercept(), and won't fall into the vendor hook.  If you
were to continue down this route, I would add another pi_ops hook, and
replace the VMX layering violation in vlapic_msr_set().


I see now. I will remove the code added here.


Thanks,
Suravee
