[Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling
This patch includes the following aspects:
- Handling logic when the vCPU is blocked:
  * Add a global vector to wake up the blocked vCPU when an interrupt
    is being posted to it (this part was suggested by Yang Zhang
    <yang.z.zhang@xxxxxxxxx>).
  * Define two per-cpu variables:
    1. pi_blocked_vcpu:      A list storing the vCPUs which were
                             blocked on this pCPU.
    2. pi_blocked_vcpu_lock: The spinlock to protect pi_blocked_vcpu.
- Add some scheduler hooks; this part was suggested by Dario Faggioli
  <dario.faggioli@xxxxxxxxxx>.
  * vmx_pre_ctx_switch_pi()
    It is called before context switch; we update the posted-interrupt
    descriptor when the vCPU is preempted, goes to sleep, or is blocked.
  * vmx_post_ctx_switch_pi()
    It is called after context switch; we update the posted-interrupt
    descriptor when the vCPU is about to run.
  * arch_vcpu_wake_prepare()
    It is called when waking up the vCPU; we update the posted-interrupt
    descriptor when the vCPU is unblocked.

CC: Keir Fraser <keir@xxxxxxx>
CC: Jan Beulich <jbeulich@xxxxxxxx>
CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CC: Kevin Tian <kevin.tian@xxxxxxxxx>
CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
CC: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Suggested-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Signed-off-by: Feng Wu <feng.wu@xxxxxxxxx>
Reviewed-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
---
v7:
- Merge "[PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
  interrupts" and "[PATCH v6 14/18] vmx: posted-interrupt handling when
  vCPU is blocked" into this patch, so it is self-contained and more
  convenient for code review.
- Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
- Coding style
- Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
- Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
- Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
- Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
- Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
- Use 'spin_lock' and 'spin_unlock' when the interrupt has already been
  disabled
- Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
- Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
- Call .pi_ctxt_switch_to() in __context_switch() instead of directly
  calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
- Make .pi_block_cpu unsigned int
- Use list_del() instead of list_del_init()
- Coding style

One remaining item: Jan has a concern about calling vcpu_unblock() in
vmx_pre_ctx_switch_pi(); Dario's or George's input is needed on this.

Changelog for "vmx: Add some scheduler hooks for VT-d posted interrupts"
v6:
- Add two static inline functions for pi context switch
- Fix typos

v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hooks together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if PI is
  disabled or the vCPU is not an HVM one
- Coding style

v4:
- Newly added

Changelog for "vmx: posted-interrupt handling when vCPU is blocked"
v6:
- Fix some typos
- Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()

v4:
- Use local variables in pi_wakeup_interrupt()
- Remove vcpu from the blocked list when pi_desc.on == 1; this avoids
  kicking the vcpu multiple times.
- Remove tasklet

v3:
- This patch is generated by merging the following three patches in v2:
  [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
  [RFC v2 10/15] vmx: Define two per-cpu variables
  [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
- Rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
- Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
- Rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
- Make pi_wakeup_interrupt() static
- Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
- Move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
- Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
- Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'

 xen/arch/x86/domain.c              |  21 ++++
 xen/arch/x86/hvm/hvm.c             |   6 +
 xen/arch/x86/hvm/vmx/vmcs.c        |   2 +
 xen/arch/x86/hvm/vmx/vmx.c         | 229 +++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c              |   2 +
 xen/include/asm-arm/domain.h       |   2 +
 xen/include/asm-x86/domain.h       |   3 +
 xen/include/asm-x86/hvm/hvm.h      |   4 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  11 ++
 xen/include/asm-x86/hvm/vmx/vmx.h  |   4 +
 10 files changed, 284 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..d64d4eb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1531,6 +1531,8 @@ static void __context_switch(void)
         }
         vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
+        if ( n->arch.pi_ctxt_switch_to )
+            n->arch.pi_ctxt_switch_to(n);
     }
 
     psr_ctxt_switch_to(nd);
@@ -1573,6 +1575,22 @@ static void __context_switch(void)
 
     per_cpu(curr_vcpu, cpu) = n;
 }
 
+static inline void ctxt_switch_prepare(struct vcpu *prev)
+{
+    /*
+     * When switching from non-idle to idle, we only do a lazy context switch.
+     * However, in order for posted interrupt (if available and enabled) to
+     * work properly, we at least need to update the descriptors.
+     */
+    if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
+        prev->arch.pi_ctxt_switch_from(prev);
+}
+
+static inline void ctxt_switch_cancel(struct vcpu *next)
+{
+    if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
+        next->arch.pi_ctxt_switch_to(next);
+}
+
 void context_switch(struct vcpu *prev, struct vcpu *next)
 {
@@ -1605,9 +1623,12 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
     set_current(next);
 
+    ctxt_switch_prepare(prev);
+
     if ( (per_cpu(curr_vcpu, cpu) == next) ||
          (is_idle_domain(nextd) && cpu_online(cpu)) )
     {
+        ctxt_switch_cancel(next);
         local_irq_enable();
     }
     else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c957610..cfbb56f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6817,6 +6817,12 @@ bool_t altp2m_vcpu_emulate_ve(struct vcpu *v)
     return 0;
 }
 
+void arch_vcpu_wake_prepare(struct vcpu *v)
+{
+    if ( hvm_funcs.vcpu_wake_prepare )
+        hvm_funcs.vcpu_wake_prepare(v);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 5f67797..5abe960 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -661,6 +661,8 @@ int vmx_cpu_up(void)
     if ( cpu_has_vmx_vpid )
         vpid_sync_all();
 
+    vmx_pi_per_cpu_init(cpu);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 8e41f4b..f32e062 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -67,6 +67,8 @@ enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised };
 
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
+static void vmx_pre_ctx_switch_pi(struct vcpu *v);
+static void vmx_post_ctx_switch_pi(struct vcpu *v);
 
 static int  vmx_alloc_vlapic_mapping(struct domain *d);
 static void vmx_free_vlapic_mapping(struct domain *d);
@@ -83,7 +85,21 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
+/*
+ * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler
+ * we can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
+static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
+
 uint8_t __read_mostly posted_intr_vector;
+uint8_t __read_mostly pi_wakeup_vector;
+
+void vmx_pi_per_cpu_init(unsigned int cpu)
+{
+    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
+    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
+}
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -106,10 +122,23 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
 
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
+
+    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+
+    spin_lock_init(&v->arch.hvm_vmx.pi_lock);
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
 
+    if ( iommu_intpost && has_hvm_container_vcpu(v) )
+    {
+        v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
+        v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
+    }
+
     if ( (rc = vmx_create_vmcs(v)) != 0 )
     {
         dprintk(XENLOG_WARNING,
@@ -707,6 +736,155 @@ static void vmx_fpu_leave(struct vcpu *v)
     }
 }
 
+void vmx_vcpu_wake_prepare(struct vcpu *v)
+{
+    unsigned long flags;
+
+    if ( !iommu_intpost || !has_hvm_container_vcpu(v) ||
+         !has_arch_pdevs(v->domain) )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, flags);
+
+    if ( !test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+        unsigned int pi_block_cpu;
+
+        /*
+         * We don't need to send notification event to a non-running
+         * vcpu, the interrupt information will be delivered to it before
+         * VM-ENTRY when the vcpu is scheduled to run next time.
+         */
+        pi_set_sn(pi_desc);
+
+        /*
+         * Set 'NV' field back to posted_intr_vector, so the
+         * Posted-Interrupts can be delivered to the vCPU by
+         * VT-d HW after it is scheduled to run.
+         */
+        write_atomic(&pi_desc->nv, posted_intr_vector);
+
+        /*
+         * Delete the vCPU from the related block list
+         * if we are resuming from blocked state.
+         */
+        pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
+        if ( pi_block_cpu == NR_CPUS )
+            goto out;
+
+        spin_lock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+
+        /*
+         * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
+         * removed from the blocking list while we are acquiring the lock.
+         */
+        if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
+        {
+            spin_unlock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+            goto out;
+        }
+
+        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+        v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+        spin_unlock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+    }
+
+out:
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+}
+
+static void vmx_pre_ctx_switch_pi(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    unsigned long flags;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, flags);
+
+    if ( !test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        /*
+         * The vCPU has been preempted or went to sleep. We don't need to send
+         * notification event to a non-running vcpu, the interrupt information
+         * will be delivered to it before VM-ENTRY when the vcpu is scheduled
+         * to run next time.
+         */
+        pi_set_sn(pi_desc);
+
+    }
+    else
+    {
+        struct pi_desc old, new;
+        unsigned int dest;
+
+        /*
+         * The vCPU is blocking, we need to add it to one of the per pCPU lists.
+         * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
+         * the per-CPU list, we also save it to posted-interrupt descriptor and
+         * make it as the destination of the wake-up notification event.
+         */
+        v->arch.hvm_vmx.pi_block_cpu = v->processor;
+
+        spin_lock(&per_cpu(pi_blocked_vcpu_lock, v->arch.hvm_vmx.pi_block_cpu));
+        list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                      &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
+        spin_unlock(&per_cpu(pi_blocked_vcpu_lock,
+                             v->arch.hvm_vmx.pi_block_cpu));
+
+        do {
+            old.control = new.control = pi_desc->control;
+
+            /* Should not block the vCPU if an interrupt was posted for it. */
+            if ( pi_test_on(&old) )
+            {
+                spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+                vcpu_unblock(v);
+                return;
+            }
+
+            /*
+             * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
+             * so when external interrupts from assigned devices happen,
+             * wakeup notification event will go to
+             * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
+             * we can find the vCPU in the right list to wake up.
+             */
+            dest = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
+
+            if ( x2apic_enabled )
+                new.ndst = dest;
+            else
+                new.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+
+            pi_clear_sn(&new);
+            new.nv = pi_wakeup_vector;
+        } while ( cmpxchg(&pi_desc->control, old.control, new.control) !=
+                  old.control );
+    }
+
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+}
+
+static void vmx_post_ctx_switch_pi(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    if ( x2apic_enabled )
+        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
+    else
+        write_atomic(&pi_desc->ndst,
+                     MASK_INSR(cpu_physical_id(v->processor),
+                               PI_xAPIC_NDST_MASK));
+
+    pi_clear_sn(pi_desc);
+}
+
 static void vmx_ctxt_switch_from(struct vcpu *v)
 {
     /*
@@ -1975,6 +2153,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is blocked. */
+static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+    struct arch_vmx_struct *vmx, *tmp;
+    struct vcpu *v;
+    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
+    struct list_head *blocked_vcpus =
+                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
+    LIST_HEAD(list);
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    spin_lock(lock);
+
+    /*
+     * XXX: The length of the list depends on how many vCPUs are currently
+     * blocked on this specific pCPU. This may hurt the interrupt latency
+     * if the list grows to too many entries.
+     */
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
+    {
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocked_vcpu_list);
+            vmx->pi_block_cpu = NR_CPUS;
+
+            /*
+             * We cannot call vcpu_unblock here, since it also needs
+             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
+             * set in another list and unblock them after we release
+             * 'pi_blocked_vcpu_lock'.
+             */
+            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
+        }
+    }
+
+    spin_unlock(lock);
+
+    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
+    {
+        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
+        list_del(&vmx->pi_vcpu_on_set_list);
+        vcpu_unblock(v);
+    }
+}
+
 /* Handle VT-d posted-interrupt when VCPU is running. */
 static void pi_notification_interrupt(struct cpu_user_regs *regs)
 {
@@ -2061,7 +2286,11 @@ const struct hvm_function_table * __init start_vmx(void)
     if ( cpu_has_vmx_posted_intr_processing )
     {
         if ( iommu_intpost )
+        {
             alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+            vmx_function_table.vcpu_wake_prepare = vmx_vcpu_wake_prepare;
+        }
         else
             alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
     }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 3eefed7..bc49098 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -412,6 +412,8 @@ void vcpu_wake(struct vcpu *v)
     unsigned long flags;
     spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
 
+    arch_vcpu_wake_prepare(v);
+
     if ( likely(vcpu_runnable(v)) )
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 56aa208..cffe2c6 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -301,6 +301,8 @@ static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid)
     return vaff;
 }
 
+static inline void arch_vcpu_wake_prepare(struct vcpu *v) {}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 0fce09e..979210a 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -481,6 +481,9 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    void (*pi_ctxt_switch_from) (struct vcpu *);
+    void (*pi_ctxt_switch_to) (struct vcpu *);
+
     struct vpmu_struct vpmu;
 
     /* Virtual Machine Extensions */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 3cac64f..50c112f 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -212,6 +212,8 @@ struct hvm_function_table {
     void (*altp2m_vcpu_update_vmfunc_ve)(struct vcpu *v);
     bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
     int (*altp2m_vcpu_emulate_vmfunc)(struct cpu_user_regs *regs);
+
+    void (*vcpu_wake_prepare)(struct vcpu *v);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -545,6 +547,8 @@ static inline bool_t hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+void arch_vcpu_wake_prepare(struct vcpu *v);
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index b7f78e3..65d8523 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -160,6 +160,17 @@ struct arch_vmx_struct {
     struct page_info     *vmwrite_bitmap;
 
     struct page_info     *pml_pg;
+
+    struct list_head     pi_blocked_vcpu_list;
+    struct list_head     pi_vcpu_on_set_list;
+
+    /*
+     * Before vCPU is blocked, it is added to the global per-cpu list
+     * of 'pi_block_cpu', then the VT-d engine can send a wakeup notification
+     * event to 'pi_block_cpu' and wake up the related vCPU.
+     */
+    unsigned int         pi_block_cpu;
+    spinlock_t           pi_lock;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 70b254f..2eaea32 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -28,6 +28,8 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/vmx/vmcs.h>
 
+extern uint8_t pi_wakeup_vector;
+
 typedef union {
     struct {
         u64 r       :   1,  /* bit 0 - Read permission */
@@ -557,6 +559,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
+void vmx_pi_per_cpu_init(unsigned int cpu);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
-- 
2.1.0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel