Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN
> -----Original Message-----
> From: Tian, Kevin
> Sent: Tuesday, March 10, 2015 10:22 AM
> To: Wu, Feng; xen-devel@xxxxxxxxxxxxx
> Cc: Jan Beulich; Zhang, Yang Z
> Subject: RE: VT-d Posted-interrupt (PI) design for XEN
>
> > From: Wu, Feng
> > Sent: Wednesday, March 04, 2015 9:30 PM
> >
> > VT-d Posted-interrupt (PI) design for XEN
> >
> > Background
> > ==========
> > With the development of virtualization, there are more and more device
> > assignment requirements. However, today when a VM is running with
> > assigned devices (such as a NIC), external interrupt handling for the
> > assigned devices always needs VMM intervention.
> >
> > VT-d Posted-interrupt is an enhanced method to handle interrupts
> > in the virtualization environment. Interrupt posting is the process by
> > which an interrupt request is recorded in a memory-resident
> > posted-interrupt-descriptor structure by the root-complex, followed by
> > an optional notification event issued to the CPU complex.
> >
> > With VT-d Posted-interrupt we can get the following advantages:
> > - Directly delivery of external interrupts to running vCPUs without VMM
> >   intervention
>
> "Directly" -> "Direct"
>
> > - Decrease the interrupt migration complexity. On vCPU migration, software
> >   can atomically co-migrate all interrupts targeting the migrating vCPU.
>
> could you elaborate this benefit? I didn't see discussion around migration
> throughout the proposal.
>
> >
> > Posted-interrupt Introduction
> > ========================
> > There are two components to the Posted-interrupt architecture:
> > Processor Support and Root-Complex Support
> >
> > - Processor Support
> > Posted-interrupt processing is a feature by which a processor processes
> > the virtual interrupts by recording them as pending on the virtual-APIC
> > page.
> >
> > Posted-interrupt processing is enabled by setting the "process posted
> > interrupts" VM-execution control.
> > The processing is performed in response to the arrival of an interrupt
> > with the posted-interrupt notification vector. In response to such an
> > interrupt, the processor processes virtual interrupts recorded in a data
> > structure called a posted-interrupt descriptor.
> >
> > For more information about APICv and CPU-side Posted-interrupt, please
> > refer to Chapter 29, and Section 29.6 in the Intel SDM:
> > http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> >
> > - Root-Complex Support
> > Interrupt posting is the process by which an interrupt request (from IOAPIC
> > or MSI/MSIx capable sources) is recorded in a memory-resident
> > posted-interrupt-descriptor structure by the root-complex, followed by
> > an optional notification event issued to the CPU complex. The interrupt
> > request arriving at the root-complex carries the identity of the interrupt
> > request source and a 'remapping-index'. The remapping-index is used to
> > look up an entry from the memory-resident interrupt-remap-table. Unlike
> > with interrupt-remapping, the interrupt-remap-table-entry for a
> > posted-interrupt specifies a virtual-vector and a pointer to the
> > posted-interrupt descriptor. The virtual-vector specifies the vector of the
> > interrupt to be recorded in the posted-interrupt descriptor. The
> > posted-interrupt descriptor hosts storage for the virtual-vectors and
> > contains the attributes of the notification event (interrupt) to be issued
> > to the CPU complex to inform CPU/software about pending interrupts recorded
> > in the posted-interrupt descriptor.
> >
> > For more information about VT-d PI, please refer to
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >
> >
> > Design Overview
> > ==============
> > In this design, we will cover the following items:
> > 1. Add a variable to control whether to enable VT-d posted-interrupt.
> > 2. VT-d PI feature detection.
> > 3. Extend posted-interrupt descriptor structure to cover VT-d PI specific
> >    stuff.
> > 4. Extend IRTE structure to support VT-d PI.
> > 5. Introduce a new global vector which is used for waking up the HLT'ed
> >    vCPU.
>
> HLT'ed -> blocked
>
> > 6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx
> >    configuration).
> > 7. Update posted-interrupt descriptor during vCPU scheduling (when the state
> >    of the vCPU transitions among RUNSTATE_running / RUNSTATE_blocked /
> >    RUNSTATE_runnable / RUNSTATE_offline).
> > 8. New boot command line option for Xen, which lets the user control the
> >    VT-d PI feature.
> > 9. Multicast/broadcast and lowest priority interrupts consideration.
>
> add a step on notification handler, as what you described in another mail.
>
> > Implementation details
> > ===================
> > - New variable to control VT-d PI
> > Like variable 'iommu_intremap' for interrupt remapping, it is very
> > straightforward to add a new one 'iommu_intpost' for posted-interrupt.
> > 'iommu_intpost' is set only when interrupt remapping and VT-d
> > posted-interrupt are both enabled.
> >
> > - VT-d PI feature detection.
> > Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt
> > support.
> >
> > - Extend posted-interrupt descriptor structure to cover VT-d PI specific
> >   stuff.
> > Here is the new structure for posted-interrupt descriptor:
> >
> > struct pi_desc {
> >     DECLARE_BITMAP(pir, NR_VECTORS);
> >     union {
> >         struct
> >         {
> >             u64 on     : 1,
> >                 sn     : 1,
> >                 rsvd_1 : 13,
> >                 ndm    : 1,
> >                 nv     : 8,
> >                 rsvd_2 : 8,
> >                 ndst   : 32;
> >         };
> >         u64 control;
> >     };
> >     u32 rsvd[6];
> > } __attribute__ ((aligned (64)));
> >
> > - Extend IRTE structure to support VT-d PI.
> > Here is the new structure for IRTE:
> > /* interrupt remap entry */
> > struct iremap_entry {
> >     union {
> >         u64 lo_val;
> >         struct {
> >             u64 p      : 1,
> >                 fpd    : 1,
> >                 dm     : 1,
> >                 rh     : 1,
> >                 tm     : 1,
> >                 dlm    : 3,
> >                 avail  : 4,
> >                 res_1  : 4,
> >                 vector : 8,
> >                 res_2  : 8,
> >                 dst    : 32;
> >         }lo;
> >         struct {
> >             u64 p      : 1,
> >                 fpd    : 1,
> >                 res_1  : 6,
> >                 avail  : 4,
> >                 res_2  : 2,
> >                 urg    : 1,
> >                 pst    : 1,
> >                 vector : 8,
> >                 res_3  : 14,
> >                 pda_l  : 26;
> >         }lo_intpost;
> >     };
> >     union {
> >         u64 hi_val;
> >         struct {
> >             u64 sid   : 16,
> >                 sq    : 2,
> >                 svt   : 2,
> >                 res_1 : 44;
> >         }hi;
> >         struct {
> >             u64 sid   : 16,
> >                 sq    : 2,
> >                 svt   : 2,
> >                 res_1 : 12,
> >                 pda_h : 32;
> >         }hi_intpost;
> >     };
> > };
> >
> > - Introduce a new global vector which is used to wake up the HLT'ed vCPU.
> > Currently, there is a global vector 'posted_intr_vector', which is used as
> > the global notification vector for all vCPUs in the system. This vector is
> > stored in VMCS and the CPU considers it a special vector, used to notify
> > the related pCPU when an interrupt is recorded in the posted-interrupt
> > descriptor.
> >
> > With VT-d PI, the VT-d engine can issue a notification event when the
> > assigned devices issue interrupts. We need to add a new global vector to
> > wake up the HLT'ed vCPU; please refer to the following scenario for the
> > usage of this new global vector:
> >
> > 1. vCPU0 is running on pCPU0
> > 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0
> > 3. An external interrupt from an assigned device occurs for vCPU0. If we
> >    still use 'posted_intr_vector' as the notification vector for vCPU0, the
> >    notification event for vCPU0 (the event will go to pCPU0) will be
> >    consumed by vCPU1 incorrectly. The worst case is that vCPU0 will never
> >    be woken up again since the wakeup event for it is always consumed by
> >    other vCPUs incorrectly. So we need to introduce another global vector,
> >    named 'pi_wakeup_vector', to wake up the HLT'ed vCPU.
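To make the scenario above concrete, here is a minimal C sketch of why a separate wake-up vector helps. It is a toy model, not Xen code: `toy_vcpu`, `toy_post_interrupt`, `toy_handle_notification` and the two vector numbers are hypothetical names and values invented for illustration, and the handler that scans for blocked vCPUs with ON set is only an assumption about what a notification handler might do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector numbers; real values are chosen by the hypervisor. */
enum { POSTED_INTR_VECTOR = 0xf2, PI_WAKEUP_VECTOR = 0xf1 };

/* Toy model of a vCPU: only the state the scenario needs. */
struct toy_vcpu {
    bool    blocked; /* vCPU is blocked and not running on any pCPU */
    bool    on;      /* models the descriptor's ON (outstanding notification) bit */
    uint8_t nv;      /* models the descriptor's NV (notification vector) field */
};

/* Model the VT-d engine posting an interrupt: set ON, return the vector sent. */
static uint8_t toy_post_interrupt(struct toy_vcpu *v)
{
    v->on = true;
    return v->nv;
}

/*
 * Model the pCPU receiving a notification event.
 * Returns the number of blocked vCPUs that were woken up.
 */
static size_t toy_handle_notification(uint8_t vector,
                                      struct toy_vcpu *vcpus, size_t n)
{
    size_t woken = 0;

    /*
     * The ordinary notification vector is treated as belonging to whichever
     * vCPU currently runs on this pCPU, so a blocked vCPU is never woken.
     */
    if (vector != PI_WAKEUP_VECTOR)
        return 0;

    /* Assumed handler behaviour: wake every blocked vCPU with ON set. */
    for (size_t i = 0; i < n; i++) {
        if (vcpus[i].blocked && vcpus[i].on) {
            vcpus[i].blocked = false;
            woken++;
        }
    }
    return woken;
}
```

In this model, posting to a blocked vCPU whose NV is still the ordinary notification vector wakes nobody, while the same post with NV switched to the wake-up vector unblocks it.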
>
> update above example with design about notification handler.
>
> > - Update IRTE when guest modifies the interrupt configuration (MSI/MSIx
> >   configuration).
> > After VT-d PI is introduced, the format of IRTE is changed as follows:
> >     Descriptor Address: the address of the posted-interrupt descriptor
> >     Virtual Vector: the guest vector of the interrupt
> >     URG: indicates if the interrupt is urgent
> >     Other fields continue to have the same meaning
> >
> > 'Descriptor Address' tells the destination vCPU of this interrupt, since
> > each vCPU has a dedicated posted-interrupt descriptor.
> >
> > 'Virtual Vector' tells the guest vector of the interrupt.
> >
> > When the guest changes the configuration of the interrupts, such as the
> > CPU affinity or the vector, we need to update the associated IRTE
> > accordingly.
> >
> > - Update posted-interrupt descriptor during vCPU scheduling
> > The basic idea here is:
> > 1. When vCPU's state is RUNSTATE_running,
> >     - Set 'NV' to 'posted_intr_vector'.
> >     - Clear 'SN' to accept posted-interrupts.
> >     - Set 'NDST' to the pCPU on which the vCPU will be running.
> > 2. When vCPU's state is RUNSTATE_blocked,
> >     - Set 'NV' to 'pi_wakeup_vector', so we can wake up the
> >       related vCPU when posted-interrupt happens for it.
> >       Please refer to the above section about the new global vector.
> >     - Clear 'SN' to accept posted-interrupts
> > 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> >     - Set 'SN' to suppress non-urgent interrupts
> >       (Currently, we only support non-urgent interrupts)
> >       When the vCPU is in RUNSTATE_runnable or RUNSTATE_offline,
> >       it does not need to accept posted-interrupt notification events:
> >       since we don't change the behavior of the scheduler when the
> >       interrupt occurs, we still need to wait for the next scheduling of
> >       the vCPU. When external interrupts from assigned devices occur, the
> >       interrupts are recorded in PIR, and will be synced to IRR before
> >       VM-Entry.
> >     - Set 'NV' to 'posted_intr_vector'.
>
> would it be safer to use 'pi_wakeup_vector', if it's the right one to use
> in the future when we consider real-time scheduling?

Since we don't consider the real-time case now, is it better to set 'NV' to
'posted_intr_vector' together with other changes when supporting real-time
cases?

> > - New boot command line for Xen, which controls VT-d PI feature by user.
> > Like 'intremap' for interrupt remapping, we add a new boot command line
> > option 'intpost' for posted-interrupts.
> >
> > - Multicast/broadcast and lowest priority interrupts consideration
> > With VT-d PI, the destination vCPU information of an external interrupt
> > from assigned devices is stored in IRTE, which leads to the following
> > design considerations:
> > 1. Multicast/broadcast interrupts cannot be posted.
> > 2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
> >    (starting from Nehalem) ignore the TPR value, and instead support two
> >    other ways (configurable by BIOS) to handle lowest priority interrupts:
> >    A) Round robin: In this method, the chipset simply delivers lowest
> >       priority interrupts in a round-robin manner across all the available
> >       logical CPUs. While this provides good load balancing, this was not
> >       always the best thing to do, as interrupts from the same device (like
> >       a NIC) will start running on all the CPUs, thrashing caches and
> >       taking locks. This led to the next scheme.
> >    B) Vector hashing: In this method, hardware would apply a hash function
> >       on the vector value in the interrupt request, and use that hash to
> >       pick a logical CPU to route the lowest priority interrupt. This way,
> >       a given vector always goes to the same logical CPU, avoiding the
> >       thrashing problem above.
> >
> > So, the gist of the above is that lowest priority interrupts have never
> > been delivered as "lowest priority" in physical hardware.
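The vector-hashing scheme in (B) can be sketched in a few lines of C. The hash below (vector modulo the number of candidate CPUs) and the names `count_cpus` and `pick_cpu_by_vector` are assumptions for illustration only; the mail does not say which hash function the hardware uses, and the 64-bit mask is just a convenient stand-in for a set of logical CPUs.

```c
#include <stdint.h>

/* Count the set bits in a 64-bit CPU mask. */
static unsigned int count_cpus(uint64_t mask)
{
    unsigned int n = 0;

    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Pick one CPU from dest_mask for a lowest-priority interrupt.
 * Assumed hash: vector modulo the number of candidate CPUs, used as an
 * index into the set bits of the mask. Returns -1 if the mask is empty.
 */
static int pick_cpu_by_vector(uint8_t vector, uint64_t dest_mask)
{
    unsigned int n = count_cpus(dest_mask);

    if (n == 0)
        return -1;

    unsigned int target = vector % n;

    for (int cpu = 0; cpu < 64; cpu++) {
        if (!(dest_mask & (UINT64_C(1) << cpu)))
            continue;
        if (target-- == 0)
            return cpu;
    }
    return -1; /* not reached when n > 0 */
}
```

Because the choice depends only on the vector and the mask, a given vector always lands on the same CPU, which is the property that avoids the cache thrashing described for round robin.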
> >
> > For the KVM enabling work of VT-d PI, we divide this into two stages:
> > Stage 1: Only support single-CPU lowest-priority interrupts (configured
> > via /proc/irq or irqbalance). This is simple and clear.
> > Stage 2: After all the patches are merged, I will add the vector hashing
> > support for lowest-priority on VT-d PI.
> >
> > On the Xen side, what is your opinion about supporting lowest-priority
> > interrupts for VT-d PI?
>
> I'm not sure how important supporting vector hashing is here. We can do same
> thing in software when setting NDST in fixed delivery mode?

I am not clear about this. Here we need to find a way to support
lowest-priority interrupts. Could you please elaborate on it a bit more?
Thanks!

> > ================================
> >
> > Any comments about this design are highly appreciated!
>
> Could you send an updated version based on all comments so far?

Sure!

Thanks,
Feng

> Thanks,
> Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel