Re: [Xen-devel] [PATCH v6 01/24] docs: create Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) feature document
On Wed, Feb 08, 2017 at 04:15:53PM +0800, Yi Sun wrote:
> This patch creates the CAT and CDP feature document in docs/features/. It
> describes the key points of implementing L3 CAT/CDP and L2 CAT, which are
> described in detail in the Intel SDM chapter "INTEL® RESOURCE DIRECTOR
> TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES".
>
> Signed-off-by: Yi Sun <yi.y.sun@xxxxxxxxxxxxxxx>
> ---
> v6:
>   - write a new feature document to cover L3 CAT/CDP and L2 CAT.
>   - adjust 'Terminology' position in document.
>   - fix wordings.
>   - add SDM chapter title in commit message.
>   - add more explanations.
> ---
>  docs/features/intel_psr_cat_cdp.pandoc | 453 +++++++++++++++++++++++++++++++++
>  1 file changed, 453 insertions(+)
>  create mode 100644 docs/features/intel_psr_cat_cdp.pandoc
>
> diff --git a/docs/features/intel_psr_cat_cdp.pandoc b/docs/features/intel_psr_cat_cdp.pandoc
> new file mode 100644
> index 0000000..ebce2bd
> --- /dev/null
> +++ b/docs/features/intel_psr_cat_cdp.pandoc
> @@ -0,0 +1,453 @@
> +% Intel Cache Allocation Technology and Code and Data Prioritization Features
> +% Revision 1.0
> +
> +\clearpage
> +
> +# Basics
> +
> +---------------- ----------------------------------------------------
> +         Status: **Tech Preview**
> +
> +Architecture(s): Intel x86
> +
> +   Component(s): Hypervisor, toolstack
> +
> +       Hardware: L3 CAT: Haswell and beyond CPUs
> +                 CDP   : Broadwell and beyond CPUs
> +                 L2 CAT: Atom codename Goldmont and beyond CPUs
> +---------------- ----------------------------------------------------
> +
> +# Terminology
> +
> +* CAT         Cache Allocation Technology
> +* CBM         Capacity BitMasks
> +* CDP         Code and Data Prioritization
> +* COS/CLOS    Class of Service
> +* MSRs        Model Specific Registers
> +* PSR         Intel Platform Shared Resource
> +
> +# Overview
> +
> +Intel provides a set of allocation capabilities including Cache Allocation
> +Technology (CAT) and Code and Data Prioritization (CDP).
> +
> +CAT allows an OS or hypervisor to control the allocation of a CPU's shared
> +cache based on application priority or Class of Service (COS). Each COS is
> +configured using capacity bitmasks (CBMs) which represent cache capacity and
> +indicate the degree of overlap and isolation between classes. Once CAT is
> +configured, the processor allows access to portions of the cache according
> +to the established COS. The Intel Xeon processor E5 v4 family (and some
> +others) introduced the capability to configure and make use of the CAT
> +mechanism on the L3 cache. Intel Goldmont processors provide support for
> +control over the L2 cache.
> +
> +Code and Data Prioritization (CDP) Technology is an extension of CAT. CDP
> +enables isoloation and separate prioritization of code and data fetches to
           ^^^^^^^^^^ isolation
> +the L3 cahce in a SW configurable manner, which can enable workload
        ^^^^^ cache
> +prioritization and tuning of cache capacity to the characteristics of the
> +workload. CDP extends CAT by providing separate code and data masks per
> +Class of Service (COS). When SW enables CDP, L3 CAT is disabled.
> +
> +# User details
> +
> +* Feature Enabling:
> +
> +  Add "psr=cat" to the boot line parameters to enable all supported levels
> +  of the CAT features. Add "psr=cdp" to enable L3 CDP; this disables L3 CAT
> +  by SW.
> +
> +* xl interfaces:
> +
> +  1. `psr-cat-show [OPTIONS] domain-id`:
> +
> +     Show the L2 CAT or L3 CAT/CDP CBM of the domain designated by the Xen
> +     domain-id.
> +
> +     Option `-l`:
> +     `-l2`: Show cbm for L2 cache.
> +     `-l3`: Show cbm for L3 cache.
> +
> +     If `-lX` is specified and LX is not supported, print an error.
> +     If no `-l` is specified, level 3 is the default option.
> +
> +  2. `psr-cat-set [OPTIONS] domain-id cbm`:
> +
> +     Set the L2 CAT or L3 CAT/CDP CBM for the domain designated by the Xen
> +     domain-id.
> +
> +     Option `-s`: Specify the socket to process, otherwise all sockets are
> +     processed.
> +
> +     Option `-l`:
> +     `-l2`: Specify cbm for L2 cache.
> +     `-l3`: Specify cbm for L3 cache.
> +
> +     If `-lX` is specified and LX is not supported, print an error.
> +     If no `-l` is specified, level 3 is the default option.
> +
> +     Option `-c` or `-d`:
> +     `-c`: Set L3 CDP code cbm.
> +     `-d`: Set L3 CDP data cbm.
> +
> +  3. `psr-hwinfo [OPTIONS]`:
> +
> +     Show CMT & L2 CAT & L3 CAT/CDP HW information on every socket.
> +
> +     Option `-m, --cmt`: Show Cache Monitoring Technology (CMT) hardware
> +     info.
> +
> +     Option `-a, --cat`: Show CAT/CDP hardware info.
> +
> +# Technical details
> +
> +L3 CAT/CDP and L2 CAT are all members of the Intel PSR features; they share
> +the base PSR infrastructure in Xen.
> +
> +## Hardware perspective
> +
> +  CAT/CDP defines a range of MSRs to assign different cache access patterns,
> +  known as CBMs; each CBM is associated with a COS.
> +
> +  ```
> +  E.g. L2 CAT:
> +                         +----------------------------+----------------+
> +  IA32_PQR_ASSOC         | MSR (per socket)           | Address        |
> +  +----+---+-------+     +----------------------------+----------------+
> +  |    |COS|       |     | IA32_L2_QOS_MASK_0         | 0xD10          |
> +  +----+---+-------+     +----------------------------+----------------+
> +         └-------------> | ...                        | ...            |
> +                         +----------------------------+----------------+
> +                         | IA32_L2_QOS_MASK_n         | 0xD10+n (n<64) |
> +                         +----------------------------+----------------+
> +  ```
> +
> +  L3 CAT/CDP uses a range of MSRs from 0xC90 ~ 0xC90+n (n<128).
> +
> +  L2 CAT uses a range of MSRs from 0xD10 ~ 0xD10+n (n<64), following the L3
> +  CAT/CDP MSRs; setting L2 cache access patterns different from those of the
> +  L3 cache is supported.
> +
> +  Every MSR stores a CBM value. A capacity bitmask (CBM) provides a hint to
> +  the hardware indicating the cache space an application should be limited

s/application/VM/ ?

> +  to, as well as providing an indication of overlap and isolation in the
> +  CAT-capable cache from other applications contending for the cache.

s/application/VM/ ? Perhaps 'domain' as you use that later in the document?
> +
> +  Sample cache capacity bitmasks for a bitlength of 8 are shown below.
> +  Please note that all (and only) contiguous '1' combinations are allowed
> +  (e.g. FFFFH, 0FF0H, 003CH, etc.).
> +
> +  ```
> +        +----+----+----+----+----+----+----+----+
> +        | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +        +----+----+----+----+----+----+----+----+
> +   COS0 | A  | A  | A  | A  | A  | A  | A  | A  |  Default Bitmask
> +        +----+----+----+----+----+----+----+----+
> +   COS1 | A  | A  | A  | A  | A  | A  | A  | A  |
> +        +----+----+----+----+----+----+----+----+
> +   COS2 | A  | A  | A  | A  | A  | A  | A  | A  |
> +        +----+----+----+----+----+----+----+----+
> +
> +        +----+----+----+----+----+----+----+----+
> +        | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +        +----+----+----+----+----+----+----+----+
> +   COS0 | A  | A  | A  | A  | A  | A  | A  | A  |  Overlapped Bitmask
> +        +----+----+----+----+----+----+----+----+
> +   COS1 |    |    |    |    | A  | A  | A  | A  |
> +        +----+----+----+----+----+----+----+----+
> +   COS2 |    |    |    |    |    |    | A  | A  |
> +        +----+----+----+----+----+----+----+----+
> +
> +        +----+----+----+----+----+----+----+----+
> +        | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +        +----+----+----+----+----+----+----+----+
> +   COS0 | A  | A  | A  | A  |    |    |    |    |  Isolated Bitmask
> +        +----+----+----+----+----+----+----+----+
> +   COS1 |    |    |    |    | A  | A  |    |    |
> +        +----+----+----+----+----+----+----+----+
> +   COS2 |    |    |    |    |    |    | A  | A  |
> +        +----+----+----+----+----+----+----+----+
> +  ```
> +
> +  We can get the CBM length through CPUID. The default value of the CBM is
> +  calculated by `(1ull << cbm_len) - 1`. That is a fully open, all-ones
> +  bitmask. COS[0] always stores the default value without change.
> +
> +  There is an `IA32_PQR_ASSOC` register which stores the COS ID of the VCPU.
> +  HW enforces cache allocation according to the corresponding CBM.
> +
> +## The relationship between L3 CAT/CDP and L2 CAT
> +
> +  HW may support all of these features. By default, CDP is disabled on the
> +  processor.
> +  If the L3 CAT MSRs are used without enabling CDP, the processor operates
> +  in a traditional CAT-only mode. When CDP is enabled,

s/,/:/

> +  * the CAT mask MSRs are re-mapped into interleaved pairs of mask MSRs for
> +    data or code fetches.
> +  * the range of COS for CAT is re-indexed, with the lower half of the COS
> +    range available for CDP.
> +
> +  L2 CAT is independent of L3 CAT/CDP, which means L2 CAT can be enabled
> +  while L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP can both be
> +  enabled.
> +
> +  As a requirement, the bits of a CAT/CDP CBM must be contiguous.
> +
> +  N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same
> +  association register `IA32_PQR_ASSOC`, which means one COS is associated
> +  with a pair of an L2 CAT CBM and an L3 CAT/CDP CBM.
> +
> +  Besides, the max COS of L2 CAT may differ from that of L3 CAT/CDP (or
> +  other PSR features in future). In some cases, a VM is permitted to have a
> +  COS

I noticed you say 'domain' later on in the document. Would it make sense to
replace s/VM/domain/ to be the same in this design?

> +  that is beyond one (or more) of the PSR features but within the others.
> +  For instance, let's assume the max COS of L2 CAT is 8 but the max COS of
> +  L3 CAT is 16; when a VM is assigned 9 as its COS, the L3 CAT CBM
> +  associated with COS 9 is enforced, but for L2 CAT the HW works as if the
> +  default value were set, since COS 9 is beyond the max COS (8) of L2 CAT.
> +
> +## Design Overview
> +
> +* Core COS/CBM association
> +
> +  When enforcing CAT/CDP, all cores of all domains have the same default COS
> +  (COS0), which is associated with the fully open CBM (all-ones bitmask) to
> +  access all of the cache. The default COS is used only in the hypervisor
> +  and is transparent to the tool stack and user.
> +
> +  The system administrator can change the PSR allocation policy at runtime
> +  via the tool stack.
> +  Since L2 CAT shares the COS with L3 CAT/CDP, a COS corresponds to a
> +  2-tuple, like [L2 CBM, L3 CBM], with only CAT enabled; when CDP is
> +  enabled, one COS corresponds to a 3-tuple, like [L2 CBM, L3 Code_CBM,
> +  L3 Data_CBM]. If neither L3 CAT nor L3 CDP is enabled, things are easier:
> +  one COS corresponds to one L2 CBM.
> +
> +* VCPU schedule
> +
> +  When a context switch happens, the COS of the VCPU is written to the
> +  per-thread MSR `IA32_PQR_ASSOC`, and then hardware enforces cache
> +  allocation according to the corresponding CBM.
> +
> +* Multi-sockets
> +
> +  Different sockets may have different CAT/CDP capabilities (e.g. max COS),
> +  although capabilities are consistent within one socket. So the CAT/CDP
> +  capability is specified per socket.
> +
> +  'psr-cat-set' can set the CBM for one domain per socket. On each socket,
> +  we maintain a COS array for all domains. One domain uses one COS at a
> +  time. One COS stores the CBM of the domain to work with. So, when a VCPU
> +  of the domain is migrated from socket 1 to socket 2, it follows the
> +  configuration on socket 2.
> +
> +  E.g. the user sets domain 1's CBM on socket 1 to 0x7f, which uses COS 9,
> +  but sets domain 1's CBM on socket 2 to 0x3f, which uses COS 7. When a VCPU
> +  of this domain is migrated from socket 1 to 2, the COS ID used is 7; that
> +  means 0x3f is now the CBM in effect for domain 1.
> +
> +## Implementation Description
> +
> +* Hypervisor interfaces:
> +
> +  1. Boot line parameter "psr=cat" enables L2 CAT and L3 CAT if the hardware
> +     supports them. "psr=cdp" enables CDP if the hardware supports it.
> +
> +  2. SYSCTL:
> +     - XEN_SYSCTL_PSR_CAT_get_l3_info: Get L3 CAT/CDP information.
> +     - XEN_SYSCTL_PSR_CAT_get_l2_info: Get L2 CAT information.
> +
> +  3. DOMCTL:
> +     - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM: Get L3 CBM for a domain.
> +     - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM: Set L3 CBM for a domain.
> +     - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE: Get CDP Code CBM for a domain.
> +     - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE: Set CDP Code CBM for a domain.
> +     - XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA: Get CDP Data CBM for a domain.
> +     - XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA: Set CDP Data CBM for a domain.
> +     - XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM: Get L2 CBM for a domain.
> +     - XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM: Set L2 CBM for a domain.
> +
> +* xl interfaces:
> +
> +  1. psr-cat-show -lX domain-id
> +     Show the LX cbm for a domain.
> +     => XEN_SYSCTL_PSR_CAT_get_l3_info /
> +        XEN_SYSCTL_PSR_CAT_get_l2_info /
> +        XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM /
> +        XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE /
> +        XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA /
> +        XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM
> +
> +  2. psr-cat-set -lX domain-id cbm
> +     Set the LX cbm for a domain.
> +     => XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM /
> +        XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE /
> +        XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA /
> +        XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM
> +
> +  3. psr-hwinfo
> +     Show PSR HW information, including L3 CAT/CDP and L2 CAT.
> +     => XEN_SYSCTL_PSR_CAT_get_l3_info /
> +        XEN_SYSCTL_PSR_CAT_get_l2_info
> +
> +* Key data structures:
> +
> +  1. Feature HW info
> +
> +     ```
> +     struct psr_cat_hw_info {
> +         unsigned int cbm_len;
> +         unsigned int cos_max;
> +     };
> +     ```
> +
> +     - Member `cbm_len`
> +
> +       `cbm_len` is part of the CAT hardware info. It is the maximum number
> +       of bits that can be set in a CBM.
> +
> +     - Member `cos_max`
> +
> +       `cos_max` is part of the CAT hardware info. It is the maximum number
> +       of COS registers.
> +
> +  2. Feature list node
> +
> +     ```
> +     struct feat_node {
> +         enum psr_feat_type feature;
> +         struct feat_ops ops;
> +         struct psr_cat_hw_info info;
> +         uint64_t cos_reg_val[MAX_COS_REG_NUM];
> +         struct list_head list;
> +     };
> +     ```
> +
> +     When a PSR enforcement feature is enabled, it is added to a feature
> +     list. The head of the list is created during PSR initialization.
> +
> +     - Member `feature`
> +
> +       `feature` is an integer indicating which feature this list entry
> +       corresponds to.
> +
> +     - Member `ops`
> +
> +       `ops` maintains the feature's list of callback functions. It is
> +       introduced in detail later, in `4. Feature operation functions
> +       structure`.

I think you can just do:
[Feature operation functions structure]
And when you run `pandoc --toc -o intel_psr_cat_cdp.pdf intel_psr_cat_cdp.pandoc`
it will provide the right link (which you can follow) to the proper section.

> +
> +     - Member `info`
> +
> +       `info` maintains the feature's HW information, which is provided to
> +       the psr-hwinfo command.
> +
> +     - Member `cos_reg_val`
> +
> +       `cos_reg_val` is an array maintaining the values set in all of the
> +       feature's COS registers. The array is indexed by COS ID.
> +
> +  3. Per-socket PSR features information structure
> +
> +     ```
> +     struct psr_socket_info {
> +         unsigned int feat_mask;
> +         unsigned int nr_feat;
> +         struct list_head feat_list;
> +         unsigned int cos_ref[MAX_COS_REG_NUM];
> +         spinlock_t ref_lock;
> +     };
> +     ```
> +
> +     We collect all of a socket's PSR allocation feature information in this
> +     `struct psr_socket_info`.
> +
> +     - Member `feat_mask`
> +
> +       `feat_mask` is a bitmap indicating which features are enabled on the
> +       current socket. We define the `feat_mask` bitmap as:
> +
> +       bit 0: L3 CAT status.
> +       bit 1: L3 CDP status.
> +       bit 2: L2 CAT status.

Just in case you change the code and there are more bit positions, I would
recommend you replace the "We define the `feat_mask` bitmap as: .. bit 0 .."
with: "See the values defined in 'enum psr_feat_type'".
That will make it easier in case the code is changed but the doc becomes
out-dated.

> +
> +     - Member `nr_feat`
> +
> +       `nr_feat` is the number of PSR features enabled.
> +
> +     - Member `cos_ref`
> +
> +       `cos_ref` is an array which maintains the reference count of each
> +       COS. It maps to cos_reg_val[MAX_COS_REG_NUM] in `struct feat_node`.
> +       If a COS is used by one domain, the corresponding reference count
> +       increases by one.
> +       If a domain releases the COS, the reference count decreases by one.
> +       The array is indexed by COS ID.
> +
> +  4. Feature operation functions structure
> +
> +     ```
> +     struct feat_ops {
> +         unsigned int (*get_cos_max)(const struct feat_node *feat);
> +         int (*get_feat_info)(const struct feat_node *feat,
> +                              uint32_t data[], uint32_t array_len);
> +         int (*get_val)(const struct feat_node *feat, unsigned int cos,
> +                        enum cbm_type type, uint64_t *val);
> +         unsigned int (*get_cos_num)(const struct feat_node *feat);
> +         int (*get_old_val)(uint64_t val[],
> +                            const struct feat_node *feat,
> +                            unsigned int old_cos);
> +         int (*set_new_val)(uint64_t val[],
> +                            const struct feat_node *feat,
> +                            unsigned int old_cos,
> +                            enum cbm_type type,
> +                            uint64_t m);
> +         int (*compare_val)(const uint64_t val[],
> +                            const struct feat_node *feat,
> +                            unsigned int cos, bool *found);
> +         unsigned int (*fits_cos_max)(const uint64_t val[],
> +                                      const struct feat_node *feat,
> +                                      unsigned int cos);
> +         int (*write_msr)(unsigned int cos, const uint64_t val[],
> +                          struct feat_node *feat);
> +     };
> +     ```
> +
> +     We abstract the above callback functions to encapsulate the
> +     feature-specific behaviors in them. Then it is easy to add a new
> +     feature. We just need to:
> +     1) Implement such ops and callback functions for every feature.
> +     2) Register the ops into `struct feat_node`.
> +     3) Add the feature to the feature list during CPU initialization.
> +
> +# Limitations
> +
> +CAT/CDP can only work on HW which supports it (checked via CPUID). So far,
> +there is no HW which supports both L2 CAT and L3 CAT/CDP, but the SW
> +implementation has considered such a scenario, so that both L2 CAT and L3
> +CAT/CDP can be enabled together.
> +
> +# Testing
> +
> +We can execute the above xl commands to verify L2 CAT and L3 CAT/CDP on the
> +different HWs that support them.
> +
> +For example:
> +
> +  root@:~$ xl psr-hwinfo --cat
> +  Cache Allocation Technology (CAT): L2
> +  Socket ID       : 0
> +  Maximum COS     : 3
> +  CBM length      : 8
> +  Default CBM     : 0xff
> +
> +  root@:~$ xl psr-cat-cbm-set -l2 1 0x7f
> +
> +  root@:~$ xl psr-cat-show -l2 1
> +  Socket ID       : 0
> +  Default CBM     : 0xff
> +     ID                     NAME             CBM
> +      1                 ubuntu14            0x7f
> +
> +# Areas for improvement
> +
> +N/A

I would say that using '0x7f' is not very user-friendly. It really would be
good if that changed to something easier to grok. For example, if I am a
system admin and I look at:

        +----+----+----+----+----+----+----+----+
        | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
        +----+----+----+----+----+----+----+----+
   COS0 | A  | A  | A  | A  |    |    |    |    |  Isolated Bitmask
        +----+----+----+----+----+----+----+----+
   COS1 |    |    |    |    | A  | A  |    |    |
        +----+----+----+----+----+----+----+----+
   COS2 |    |    |    |    |    |    | A  | A  |
        +----+----+----+----+----+----+----+----+

I would think that giving a guest 'M7->M4' means it has more cache than
'M3->M2' or 'M1->M0'. But that is not spelled out in detail.

Or what happens if I do:

        +----+----+----+----+----+----+----+----+
        | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
        +----+----+----+----+----+----+----+----+
   COS0 |    |    |    |    |    | A  |    |    |  Isolated Bitmask
        +----+----+----+----+----+----+----+----+
   COS1 |    |    |    |    |    |    | A  |    |
        +----+----+----+----+----+----+----+----+
   COS2 |    |    |    |    |    |    |    | A  |
        +----+----+----+----+----+----+----+----+

Does that have the same effect as the previous one? I would think not, but
perhaps it is the same (we set three 'pools'). And does this mean that I've
made a grave error, and M7->M3 are in effect only available to the hypervisor
(and dom0? but only if dom0 is PV, and not for a PVH dom0?)

Thanks!
> +
> +# Known issues
> +
> +N/A
> +
> +# References
> +
> +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES"
> +[Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
> +
> +# History
> +
> +------------------------------------------------------------------------
> +Date       Revision Version  Notes
> +---------- -------- -------- -------------------------------------------
> +2016-08-12 1.0      Xen 4.9  Design document written

Perhaps update that a bit? I think we are at 1.6?

> +---------- -------- -------- -------------------------------------------
--
1.9.1