
Re: [Xen-devel] [PATCH v6 01/24] docs: create Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) feature document



On Wed, Feb 08, 2017 at 04:15:53PM +0800, Yi Sun wrote:
> This patch creates CAT and CDP feature document in doc/features/. It describes
> key points to implement L3 CAT/CDP and L2 CAT which is described in details in
> Intel SDM "INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION 
> FEATURES".
> 
> Signed-off-by: Yi Sun <yi.y.sun@xxxxxxxxxxxxxxx>
> ---
> v6:
>     - write a new feature document to cover L3 CAT/CDP and L2 CAT.
>     - adjust 'Terminology' position in document.
>     - fix wordings.
>     - add SDM chapter title in commit message.
>     - add more explanations.
> ---
>  docs/features/intel_psr_cat_cdp.pandoc | 453 
> +++++++++++++++++++++++++++++++++
>  1 file changed, 453 insertions(+)
>  create mode 100644 docs/features/intel_psr_cat_cdp.pandoc
> 
> diff --git a/docs/features/intel_psr_cat_cdp.pandoc 
> b/docs/features/intel_psr_cat_cdp.pandoc
> new file mode 100644
> index 0000000..ebce2bd
> --- /dev/null
> +++ b/docs/features/intel_psr_cat_cdp.pandoc
> @@ -0,0 +1,453 @@
> +% Intel Cache Allocation Technology and Code and Data Prioritization Features
> +% Revision 1.0
> +
> +\clearpage
> +
> +# Basics
> +
> +---------------- ----------------------------------------------------
> +         Status: **Tech Preview**
> +
> +Architecture(s): Intel x86
> +
> +   Component(s): Hypervisor, toolstack
> +
> +       Hardware: L3 CAT: Haswell and beyond CPUs
> +                 CDP   : Broadwell and beyond CPUs
> +                 L2 CAT: Atom codename Goldmont and beyond CPUs
> +---------------- ----------------------------------------------------
> +
> +# Terminology
> +
> +* CAT         Cache Allocation Technology
> +* CBM         Capacity BitMasks
> +* CDP         Code and Data Prioritization
> +* COS/CLOS    Class of Service
> +* MSRs        Model Specific Registers
> +* PSR         Intel Platform Shared Resource
> +
> +# Overview
> +
> +Intel provides a set of allocation capabilities including Cache Allocation
> +Technology (CAT) and Code and Data Prioritization (CDP).
> +
> +CAT allows an OS or hypervisor to control allocation of a CPU's shared cache
> +based on application priority or Class of Service (COS). Each COS is
> +configured using capacity bitmasks (CBMs) which represent cache capacity and
> +indicate the degree of overlap and isolation between classes. Once CAT is
> +configured, the processor allows access to portions of cache according to the
> +established COS. Intel Xeon processor E5 v4 family (and some others)
> +introduce capabilities to configure and make use of the CAT mechanism on the
> +L3 cache. The Intel Goldmont processor provides support for control over the
> +L2 cache.
> +
> +Code and Data Prioritization (CDP) Technology is an extension of CAT. CDP
> +enables isoloation and separate prioritization of code and data fetches to
           ^^^^^^^^^^
isolation

> +the L3 cahce in a SW configurable manner, which can enable workload
          ^^^^^
cache
> +prioritization and tuning of cache capacity to the characteristics of the
> +workload.
> +CDP extends CAT by providing separate code and data masks per Class of
> +Service (COS). When SW enables CDP, L3 CAT is disabled.
> +
> +# User details
> +
> +* Feature Enabling:
> +
> +  Add "psr=cat" to the boot command line to enable all supported levels of
> +  CAT. Add "psr=cdp" to enable L3 CDP, which disables L3 CAT by SW.
> +
> +* xl interfaces:
> +
> +  1. `psr-cat-show [OPTIONS] domain-id`:
> +
> +     Show L2 CAT or L3 CAT/CDP CBM of the domain designated by Xen domain-id.
> +
> +     Option `-l`:
> +     `-l2`: Show cbm for L2 cache.
> +     `-l3`: Show cbm for L3 cache.
> +
> +     If `-lX` is specified but LX is not supported, an error is printed.
> +     If no `-l` is specified, level 3 is the default option.
> +
> +  2. `psr-cat-set [OPTIONS] domain-id cbm`:
> +
> +     Set L2 CAT or L3 CAT/CDP CBM to the domain designated by Xen domain-id.
> +
> +     Option `-s`: Specify the socket to process, otherwise all sockets are
> +     processed.
> +
> +     Option `-l`:
> +     `-l2`: Specify cbm for L2 cache.
> +     `-l3`: Specify cbm for L3 cache.
> +
> +     If `-lX` is specified but LX is not supported, an error is printed.
> +     If no `-l` is specified, level 3 is the default option.
> +
> +     Option `-c` or `-d`:
> +     `-c`: Set L3 CDP code cbm.
> +     `-d`: Set L3 CDP data cbm.
> +
> +  3. `psr-hwinfo [OPTIONS]`:
> +
> +     Show CMT & L2 CAT & L3 CAT/CDP HW information on every socket.
> +
> +     Option `-m, --cmt`: Show Cache Monitoring Technology (CMT) hardware
> +     info.
> +
> +     Option `-a, --cat`: Show CAT/CDP hardware info.
> +
> +# Technical details
> +
> +L3 CAT/CDP and L2 CAT are all members of the Intel PSR feature set; they
> +share the base PSR infrastructure in Xen.
> +
> +## Hardware perspective
> +
> +  CAT/CDP defines a range of MSRs to assign different cache access patterns,
> +  known as CBMs; each CBM is associated with a COS.
> +
> +  ```
> +  E.g. L2 CAT:
> +                          +----------------------------+----------------+
> +     IA32_PQR_ASSOC       | MSR (per socket)           |    Address     |
> +   +----+---+-------+     +----------------------------+----------------+
> +   |    |COS|       |     | IA32_L2_QOS_MASK_0         |     0xD10      |
> +   +----+---+-------+     +----------------------------+----------------+
> +          └-------------> | ...                        |  ...           |
> +                          +----------------------------+----------------+
> +                          | IA32_L2_QOS_MASK_n         | 0xD10+n (n<64) |
> +                          +----------------------------+----------------+
> +  ```
> +
> +  L3 CAT/CDP uses a range of MSRs from 0xC90 ~ 0xC90+n (n<128).
> +
> +  L2 CAT uses a range of MSRs from 0xD10 ~ 0xD10+n (n<64), following the L3
> +  CAT/CDP MSRs; setting L2 cache access patterns different from the L3
> +  cache's is supported.
> +
> +  Every MSR stores a CBM value. A capacity bitmask (CBM) provides a hint to
> +  the hardware indicating the cache space an application should be limited to
> +  as well as providing an indication of overlap and isolation in the
> +  CAT-capable cache from other applications contending for the cache.

s/application/VM/ ?

Perhaps 'domain' as you use that later in the document?



> +
> +  Sample cache capacity bitmasks for a bitlength of 8 are shown below. Please
> +  note that all (and only) contiguous '1' combinations are allowed (e.g.
> +  FFFFH, 0FF0H, 003CH, etc.).
> +
> +  ```
> +       +----+----+----+----+----+----+----+----+
> +       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +       +----+----+----+----+----+----+----+----+
> +  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Default Bitmask
> +       +----+----+----+----+----+----+----+----+
> +  COS1 | A  | A  | A  | A  | A  | A  | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +  COS2 | A  | A  | A  | A  | A  | A  | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +
> +       +----+----+----+----+----+----+----+----+
> +       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +       +----+----+----+----+----+----+----+----+
> +  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Overlapped Bitmask
> +       +----+----+----+----+----+----+----+----+
> +  COS1 |    |    |    |    | A  | A  | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +  COS2 |    |    |    |    |    |    | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +
> +       +----+----+----+----+----+----+----+----+
> +       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +       +----+----+----+----+----+----+----+----+
> +  COS0 | A  | A  | A  | A  |    |    |    |    | Isolated Bitmask
> +       +----+----+----+----+----+----+----+----+
> +  COS1 |    |    |    |    | A  | A  |    |    |
> +       +----+----+----+----+----+----+----+----+
> +  COS2 |    |    |    |    |    |    | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +  ```
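(Aside for readers: the "contiguous '1's only" rule above can be verified with a
standard bit trick. A minimal sketch; the helper name is mine, not something from
the Xen tree.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if val is a non-zero mask whose set bits are contiguous.
 * Adding the lowest set bit to val carries through the contiguous run;
 * any bit still overlapping val afterwards means the run was broken. */
static bool cbm_is_contiguous(uint64_t val)
{
    if ( val == 0 )
        return false;
    return (val & (val + (val & -val))) == 0;
}
```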
> +
> +  We can get the CBM length through CPUID. The default CBM value is
> +  calculated by `(1ull << cbm_len) - 1`; that is a fully open, all ones
> +  bitmask. COS[0] always stores the default value, which is never changed.
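(To make the default CBM computation concrete, a sketch; the function name is
illustrative.)

```c
#include <stdint.h>

/* Default CBM: all ones within the CBM length reported by CPUID,
 * i.e. a fully open bitmask. E.g. cbm_len == 8 yields 0xff. */
static uint64_t cat_default_cbm(unsigned int cbm_len)
{
    return (1ull << cbm_len) - 1;
}
```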
> +
> +  There is an `IA32_PQR_ASSOC` register which stores the COS ID of the VCPU.
> +  HW enforces cache allocation according to the corresponding CBM.
> +
> +## The relationship between L3 CAT/CDP and L2 CAT
> +
> +  HW may support all features. By default, CDP is disabled on the processor.
> +  If the L3 CAT MSRs are used without enabling CDP, the processor operates in
> +  a traditional CAT-only mode. When CDP is enabled,

s/,/:/
> +  * the CAT mask MSRs are re-mapped into interleaved pairs of mask MSRs for
> +    data or code fetches.
> +  * the range of COS for CAT is re-indexed, with the lower-half of the COS
> +    range available for CDP.
> +
> +  L2 CAT is independent of L3 CAT/CDP, which means L2 CAT can be enabled
> +  while L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP can both be enabled.
> +
> +  As a requirement, the bits of a CAT/CDP CBM must be contiguous.
> +
> +  N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same
> +  association register `IA32_PQR_ASSOC`, which means one COS is associated
> +  with a pair of L2 CAT CBM and L3 CAT/CDP CBM.
> +
> +  Besides, the max COS of L2 CAT may differ from that of L3 CAT/CDP (or
> +  other PSR features in the future). In some cases, a VM is permitted to have
> +  a COS

I noticed you say 'domain' later on in the document. Would it make
sense to replace s/VM/domain/ to be same in this design?

> +  that is beyond the limit of one (or more) PSR features but within the
> +  limits of the others. For instance, assume the max COS of L2 CAT is 8 but
> +  the max COS of L3 CAT is 16. When a VM is assigned COS 9, the L3 CAT CBM
> +  associated with COS 9 is enforced, but for L2 CAT the HW behaves as if the
> +  default value were set, since COS 9 is beyond the max COS (8) of L2 CAT.
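(That fallback could be modelled as follows; illustrative only, not actual Xen
code.)

```c
/* A feature ignores a COS ID beyond its own cos_max and behaves as if
 * the default COS (0) were selected for that feature. */
static unsigned int effective_cos(unsigned int cos, unsigned int cos_max)
{
    return cos > cos_max ? 0 : cos;
}
```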
> +
> +## Design Overview
> +
> +* Core COS/CBM association
> +
> +  When enforcing CAT/CDP, all cores of all domains have the same default COS
> +  (COS0), which is associated with the fully open CBM (all ones bitmask) to
> +  access all of the cache. The default COS is used only in the hypervisor and
> +  is transparent to the tool stack and users.
> +
> +  A system administrator can change the PSR allocation policy at runtime via
> +  the tool stack. Since L2 CAT shares COS with L3 CAT/CDP, a COS corresponds
> +  to a 2-tuple [L2 CBM, L3 CBM] when only CAT is enabled; when CDP is
> +  enabled, one COS corresponds to a 3-tuple [L2 CBM, L3 Code_CBM, L3
> +  Data_CBM]. If neither L3 CAT nor L3 CDP is enabled, things are simpler:
> +  one COS corresponds to one L2 CBM.
> +
> +* VCPU schedule
> +
> +  When a context switch happens, the COS of the VCPU is written to the
> +  per-thread MSR `IA32_PQR_ASSOC`, and then hardware enforces cache
> +  allocation according to the corresponding CBM.
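(For reference, per the SDM, `IA32_PQR_ASSOC` (MSR 0xC8F) carries the monitoring
RMID in its low 32 bits and the allocation COS in its high 32 bits. A sketch of
composing the value the scheduler would write with wrmsr; names are
illustrative.)

```c
#include <stdint.h>

#define MSR_IA32_PQR_ASSOC 0x0c8f  /* per-thread association register */

/* Compose the IA32_PQR_ASSOC value: RMID (monitoring) in bits 31:0,
 * COS (allocation) in bits 63:32. Real code would wrmsr() this value
 * on context switch. */
static uint64_t pqr_assoc_val(uint32_t rmid, uint32_t cos)
{
    return ((uint64_t)cos << 32) | rmid;
}
```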
> +
> +* Multi-sockets
> +
> +  Different sockets may have different CAT/CDP capabilities (e.g. max COS),
> +  although capability is consistent within one socket. So the CAT/CDP
> +  capability is specified per socket.
> +
> +  'psr-cat-set' can set the CBM for one domain per socket. On each socket,
> +  we maintain a COS array for all domains. One domain uses one COS at a
> +  time; that COS holds the domain's working CBM. So, when a VCPU of the
> +  domain is migrated from socket 1 to socket 2, it follows the configuration
> +  on socket 2.
> +
> +  E.g. the user sets domain 1's CBM on socket 1 to 0x7f, which uses COS 9,
> +  but sets domain 1's CBM on socket 2 to 0x3f, which uses COS 7. When a VCPU
> +  of this domain is migrated from socket 1 to socket 2, the COS ID used is
> +  7, meaning 0x3f is now the working CBM for domain 1.
> +
> +## Implementation Description
> +
> +* Hypervisor interfaces:
> +
> +  1. Boot line parameter "psr=cat" enables L2 CAT and L3 CAT if supported
> +     by hardware. "psr=cdp" enables CDP if supported by hardware.
> +
> +  2. SYSCTL:
> +          - XEN_SYSCTL_PSR_CAT_get_l3_info: Get L3 CAT/CDP information.
> +          - XEN_SYSCTL_PSR_CAT_get_l2_info: Get L2 CAT information.
> +
> +  3. DOMCTL:
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM: Get L3 CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM: Set L3 CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE: Get CDP Code CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE: Set CDP Code CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA: Get CDP Data CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA: Set CDP Data CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM: Get L2 CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM: Set L2 CBM for a domain.
> +
> +* xl interfaces:
> +
> +  1. psr-cat-show -lX domain-id
> +          Show LX cbm for a domain.
> +          => XEN_SYSCTL_PSR_CAT_get_l3_info    /
> +             XEN_SYSCTL_PSR_CAT_get_l2_info    /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM  /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM
> +
> +  2. psr-cat-set -lX domain-id cbm
> +          Set LX cbm for a domain.
> +          => XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM  /
> +             XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE /
> +             XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA /
> +             XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM
> +
> +  3. psr-hwinfo
> +          Show PSR HW information, including L3 CAT/CDP/L2 CAT
> +          => XEN_SYSCTL_PSR_CAT_get_l3_info /
> +             XEN_SYSCTL_PSR_CAT_get_l2_info
> +
> +* Key data structure:
> +
> +   1. Feature HW info
> +
> +      ```
> +      struct psr_cat_hw_info {
> +          unsigned int cbm_len;
> +          unsigned int cos_max;
> +      };
> +      ```
> +
> +      - Member `cbm_len`
> +
> +        `cbm_len` is part of the CAT hardware information. It is the maximum
> +        number of bits that can be set in a CBM.
> +
> +      - Member `cos_max`
> +
> +        `cos_max` is part of the CAT hardware information. It is the maximum
> +        number of COS registers.
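(If it helps readers: both values come from CPUID leaf 0x10 — per the SDM,
EAX[4:0] holds the CBM length minus 1 and EDX[15:0] the highest COS number. A
decoding sketch; the raw register values would come from a real cpuid() call on
hardware, and the function name is mine.)

```c
#include <stdint.h>

struct psr_cat_hw_info {
    unsigned int cbm_len;
    unsigned int cos_max;
};

/* Decode CAT capability from raw CPUID.(EAX=0x10, ECX=ResID) output.
 * SDM: EAX[4:0] = CBM length - 1, EDX[15:0] = highest COS number. */
static struct psr_cat_hw_info cat_info_from_cpuid(uint32_t eax, uint32_t edx)
{
    struct psr_cat_hw_info info;

    info.cbm_len = (eax & 0x1f) + 1;
    info.cos_max = edx & 0xffff;
    return info;
}
```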
> +
> +   2. Feature list node
> +
> +      ```
> +      struct feat_node {
> +          enum psr_feat_type feature;
> +          struct feat_ops ops;
> +          struct psr_cat_hw_info info;
> +          uint64_t cos_reg_val[MAX_COS_REG_NUM];
> +          struct list_head list;
> +      };
> +      ```
> +
> +      When a PSR enforcement feature is enabled, it will be added into a
> +      feature list. The head of the list is created in psr initialization.
> +
> +      - Member `feature`
> +
> +        `feature` is an integer indicating which feature the list entry
> +        corresponds to.
> +
> +      - Member `ops`
> +
> +        `ops` maintains the feature's list of callback functions. It is
> +        described in detail later in `4. Feature operation functions
> +        structure`.

I think you can just do:

[Feature operation functions structure]

And when you run `pandoc -toc -o intel_psr_cat_cdp.pdf
intel_psr_cat_cdp.pandoc`

it will provide the right link (which you can follow) to the proper
section.
> +
> +      - Member `info`
> +
> +        `info` maintains the feature HW information which is provided to
> +        the psr-hwinfo command.
> +
> +      - Member `cos_reg_val`
> +
> +        `cos_reg_val` is an array maintaining the values set in all COS
> +        registers of the feature. The array is indexed by COS ID.
> +
> +   3. Per-socket PSR features information structure
> +
> +      ```
> +      struct psr_socket_info {
> +          unsigned int feat_mask;
> +          unsigned int nr_feat;
> +          struct list_head feat_list;
> +          unsigned int cos_ref[MAX_COS_REG_NUM];
> +          spinlock_t ref_lock;
> +      };
> +      ```
> +
> +      We collect all the PSR allocation feature information of a socket in
> +      this `struct psr_socket_info`.
> +
> +      - Member `feat_mask`
> +
> +        `feat_mask` is a bitmap indicating which features are enabled on
> +        the current socket. We define the `feat_mask` bitmap as:
> +
> +        bit 0: L3 CAT status.
> +        bit 1: L3 CDP status.
> +        bit 2: L2 CAT status.

Just in case if you change the code and there are more bit positions - I
would recommend you replace the 'We define 'feat_mask' bitmap as:
.. bit 0 .."

with:

"See values defined in 'enum psr_feat_type'"

As that will make it easier in case the code is changed but the doc
becomes out-dated.
> +
> +      - Member `nr_feat`
> +
> +        `nr_feat` is the number of PSR features enabled.
> +
> +      - Member `cos_ref`
> +
> +        `cos_ref` is an array maintaining the reference count of each COS.
> +        It maps to cos_reg_val[MAX_COS_REG_NUM] in `struct feat_node`. When
> +        a COS is used by a domain, the corresponding reference count is
> +        increased by one; when a domain releases the COS, it is decreased by
> +        one. The array is indexed by COS ID.
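(The get/put pattern could be sketched like this; illustrative only, with the
real code's locking and bounds checks omitted.)

```c
/* Reference counts of COS register usage, indexed by COS ID. A COS's
 * CBM may only be reassigned when no domain references it. */
#define MAX_COS_REG_NUM 128

static unsigned int cos_ref[MAX_COS_REG_NUM];

/* A domain starts using this COS. */
static void get_cos_ref(unsigned int cos)
{
    cos_ref[cos]++;
}

/* A domain stops using this COS. */
static void put_cos_ref(unsigned int cos)
{
    if ( cos_ref[cos] )
        cos_ref[cos]--;
}
```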
> +
> +   4. Feature operation functions structure
> +
> +      ```
> +      struct feat_ops {
> +          unsigned int (*get_cos_max)(const struct feat_node *feat);
> +          int (*get_feat_info)(const struct feat_node *feat,
> +                               uint32_t data[], uint32_t array_len);
> +          int (*get_val)(const struct feat_node *feat, unsigned int cos,
> +                         enum cbm_type type, uint64_t *val);
> +          unsigned int (*get_cos_num)(const struct feat_node *feat);
> +          int (*get_old_val)(uint64_t val[],
> +                             const struct feat_node *feat,
> +                             unsigned int old_cos);
> +          int (*set_new_val)(uint64_t val[],
> +                             const struct feat_node *feat,
> +                             unsigned int old_cos,
> +                             enum cbm_type type,
> +                             uint64_t m);
> +          int (*compare_val)(const uint64_t val[], const struct feat_node 
> *feat,
> +                             unsigned int cos, bool *found);
> +          unsigned int (*fits_cos_max)(const uint64_t val[],
> +                                       const struct feat_node *feat,
> +                                       unsigned int cos);
> +          int (*write_msr)(unsigned int cos, const uint64_t val[],
> +                           struct feat_node *feat);
> +      };
> +      ```
> +
> +      We abstract the above callback functions to encapsulate the
> +      feature-specific behaviors. This makes it easy to add a new feature;
> +      we just need to:
> +          1) Implement the ops and callback functions for the feature.
> +          2) Register the ops into `struct feat_node`.
> +          3) Add the feature into the feature list during CPU initialization.
> +
> +# Limitations
> +
> +CAT/CDP can only work on HW which enables it (checked via CPUID). So far,
> +there is no HW which enables both L2 CAT and L3 CAT/CDP, but the SW
> +implementation has considered such a scenario and can enable both L2 CAT
> +and L3 CAT/CDP.
> +
> +# Testing
> +
> +We can execute the above xl commands to verify L2 CAT and L3 CAT/CDP on
> +different HW that supports them.
> +
> +For example:
> +    root@:~$ xl psr-hwinfo --cat
> +    Cache Allocation Technology (CAT): L2
> +    Socket ID       : 0
> +    Maximum COS     : 3
> +    CBM length      : 8
> +    Default CBM     : 0xff
> +
> +    root@:~$ xl psr-cat-cbm-set -l2 1 0x7f
> +
> +    root@:~$ xl psr-cat-show -l2 1
> +    Socket ID       : 0
> +    Default CBM     : 0xff
> +       ID                     NAME             CBM
> +        1                 ubuntu14            0x7f
> +
> +# Areas for improvement
> +
> +N/A

I would say that using '0x7f' is not very user-friendly. It really
would be good if that changed to something easier to grok.

For example if I am system admin and I look at:

       +----+----+----+----+----+----+----+----+
       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
       +----+----+----+----+----+----+----+----+
  COS0 | A  | A  | A  | A  |    |    |    |    | Isolated Bitmask
       +----+----+----+----+----+----+----+----+
  COS1 |    |    |    |    | A  | A  |    |    |
       +----+----+----+----+----+----+----+----+
  COS2 |    |    |    |    |    |    | A  | A  |
       +----+----+----+----+----+----+----+----+

I would think that giving a guest 'M7->M4' means it has more
cache than M3->M2 or M1->M0.

But that is not spelled out in detail. Or what happens if I do:
       +----+----+----+----+----+----+----+----+
       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
       +----+----+----+----+----+----+----+----+
  COS0 |    |    |    |    |    | A  |    |    | Isolated Bitmask
       +----+----+----+----+----+----+----+----+
  COS1 |    |    |    |    |    |    | A  |    |
       +----+----+----+----+----+----+----+----+
  COS2 |    |    |    |    |    |    |    | A  |
       +----+----+----+----+----+----+----+----+

Does that have the same effect as the previous one?
I would think not, but perhaps it is the same (we set
three 'pools').

And does this mean that I've made a grave error
and M7->M3 are in effect only available to the hypervisor (and
dom0?, but only if dom0 is PV, but not for PVH dom0?)

Thanks!
> +
> +# Known issues
> +
> +N/A
> +
> +# References
> +
> +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" 
> [Intel® 64 and IA-32 Architectures Software Developer Manuals, 
> vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
> +
> +# History
> +
> +------------------------------------------------------------------------
> +Date       Revision Version  Notes
> +---------- -------- -------- -------------------------------------------
> +2016-08-12 1.0      Xen 4.9  Design document written

Perhaps update that a bit? I think we are at 1.6 ?
> +---------- -------- -------- -------------------------------------------
> -- 
> 1.9.1
> 
