
Re: [Xen-devel] [PATCH v4 01/15] docs: create Memory Bandwidth Allocation (MBA) feature document



On Sat, Sep 23, 2017 at 09:48:10AM +0000, Yi Sun wrote:
> This patch creates MBA feature document in doc/features/. It describes
> key points to implement MBA which is described in details in Intel SDM
                                                    ^ detail
> "Introduction to Memory Bandwidth Allocation".
> 
> Signed-off-by: Yi Sun <yi.y.sun@xxxxxxxxxxxxxxx>

Thanks, this is looking quite good IMHO. Just a couple of nits
below.

> ---
> CC: Jan Beulich <jbeulich@xxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Wei Liu <wei.liu2@xxxxxxxxxx>
> CC: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> CC: Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> CC: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>
> CC: Julien Grall <julien.grall@xxxxxxx>
> 
> v4:
>     - add 'domain-name' as parameter of 'psr-mba-show/psr-mba-set'.
>       (suggested by Roger Pau Monné)
>     - fix some wordings.
>       (suggested by Roger Pau Monné)
>     - explain how user can know the MBA_MAX.
>       (suggested by Roger Pau Monné)
>     - move the description of 'Linear mode/Non-linear mode' into section
>       of 'psr-mba-show'.
>       (suggested by Roger Pau Monné)
>     - change 'per-thread' to 'per-hyper-thread' to make it clearer.
>       (suggested by Roger Pau Monné)
>     - upgrade revision number.
> v3:
>     - remove 'closed-loop' related description.
>       (suggested by Roger Pau Monné)
>     - explain 'linear' and 'non-linear' before mentioning them.
>       (suggested by Roger Pau Monné)
>     - adjust description of 'psr-mba-set'.
>       (suggested by Roger Pau Monné)
>     - explain 'MBA_MAX'.
>       (suggested by Roger Pau Monné)
>     - remove 'n<64'.
>       (suggested by Roger Pau Monné)
>     - fix some wordings.
>       (suggested by Roger Pau Monné)
>     - add context in 'Testing' part to make things more clear.
>       (suggested by Roger Pau Monné)
> v2:
>     - declare 'HW' in Terminology.
>       (suggested by Chao Peng)
>     - replace 'COS ID of VCPU' to 'COS ID of domain'.
>       (suggested by Chao Peng)
>     - replace 'COS register' to 'Thrtl MSR'.
>       (suggested by Chao Peng)
>     - add description for 'psr-mba-show' to state that the decimal value is
>       shown for linear mode but hexadecimal value is shown for non-linear mode.
>       (suggested by Chao Peng)
>     - remove content in 'Areas for improvement'.
>       (suggested by Chao Peng)
>     - use '<>' to specify mandatory argument to a command.
>       (suggested by Wei Liu)
> v1:
>     - remove a special character to avoid the error when building pandoc.
> ---
>  docs/features/intel_psr_mba.pandoc | 291 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 291 insertions(+)
>  create mode 100644 docs/features/intel_psr_mba.pandoc
> 
> diff --git a/docs/features/intel_psr_mba.pandoc b/docs/features/intel_psr_mba.pandoc
> new file mode 100644
> index 0000000..7a6a588
> --- /dev/null
> +++ b/docs/features/intel_psr_mba.pandoc
> @@ -0,0 +1,291 @@
> +% Intel Memory Bandwidth Allocation (MBA) Feature
> +% Revision 1.6
> +
> +\clearpage
> +
> +# Basics
> +
> +---------------- ----------------------------------------------------
> +         Status: **Tech Preview**
> +
> +Architecture(s): Intel x86
> +
> +   Component(s): Hypervisor, toolstack
> +
> +       Hardware: MBA is supported on Skylake Server and beyond
> +---------------- ----------------------------------------------------
> +
> +# Terminology
> +
> +* CAT         Cache Allocation Technology
> +* CBM         Capacity BitMasks
> +* CDP         Code and Data Prioritization
> +* COS/CLOS    Class of Service
> +* HW          Hardware
> +* MBA         Memory Bandwidth Allocation
> +* MSRs        Machine Specific Registers
> +* PSR         Intel Platform Shared Resource
> +* THRTL       Throttle value or delay value
> +
> +# Overview
> +
> +The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
> +control over memory bandwidth available per-core. This feature provides OS/
> +hypervisor the ability to slow misbehaving apps/domains by using a credit-based
> +throttling mechanism.
> +
> +# User details
> +
> +* Feature Enabling:
> +
> +  Add "psr=mba" to boot line parameter to enable MBA feature.
> +
> +* xl interfaces:
> +
> +  1. `psr-mba-show [domain-id|domain-name]`:
> +
> +     Show memory bandwidth throttling for domain. Under different modes, it
> +     shows different type of data.
> +
> +     There are two modes:
> +     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
> +     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
> +     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
> +     delay applied) by HW automatically. The response of throttling value is
> +     linear.
> +
> +     Non-linear mode: input delay values are powers-of-two from zero to the
> +     MBA_MAX value from CPUID. In this case any values not a power of two will
> +     be rounded down to the next nearest power of two by HW automatically. The
> +     response of throttling value is non-linear.
> +
> +     For linear mode, it shows the decimal value. For non-linear mode, it shows
> +     hexadecimal value.
> +
> +  2. `psr-mba-set [OPTIONS] <domain-id|domain-name> <throttling>`:
> +
> +     Set memory bandwidth throttling for domain.
> +
> +     Options:
> +     '-s': Specify the socket to process, otherwise all sockets are processed.
> +
> +     Throttling value set in register implies the approximate amount of delaying
> +     the traffic between core and memory. The higher throttling value results in
                                             ^ remove 'The'              ^ result
> +     lower bandwidth. The max throttling value (MBA_MAX) supported can be
> +     obtained through CPUID inside hypervisor. User can know it through

"Users can fetch the MBA_MAX value using the `psr-hwinfo` xl command."

> +     `psr-hwinfo`.
> +
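
In case it helps readers of this section, here is a rough Python sketch of
the rounding that the two modes describe. The function names are made up
for illustration, the out-of-range check is an assumption of the sketch
(the document does not say what happens above MBA_MAX), and the real
rounding is done by the hardware, not by software like this.

```python
def round_linear(thrtl: int, mba_max: int) -> int:
    """Round a linear-mode value down to a multiple of the input precision."""
    step = 100 - mba_max  # input precision, e.g. 10 when MBA_MAX is 90
    if not 0 <= thrtl <= mba_max:
        # Assumption of this sketch: reject values outside [0, MBA_MAX].
        raise ValueError("throttling value out of range")
    return (thrtl // step) * step


def round_nonlinear(thrtl: int, mba_max: int) -> int:
    """Round a non-linear-mode value down to a power of two (zero stays zero)."""
    if not 0 <= thrtl <= mba_max:
        # Assumption of this sketch: reject values outside [0, MBA_MAX].
        raise ValueError("throttling value out of range")
    if thrtl == 0:
        return 0
    # Keep only the highest set bit, i.e. the nearest power of two below.
    return 1 << (thrtl.bit_length() - 1)
```

With MBA_MAX = 90, `round_linear(12, 90)` gives 10, which matches the 12% to
10% example in the text.
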
> +# Technical details
> +
> +MBA is a member of Intel PSR features, it shares the base PSR infrastructure
> +in Xen.
> +
> +## Hardware perspective
> +
> +  MBA defines a range of MSRs to support specifying a delay value (Thrtl) per
> +  COS, with details below.
> +
> +  ```
> +   +----------------------------+----------------+
> +   | MSR (per socket)           |    Address     |
> +   +----------------------------+----------------+
> +   | IA32_L2_QOS_Ext_BW_Thrtl_0 |     0xD50      |
> +   +----------------------------+----------------+
> +   | ...                        |  ...           |
> +   +----------------------------+----------------+
> +   | IA32_L2_QOS_Ext_BW_Thrtl_n |     0xD50+n    |
> +   +----------------------------+----------------+
> +  ```
> +
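
The table above boils down to "base address plus COS index". A minimal
Python sketch of that mapping follows; `THRTL_MSR_BASE` and
`thrtl_msr_address` are hypothetical names used only for illustration, and
the sketch does not check the index against the maximum COS reported by the
hardware.

```python
THRTL_MSR_BASE = 0xD50  # address of IA32_L2_QOS_Ext_BW_Thrtl_0 per the table


def thrtl_msr_address(cos: int) -> int:
    """Return the address of the Thrtl MSR for COS index `cos`."""
    if cos < 0:
        raise ValueError("COS index must be non-negative")
    return THRTL_MSR_BASE + cos
```
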
> +  When context switch happens, the COS ID of domain is written to
> +  per-hyper-thread MSR `IA32_PQR_ASSOC`, and then hardware enforces bandwidth
> +  allocation according to the throttling value stored in the Thrtl MSR register.
> +
> +## The relationship between MBA and CAT/CDP
> +
> +  Generally speaking, MBA is completely independent of CAT/CDP, and any
> +  combination may be applied at any time, e.g. enabling MBA with CAT
> +  disabled.
> +
> +  But it needs to be noticed that MBA shares COS infrastructure with CAT,
> +  although MBA is enumerated by different CPUID leaf from CAT (which
> +  indicates that the max COS of MBA may be different from CAT). In some
> +  cases, a domain is permitted to have a COS that is beyond one (or more)
> +  of PSR features but within the others. For instance, let's assume the max
> +  COS of MBA is 8 but the max COS of L3 CAT is 16, when a domain is assigned
> +  9 as COS, the L3 CAT CBM associated to COS 9 would be enforced, but for MBA,
> +  the HW works as default value is set since COS 9 is beyond the max COS (8)
> +  of MBA.
> +
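
The COS 9 example above could be modelled roughly as below. This is only a
sketch of the described behaviour, not of the Xen code: `effective_thrtl`
and its parameters are hypothetical names, and the default throttling value
of 0 is taken from the default mentioned elsewhere in the document.

```python
def effective_thrtl(cos, thrtl_by_cos, mba_cos_max, default_thrtl=0):
    """Return the throttling value that applies to a domain using `cos`.

    If the COS ID is beyond the max COS of MBA, the HW behaves as if the
    default value were set, even though other features (e.g. L3 CAT) may
    still honour that COS ID.
    """
    if cos > mba_cos_max:
        return default_thrtl
    return thrtl_by_cos.get(cos, default_thrtl)
```

With a max MBA COS of 8, a domain on COS 9 gets the default, while a
domain on COS 3 gets whatever was programmed for COS 3.
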
> +## Design Overview
> +
> +* Core COS/Thrtl association
> +
> +  When enforcing Memory Bandwidth Allocation, all cores of domains have
> +  the same default Thrtl MSR (COS0) which stores the same Thrtl (0). The
> +  default Thrtl MSR is used only in hypervisor and is transparent to tool stack
> +  and user.
> +
> +  System administrators can change PSR allocation policy at runtime by
> +  using the tool stack. Since MBA shares COS ID with CAT/CDP, a COS ID
> +  corresponds to a 2-tuple, like [CBM, Thrtl] with only-CAT enabled, when CDP
> +  is enabled, the COS ID corresponds to a 3-tuple, like [Code_CBM, Data_CBM,
> +  Thrtl]. If neither CAT nor CDP is enabled, things are easier, since one COS
> +  ID corresponds to one Thrtl.

I find the above paragraph a little bit difficult to parse, although
I'm not going to force you to re-write it.
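
FWIW, this is how I read the three cases, written as a small Python sketch.
The function and parameter names are invented for illustration and the
CBM values in the usage note are arbitrary; it only shows which fields a COS
ID maps to in each configuration.

```python
from typing import Optional, Tuple


def cos_entry(thrtl: int,
              cbm: Optional[int] = None,
              code_cbm: Optional[int] = None,
              data_cbm: Optional[int] = None) -> Tuple[int, ...]:
    """Return the tuple of values that one COS ID corresponds to."""
    if code_cbm is not None and data_cbm is not None:
        # CDP enabled: 3-tuple [Code_CBM, Data_CBM, Thrtl]
        return (code_cbm, data_cbm, thrtl)
    if cbm is not None:
        # Only CAT enabled: 2-tuple [CBM, Thrtl]
        return (cbm, thrtl)
    # Neither CAT nor CDP enabled: the COS ID maps to a single Thrtl
    return (thrtl,)
```

For example, `cos_entry(10, cbm=0xff)` yields `(0xff, 10)`, and
`cos_entry(10)` yields `(10,)`.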

> +
> +* VCPU schedule
> +
> +  This part reuses CAT COS infrastructure.
> +
> +* Multi-sockets
> +
> +  Different sockets may have different MBA ability (like max COS)
                                              ^ capabilities?
[...]
> +# Testing
> +
> +We can execute these commands to verify MBA on different HWs supporting them.
> +
> +For example:
> +  1. User can get the MBA hardware info through 'psr-hwinfo' command. From
> +     result, user can know if this hardware works under linear mode or non-
> +     linear mode, the max throttling value (MBA_MAX) and so on.
> +
> +    root@:~$ xl psr-hwinfo --mba
> +    Memory Bandwidth Allocation (MBA):
> +    Socket ID       : 0
> +    Linear Mode     : Enabled
> +    Maximum COS     : 7
> +    Maximum Throttling Value: 90
> +    Default Throttling Value: 0
> +
> +  2. Then, user can set a throttling value to a domain. For example, set '0xa',
> +     i.e. 10% delay.
> +
> +    root@:~$ xl psr-mba-set 1 0xa

I would write this as 10 instead of 0xa, ie:

$ xl psr-mba-set 1 10

I think it's clearer because MBA is in linear mode, so the values
returned from xl will be in decimal base rather than hexadecimal.

Thanks, Roger.
