
Re: [PATCH v6 01/15] xen/common: add cache coloring common code



Hi Jan,

On Wed, Jan 31, 2024 at 4:57 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 29.01.2024 18:17, Carlo Nonato wrote:
> > Last Level Cache (LLC) coloring allows partitioning the cache into smaller
> > chunks called cache colors. Since not all architectures can actually
> > implement it, add a HAS_LLC_COLORING Kconfig and put other options under
> > xen/arch.
> >
> > LLC colors are a property of the domain, so the domain struct has to be
> > extended.
> >
> > Based on original work from: Luca Miccio <lucmiccio@xxxxxxxxx>
> >
> > Signed-off-by: Carlo Nonato <carlo.nonato@xxxxxxxxxxxxxxx>
> > Signed-off-by: Marco Solieri <marco.solieri@xxxxxxxxxxxxxxx>
> > ---
> > v6:
> > - moved almost all code in common
> > - moved documentation in this patch
> > - reintroduced range for CONFIG_NR_LLC_COLORS
> > - reintroduced some stub functions to reduce the number of checks on
> >   llc_coloring_enabled
> > - moved domain_llc_coloring_free() in same patch where allocation happens
> > - turned "d->llc_colors" to pointer-to-const
> > - llc_coloring_init() now returns void and panics if errors are found
> > v5:
> > - used - instead of _ for filenames
> > - removed domain_create_llc_colored()
> > - removed stub functions
> > - coloring domain fields are now #ifdef protected
> > v4:
> > - Kconfig options moved to xen/arch
> > - removed range for CONFIG_NR_LLC_COLORS
> > - added "llc_coloring_enabled" global to later implement the boot-time
> >   switch
> > - added domain_create_llc_colored() to be able to pass colors
> > - added is_domain_llc_colored() macro
> > ---
> >  docs/misc/cache-coloring.rst      | 87 +++++++++++++++++++++++++++++++
> >  docs/misc/xen-command-line.pandoc | 27 ++++++++++
> >  xen/arch/Kconfig                  | 17 ++++++
> >  xen/common/Kconfig                |  3 ++
> >  xen/common/Makefile               |  1 +
> >  xen/common/keyhandler.c           |  3 ++
> >  xen/common/llc-coloring.c         | 87 +++++++++++++++++++++++++++++++
> >  xen/include/xen/llc-coloring.h    | 38 ++++++++++++++
> >  xen/include/xen/sched.h           |  5 ++
> >  9 files changed, 268 insertions(+)
> >  create mode 100644 docs/misc/cache-coloring.rst
> >  create mode 100644 xen/common/llc-coloring.c
> >  create mode 100644 xen/include/xen/llc-coloring.h
> >
> > diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
> > new file mode 100644
> > index 0000000000..9fe01e99e1
> > --- /dev/null
> > +++ b/docs/misc/cache-coloring.rst
> > @@ -0,0 +1,87 @@
> > +Xen cache coloring user guide
> > +=============================
> > +
> > +The cache coloring support in Xen allows reserving Last Level Cache (LLC)
> > +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
> > +
> > +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> > +
> > +If needed, change the maximum number of colors with
> > +``CONFIG_NR_LLC_COLORS=<n>``.
> > +
> > +Compile Xen and the toolstack and then configure it via
> > +`Command line parameters`_.
> > +
> > +Background
> > +**********
> > +
> > +Cache hierarchy of a modern multi-core CPU typically has first levels dedicated
> > +to each core (hence using multiple cache units), while the last level is shared
> > +among all of them. Such a configuration implies that memory operations on one
> > +core (e.g. running a DomU) are able to generate interference on another core
> > +(e.g. hosting another DomU). Cache coloring allows eliminating this
> > +mutual interference, and thus guaranteeing higher and more predictable
> > +performance for memory accesses.
> > +The key concept underlying cache coloring is a fragmentation of the memory
> > +space into a set of sub-spaces called colors that are mapped to disjoint cache
> > +partitions. Technically, the whole memory space is first divided into a number
> > +of subsequent regions. Then each region is in turn divided into a number of
> > +of subsequent regions. Then each region is in turn divided into a number of
> > +subsequent sub-colors. The generic i-th color is then obtained by all the
> > +i-th sub-colors in each region.
> > +
> > +::
> > +
> > +                            Region j            Region j+1
> > +                .....................   ............
> > +                .                     . .
> > +                .                       .
> > +            _ _ _______________ _ _____________________ _ _
> > +                |     |     |     |     |     |     |
> > +                | c_0 | c_1 |     | c_n | c_0 | c_1 |
> > +           _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
> > +                    :                       :
> > +                    :                       :...         ... .
> > +                    :                            color 0
> > +                    :...........................         ... .
> > +                                                :
> > +          . . ..................................:
> > +
> > +There are two pragmatic lessons to be learnt.
> > +
> > +1. If one wants to avoid cache interference between two domains, different
> > +   colors need to be used for their memory.
> > +
> > +2. Color assignment must privilege contiguity in the partitioning. E.g.,
> > +   assigning colors (0,1) to domain I and (2,3) to domain J is better than
> > +   assigning colors (0,2) to I and (1,3) to J.
>
> I can't connect this 2nd point with any of what was said above.

If colors are contiguous, then greater spatial locality is achievable. Do you
mean we should explain this better?
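
For instance (illustrative numbers, not from the patch: assume 16 colors and
4 KiB pages, i.e. 64 KiB regions): a domain given colors (0,1) can be handed
an 8 KiB physically contiguous chunk out of every region, while a domain
given colors (0,2) only ever gets isolated 4 KiB pages, since its pages sit
8 KiB apart within each region. Contiguous color assignments therefore
preserve more spatial locality.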

> > +How to compute the number of colors
> > +***********************************
> > +
> > +To compute the number of available colors for a specific platform, the size of
> > +an LLC way and the page size used by Xen must be known. The first parameter can
> > +be found in the processor manual or can also be computed by dividing the total
> > +cache size by the number of its ways. The second parameter is the minimum
> > +amount of memory that can be mapped by the hypervisor,
>
> I find "amount of memory that can be mapped" quite confusing here. Don't you
> really mean the granularity at which memory can be mapped?

Yes, that's what I wanted to describe. I'll change it.

> > thus dividing the way
> > +size by the page size, the number of total cache partitions is found. So for
> > +example, an Arm Cortex-A53 with a 16-way associative 1 MiB LLC can isolate up
> > +to 16 colors when pages are 4 KiB in size.
>
> I guess it's a matter of what one's used to, but to me talking of "way size"
> and how the calculation is described is, well, unusual. What I would start
> from is the smallest entity, i.e. a cache line. Then it would be relevant
> to describe how, after removing the low so many bits to cover for cache line
> size, the remaining address bits are used to map to a particular set. It
> looks to me as if you're assuming that this mapping is linear, using the
> next so many bits from the address. Afaik this isn't true on various modern
> CPUs; instead hash functions are used. Without knowing at least certain
> properties of such a hash function, I'm afraid your mapping from address to
> color isn't necessarily guaranteeing the promised isolation. The guarantee
> may hold for processors you specifically target, but then I think in this
> description it would help if you would fully spell out any assumptions you
> make on how hardware maps addresses to elements of the cache.

You're right, we are assuming a linear mapping. We are going to review and
extend the documentation in order to fully specify when coloring can be
applied.
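
To state the assumption explicitly (illustrative notation, not from the
patch), a purely linear indexing scheme maps an address as:

    set   = (addr / line_size) % nr_sets
    color = (addr / page_size) % nr_colors

A hashed set-index function, as found on various modern CPUs, would break
the second relation, which is exactly what the isolation guarantee relies on.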

About the "way size" it's a way of summarizing all the parameters into one.
We could ask for different cache parameters as you said, but in the end what
we are interested in is how many partitions is the cache capable of isolate
and how big they are. The answer is, in theory, as many partitions as the
number of sets, each one as big as a cache line, bacause we can't have
isolation inside a set.
Then memory mapping comes into place and the minimum granularity at which
mapping can happen actually lowers the number of partitions.
To recap, we can isolate:
    nr_sets * line_size / page_size
Then we simply gave this quantity a name:
    way_size = nr_sets * line_size
Another way of computing it:
    way_size = cache_size / nr_ways

We are ok with having two parameters, cache_size and nr_ways, which are even
easier and more intuitive for a normal user to find.
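
As a minimal standalone sketch (illustrative C, not part of the patch; the
variable names are made up here), the whole computation for the Cortex-A53
example from the doc is:

    #include <stdio.h>

    int main(void)
    {
        unsigned long cache_size = 1UL << 20; /* 1 MiB LLC */
        unsigned long nr_ways = 16;           /* 16-way set associative */
        unsigned long page_size = 4096;       /* 4 KiB pages */

        unsigned long way_size = cache_size / nr_ways;  /* 64 KiB */
        unsigned long nr_colors = way_size / page_size; /* 16 colors */

        printf("way size: %lu KiB, colors: %lu\n",
               way_size / 1024, nr_colors);
        return 0;
    }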

> Or, alternatively, have the number of colors be specifiable directly.

This is to be avoided in my opinion since it's more difficult to compute and
it requires more knowledge.

Thanks.

>
> > +Cache layout is probed automatically by looking at the CLIDR_EL1 Arm register.
> > +This means that other system caches that aren't visible there are ignored.
> > +The possibility of manually setting the way size is left to the user to overcome
> > +failing situations or for debugging/testing purposes. See
> > +`Command line parameters`_ for more information on that.
> > +
> > +Command line parameters
> > +***********************
> > +
> > +More specific documentation is available at `docs/misc/xen-command-line.pandoc`.
> > +
> > ++----------------------+-------------------------------+
> > +| **Parameter**        | **Description**               |
> > ++----------------------+-------------------------------+
> > +| ``llc-coloring``     | enable coloring at runtime    |
> > ++----------------------+-------------------------------+
> > +| ``llc-way-size``     | set the LLC way size          |
> > ++----------------------+-------------------------------+
>
> As a result of the above, I also find it confusing to specify "way size"
> as a command line option. Cache size, number of ways, and cache line size
> would seem more natural to me.
>
> I'll get to looking at the actual code later.
>
> Jan
