
Re: [Xen-devel] [XenRT] Cache-Aware Real-Time Xen: Partition shared cache for guest domains in Xen via page coloring

2015-05-12 18:59 GMT-04:00 Dario Faggioli <dario.faggioli@xxxxxxxxxx>:
> On Sun, 2015-05-10 at 22:36 -0400, Meng Xu wrote:
>> Hi Dario and George,
> Hi Meng,

Hi Dario,

> I gave a quick look at the slides. Nice work.

Thanks for your encouragement! :-)

> Although I don't have much time, I also wanted to take a quick glance at
> the code, and looked it up on GitHub, where you usually host your stuff,
> but couldn't find it (maybe because I'm really ignorant about how that
> site works! :-P).. Is it hosted somewhere public already?

The code is not publicly hosted right now.
One reason is that it is really "ugly" at the moment (I directly
copied some functions and changed their names and interfaces to make it
work. Of course, I can rework the code so that it conforms to the Xen
coding style.). :-(
Another reason is that I'm trying to submit a paper on this, which I
need for graduation, before making all the code public. :-P
But I can add you, and anyone else who is interested, to my private
repository on GitHub. Is that OK?

>> Right now, I almost finish the first two steps and have some
>> preliminary results of the real-time performance of Xen with static
>> cache partition mechanism. I made a quick slide to summarize the
>> current work and the future plan.
>> The slide can be found at:
>> http://www.cis.upenn.edu/~mengxu/cart-xen/2015-05-01-CARTXen-WiP.pdf
> The results look nice and promising, at least for real-time
> (virtualization) workloads. I'm quite sure folks working on
> embedded/automotive projects, that are looking at RTDS, would find them
> very interesting (provided it works similarly well on ARM, as that's
> what they use), and in fact I'm adding people from GlobalLogic to the Cc
> list.
>> My question is:
>> Do you have any comment or concerns on the current software-based
>> cache management work?

Yes! :-)
The concerns/shortcomings of this software-based cache management are:
(1) It is limited to a page size of 4KB. As Andrea/Jan pointed out
before, when the page size becomes 2MB, we won't be able to partition
the cache with this approach. We need to manage memory at a fine enough
granularity that we can control the machine address bits that select
which area of the cache is used. Actually, I haven't looked into the
HVM case, which uses superpages, yet, so I cannot say it is impossible,
but it will be harder for sure.
(2) If we want to move a domain's cache partition to a different cache
partition, we will have to copy the related pages, which is expensive
in my opinion. Of course, we can do lazy copying and only copy a page
when it is used, but in the worst case we still need to copy all the
dirty pages. This means dynamic cache management will have relatively
high overhead. Actually, this is what I'm working on right now: I am
trying to measure how large the overhead is.
(3) Potential TLB miss issues. This turned out to be less of a problem
than we first thought. My speculation is that the TLB is a special
cache which may also do prefetching, and that the prefetching helps
reduce the TLB overhead. That would explain why we didn't see much TLB
overhead in the evaluation.
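
To make point (1) concrete, here is a minimal sketch of the page-coloring
arithmetic. It is not taken from the actual patch. It assumes a physically
indexed cache whose set index is a plain bit field of the machine address
(real last-level caches may hash addresses across slices), and the cache
geometry used in the example is made up for illustration. The function
names are hypothetical.

```python
# Sketch: how many page colors a physically indexed shared cache offers.
# Assumption: the set index is a contiguous bit field of the machine address.

def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def num_colors(cache_bytes: int, ways: int, line_bytes: int, page_bytes: int) -> int:
    """Number of cache partitions selectable through the page frame number."""
    sets = cache_bytes // (ways * line_bytes)
    # Set-index bits occupy [line_bits, line_bits + set_bits) of the address.
    top_index_bit = log2_exact(line_bytes) + log2_exact(sets)
    page_bits = log2_exact(page_bytes)
    # Only index bits above the page offset can be chosen by the allocator.
    color_bits = max(0, top_index_bit - page_bits)
    return 1 << color_bits


def page_color(machine_addr: int, page_bytes: int, colors: int) -> int:
    """Color of the page containing machine_addr (low bits of the frame number)."""
    return (machine_addr // page_bytes) % colors
```

For an assumed 2MB, 16-way cache with 64-byte lines,
num_colors(2 * 1024 * 1024, 16, 64, 4096) gives 32 colors, while passing a
2MB page size gives 1: all index bits fall inside the page offset, so the
allocator has nothing left to choose.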

I'm actually interested in comparing this software-based approach with
a hardware-based approach, such as the CAT feature introduced by Intel.
But I don't have the hardware (which is so expensive :-() right now.

> My first thought is the one I sort of expressed above already: do you
> think it could work on ARM as well? I'm quite sure they have caches, so
> the basic idea is applicable there too, but I know too few of that
> architecture to see how well/bad it will behave in there! Would you have
> the chance and the interest in trying to find that out?

I think the implementation works on ARM as well, because the code I
touched does not really depend on the arch. (Except that I have to
assume the page size is 4KB.)
The only concern is that the latest ARM core I know of has only a 2MB
shared cache, which is relatively small.

Actually, Hyon-Young (who is a visiting professor of Insup) and I had
a look at the ARM board last year. We tried on the Samsung Exynos 5420
Arndale Octa Board and faced the signature issue (the U-boot cannot
bring up the CPU in hypervisor mode:
Then we tried the Cubieboard, which seems to work. But I was busy with
some other things and didn't continue in this direction last
year. :-(
If we can have a good case study on an ARM board, I think it will
be interesting and we can continue working on it. :-)

> Another question: what are (if any) the limitations and the restrictions
> we have to accept, in order to be able to take advantage of this? E.g.,
> I remember someone (I think it was Andrew) mentioning that playing the
> tricks you play with addresses, would make it hard/impossible to use
> superpages.

As I said, superpages are an issue as far as I know right now. We have
to control the bits that are used to index the cache in order to choose
which area of the cache is used. I haven't looked into superpages yet,
but I will have a look; I think it will be an interesting problem to
work on.

> I also remember that you were having problems at finding
> large enough chunks of contiguous memory... Are these still open issues?

Ah, that is solved. :-) I guess it was because I added a field to the
page_info structure and bloated it too much. The memory needed to hold
all the page_info entries then became too large and ate into some
memory that is reserved for DMA. (This is my guess, because it worked
after I made the added field in the page_info structure smaller. :-P
Please correct me if I'm wrong.)
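
A rough back-of-the-envelope check of that guess, as a sketch only: there
is one page_info entry per page of RAM, so every extra byte in the
structure is multiplied by the number of pages. The RAM size and the
structure sizes below are assumptions chosen for illustration, not
measured values, and the function name is hypothetical.

```python
# Sketch: memory needed to hold one page_info entry per page of RAM.
# The sizes passed in the example are assumptions, not measured values.

def frame_table_bytes(ram_bytes: int, page_bytes: int, page_info_bytes: int) -> int:
    """Total size of the per-page metadata array."""
    return (ram_bytes // page_bytes) * page_info_bytes


GIB = 1 << 30
MIB = 1 << 20

# Assumed: 16 GiB of RAM, 4 KiB pages, page_info growing from 32 to 40 bytes.
before = frame_table_bytes(16 * GIB, 4096, 32)
after = frame_table_bytes(16 * GIB, 4096, 40)
extra_mib = (after - before) // MIB  # additional metadata in MiB
```

Under these assumptions the array grows from 128 MiB to 160 MiB, so an
8-byte field costs 32 MiB, which is why shrinking the added field matters.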

>> I  hope to listen to your opinions and  incorporate your opinions on
>> my ongoing work instead of diverting too
>> far away from Xen mainstream ideas. :-)
> Some more thoughts. This looks like something that could work quite well
> in environment and use cases that:
>  1. are rather static (i.e., no or few dynamic domain creation, well
>     defined workload inside each domain, etc.)
>  2. there are not too many domains around. I mean, if you have hundreds
>     of guests, it's very unlikely that you'll be able to arrange for a
>     similar number of properly sized partitions, isn't it?
> That is actually why I really think this could be useful for the
> embedded virt people: this is exactly how their environment looks
> like! :-)

Do you or anyone else have a practical use case that we can run on
RTDS? That would be an interesting case study.

> I think we should definitely consider merging something that will
> potentially help the emerging embedded/automotive use cases, provided
> (as usual):
>  1. the benefits are real and really useful (e.g., they are still there
>     on ARM)

I think it works on ARM as long as we manage memory in 4KB pages there.
As to usefulness, I'm willing to get some practical applications
running on the RTDS scheduler with the cache partition mechanism. That
would show that the benefit is real.

>  2. it does not disrupt other workloads, it does not impact other
>     features and it does not make the code worse (i.e., more difficult
>     to understand and to maintain)

As to the static cache partitioning, I think it should not impact other
features. It mostly adds new hypercalls and functions, and redirects
the existing memory allocation/free functions to the newly added ones
when the feature is configured.
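
A toy model of that redirection, not the actual patch and not Xen's
allocator: free frames are kept in per-color lists, and an allocation for
a domain may only draw from the colors assigned to that domain. The class
and method names are hypothetical, and the "pick the allowed color with
the most free frames" policy is just one assumed way to spread a domain
over its partition.

```python
from collections import defaultdict, deque


class ColoredPagePool:
    """Toy color-aware frame pool (illustrative only)."""

    def __init__(self, frames, colors):
        self.colors = colors
        self.free = defaultdict(deque)
        for mfn in frames:
            # A frame's color is given by the low bits of its frame number.
            self.free[mfn % colors].append(mfn)

    def alloc(self, allowed_colors):
        """Return a free frame from one of allowed_colors, or None."""
        # Prefer the allowed color with the most free frames; ties go to
        # the lowest color number because the candidates are sorted.
        best = max(sorted(allowed_colors), key=lambda c: len(self.free[c]))
        if not self.free[best]:
            return None
        return self.free[best].popleft()

    def release(self, mfn):
        """Return a frame to the free list of its own color."""
        self.free[mfn % self.colors].append(mfn)
```

With 16 frames and 4 colors, a domain restricted to colors {1, 3} can get
at most 8 frames, and none of them share cache sets with a domain that
owns colors {0, 2}.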

> I'd suggest, if you agree with my analysis, you try to assess 1.

As for assessing 1, do you mean that I should try to run the code
on some ARM board?

> .. Maybe
> GlobalLogic people could help, if they're interested.
I would really appreciate it if they could help! :-P
Actually, we also want to see real applications running on Xen
with RT features.

Thank you very much for your comments and advice!

Best regards,


Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
