Hi, Dario:
Thanks for starting this topic. I have limited experience
with industry, so I'll provide some input from the academia
side. Please correct me if I am wrong.
1. A real-time CPU scheduler would be great. That is
actually why we started the RT-Xen project.
Scheduling in a virtualized environment maps to a
two-level scheduling hierarchy in real-time systems, so we can
use hierarchical scheduling theory to provide formal
analysis for it. One key assumption of that theory is a
formally defined 'server' that represents each VCPU. We
implemented and compared different servers in RT-Xen and
published the results in:
S. Xi, J. Wilson, C. Lu and C.D. Gill, RT-Xen: Towards
Real-time Hypervisor Scheduling in Xen, ACM International
Conference on Embedded Software (EMSOFT'11), October 2011.
J. Lee, S. Xi, S. Chen, L.T.X. Phan, C. Gill, I. Lee, C. Lu
and O. Sokolsky, Realizing Compositional Scheduling through
Virtualization, IEEE Real-Time and Embedded Technology and
Applications Symposium (RTAS'12), April 2012.
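To make the 'server' idea concrete, here is a minimal sketch of budget
accounting for a single VCPU in isolation. It is not RT-Xen code: the
function name, the tick-based time model, and the two server kinds shown
(deferrable and polling) are simplifying assumptions for illustration
only.

```python
def simulate(kind, budget, period, arrivals, horizon):
    """Return the ticks in which the VCPU's guest work executes.

    kind:     "deferrable" (idle budget is kept until the period ends) or
              "polling" (remaining budget is dropped as soon as the VCPU idles)
    budget:   ticks of CPU time granted per period
    period:   replenishment period in ticks
    arrivals: dict mapping tick -> units of work arriving at that tick
    horizon:  number of ticks to simulate
    """
    remaining = 0   # budget left in the current period
    backlog = 0     # pending units of guest work
    executed = []
    for t in range(horizon):
        if t % period == 0:
            remaining = budget          # replenish at each period boundary
        backlog += arrivals.get(t, 0)
        if backlog > 0 and remaining > 0:
            backlog -= 1                # run one unit of guest work
            remaining -= 1
            executed.append(t)
        elif backlog == 0 and kind == "polling":
            remaining = 0               # polling server discards unused budget
    return executed


# Two units of work arrive at tick 1; budget is 2 ticks every 5 ticks.
deferrable = simulate("deferrable", 2, 5, {1: 2}, 10)
polling = simulate("polling", 2, 5, {1: 2}, 10)
```

In this toy setup the deferrable server serves the work as soon as it
arrives, while the polling server has already given up its budget and
makes the work wait for the next period. Differences like this are what
the server comparison is about.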
2. An appropriate cache management scheme would be great.
Current CPU architectures have both dedicated caches (usually
L1 and L2) and a shared cache (L3).
a) For the dedicated caches, the existing credit1 scheduler uses
partitioned scheduling with load balancing, while credit2 uses
modified global scheduling with migration
resistance/compensation. I think that if the user runs
cache-sensitive applications, a partitioned scheduler seems to be
the better choice.
b) For the shared cache, the 'noisy neighbor' problem can happen:
one guest OS runs a cache-intensive application and everybody
else suffers. I have seen several papers that try to solve it,
but I don't know whether they will be integrated into Xen or
not.
<1> If there are multiple LLCs, each shared by a
subset of PCPUs, a dynamic clustering scheme is proposed in this
paper:
Min Lee, Karsten Schwan. "Region Scheduling: Efficiently Using
the Cache Architectures via Page-level Affinity." ASPLOS 2012,
London, UK, March 3-7, 2012.
<2> If there is one large shared LLC, cache partitioning by
domain seems to be a solution. These two papers have explored it:
3. Deterministic network latency through Domain-0 would
be great.
Currently Xen does not support packet prioritization. Users
can achieve a similar effect by using the Linux Traffic
Control tool (tc) in Domain-0, but priority inversion can still
happen.
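As a toy illustration of the inversion (not a model of the actual Xen
network path), consider packets that are already sitting in one shared
queue. The packet names, the priority numbers, and both functions below
are hypothetical; the sketch only contrasts arrival-order service with
priority-order service.

```python
import heapq


def fifo_order(packets):
    """Serve packets strictly in arrival order; priority is ignored."""
    return [name for name, _prio in packets]


def priority_order(packets):
    """Serve lower 'prio' numbers first; ties keep arrival order."""
    # The sequence number breaks ties so equal priorities stay FIFO.
    heap = [(prio, seq, name) for seq, (name, prio) in enumerate(packets)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# (name, priority) in arrival order; 0 is the most urgent.
queued = [("bulk-1", 5), ("bulk-2", 5), ("rt-1", 0)]
fifo = fifo_order(queued)
prioritized = priority_order(queued)
```

With a shared FIFO, the urgent packet is served last, behind bulk
traffic that arrived earlier. Any stage on the path that queues in
arrival order can reintroduce this delay, even if a later stage
prioritizes correctly.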
We did some work on prioritizing inter-domain communication
in Xen, published in:
S. Xi, C. Li, C. Lu and C. Gill, Prioritizing Local Inter-Domain
Communication in Xen, ACM/IEEE International Symposium on
Quality of Service (IWQoS'13), June 2013.
We are now working on actual network traffic through the
NIC.
Thanks and I'd love to hear any
feedback/comments/suggestions on RT-Xen.
Sisu