
Re: [Xen-devel] [RFC] Extend the number of event channels available to guests



On 20/09/12 08:47, Jan Beulich wrote:
> On 20.09.12 at 01:49, Attilio Rao <attilio.rao@xxxxxxxxxx> wrote:
>> Proposal
>> The proposal is pretty simple: the event channel search will become a
>> three-level lookup table, with the leaf level being composed of shared
>> pages registered at boot time by the guests.
>> The bitmap that currently works as the leaf (then called "second level")
>> will serve either as the leaf level still (for older kernels) or as an
>> intermediate level that indexes into a new array of shared pages (for
>> newer kernels).
>> This makes it possible to reuse the existing mechanisms without
>> modifying their internals.
> While adding one level would seem to leave ample room, so did
> the original 4096. Therefore, even if unimplemented
> right now, I'd like the interface to allow for the guest to specify
> more levels.

There is a big difference here. The third (new) level will be composed of pages registered by the guest at setup time, so it can be expanded on demand. The second level we have now cannot be extended because it is fixed by the immutable ABI. Another level would only be useful if we thought the second level were not enough to address all the necessary bits in the third level efficiently.

For example, the first level is 64 bits, while the second level addresses 64 times as many bits as the first. For the third level to keep the same ratio as the second level, and hence comparable performance, it would need to be something like 4 pages. I think we are very far from reaching critical levels.
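A small worked example of the fan-out arithmetic, as a sketch under the assumption of a uniform tree in which every level multiplies the number of addressable bits by the word size (BITS_PER_LONG, i.e. 64 or 32). The function names are hypothetical and not existing Xen or Linux code.

```c
#include <assert.h>
#include <stdint.h>

/* Channels addressable by a uniform tree of `depth` levels with a fan-out
 * of `word_bits` (BITS_PER_LONG) per level: word_bits^depth. */
static uint64_t evtchn_capacity(unsigned int word_bits, unsigned int depth)
{
    uint64_t n = 1;

    assert(word_bits > 0);
    for (unsigned int i = 0; i < depth; i++)
        n *= word_bits;
    return n;
}

/* Bit index that represents `port` at `level` (0 = top) of a tree with
 * `depth` levels. Each bit at that level covers word_bits^(depth-1-level)
 * consecutive ports; the word holding the bit is index / word_bits. */
static uint64_t evtchn_level_bit_index(uint64_t port, unsigned int word_bits,
                                       unsigned int depth, unsigned int level)
{
    assert(level < depth && port < evtchn_capacity(word_bits, depth));
    return port / evtchn_capacity(word_bits, depth - 1 - level);
}
```

With 64-bit words, depth 2 gives today's 4096 channels and depth 3 gives 262144 (256k). With 32-bit words the figures are 1024 and 32768 (32k). For port 5000 in a three-level 64-bit tree the bit indices are 1, 78 and 5000. That is top-level word 0 bit 1, second-level word 1 bit 14, and leaf word 78 bit 8.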

>> More specifically, what needs to happen:
>> - Add new members to struct domain to handle an array of pages (to
>> contain the actual evtchn bitmaps), a further array of pages (to contain
>> the evtchn masks) and a control bit to say whether it is subject to the
>> new mode or not. Initially the arrays will be empty and the control bit
>> will be OFF.
>> - At init_platform() time, the guest must allocate the pages to compose
>> the 2 arrays and invoke a new hypercall which, broadly, does the
>> following:
>>     * Creates some pages to populate the new arrays in struct domain via
>> alloc_xenheap_pages()
> Why? The guest allocated the pages already. Just have the
> hypervisor map them (similar, but without the per-vCPU needs,
> to registering an alternative per-vCPU shared page). Whether
> it turns out more practical to require the guest to enforce
> certain restrictions (like the pages being contiguous and/or
> address restricted) is a secondary aspect.

Actually, what I propose seems to be what in fact happens in the shared page case. Look at what arch_domain_create() and the XENMEM_add_to_physmap hypercall do (in the XENMAPSPACE_shared_info case). I think this is the quickest way to get what we want.
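Here is a minimal sketch of the per-domain bookkeeping from the first bullet, following the shared_info-style approach in which the hypervisor allocates the pages itself. All names (`struct evtchn_nlevel`, `evtchn_nlevel_setup()`, `evtchn_nlevel_teardown()`) are hypothetical. `calloc()` stands in for `alloc_xenheap_pages()`, and the step that maps the pages at the guest-supplied gpfns is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical additions to struct domain. */
struct evtchn_nlevel {
    void **pending_pages;        /* leaf-level pending bitmaps */
    void **mask_pages;           /* leaf-level mask bitmaps */
    unsigned int nr_pages;       /* pages in each array */
    bool enabled;                /* control bit: OFF until setup succeeds */
};

/* Release everything; safe on a partially set-up state. */
static void evtchn_nlevel_teardown(struct evtchn_nlevel *e)
{
    for (unsigned int i = 0; i < e->nr_pages; i++) {
        if (e->pending_pages)
            free(e->pending_pages[i]);
        if (e->mask_pages)
            free(e->mask_pages[i]);
    }
    free(e->pending_pages);
    free(e->mask_pages);
    e->pending_pages = NULL;
    e->mask_pages = NULL;
    e->nr_pages = 0;
    e->enabled = false;
}

/* Allocate nr_pages zeroed pages for each array, then turn the control bit
 * ON. Returns 0 on success, -1 on failure (state is left empty and OFF). */
static int evtchn_nlevel_setup(struct evtchn_nlevel *e, unsigned int nr_pages)
{
    assert(!e->enabled && e->nr_pages == 0);
    if (nr_pages == 0)
        return -1;
    e->pending_pages = calloc(nr_pages, sizeof(*e->pending_pages));
    e->mask_pages = calloc(nr_pages, sizeof(*e->mask_pages));
    e->nr_pages = nr_pages;
    if (!e->pending_pages || !e->mask_pages)
        goto fail;
    for (unsigned int i = 0; i < nr_pages; i++) {
        e->pending_pages[i] = calloc(1, SKETCH_PAGE_SIZE);
        e->mask_pages[i] = calloc(1, SKETCH_PAGE_SIZE);
        if (!e->pending_pages[i] || !e->mask_pages[i])
            goto fail;
    }
    e->enabled = true;           /* real code would need a barrier first */
    return 0;
fail:
    evtchn_nlevel_teardown(e);
    return -1;
}
```

The control bit is flipped only after every page exists, so a lookup never sees a half-built array.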

>>     * Recreates the mapping with the gpfn passed from userland, using
>> basically guest_physmap_add_page()
> This would then be superfluous.

>>     * Sets the control bit to ON
>> - Places that need to access the actual leaf bit (for example,
>> xen_evtchn_do_upcall()) will need to check the control bit. If it
>> is OFF, they treat the second level as the leaf; otherwise they
>> do a further lookup to get the bit from the new array of pages.
> Just like for variable depth page tables - if at all possible, just
> make the accesses variable depth, so that all you need to track
> on a per-domain basis is the depth of the tree.

I agree.
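The variable-depth suggestion can be sketched as a bitmap tree where the only per-domain parameter is the depth. The same set and scan loops then serve two-level (older) and three-level (newer) guests. This is an illustrative, single-threaded sketch with hypothetical names. It assumes 64-bit words and ignores the mask bitmap and the atomic operations that the real upcall path relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define WORD_BITS 64u   /* assumption: BITS_PER_LONG == 64 */
#define MAX_DEPTH 3u

/* Hypothetical pending tree: level k holds WORD_BITS^k words, and a set bit
 * at level k means the word it selects at level k + 1 is non-zero. */
struct evtchn_tree {
    unsigned int depth;                /* 1..MAX_DEPTH, per domain */
    uint64_t *level[MAX_DEPTH];
};

static void evtchn_tree_free(struct evtchn_tree *t)
{
    for (unsigned int k = 0; k < MAX_DEPTH; k++) {
        free(t->level[k]);
        t->level[k] = NULL;
    }
}

static int evtchn_tree_init(struct evtchn_tree *t, unsigned int depth)
{
    size_t words = 1;

    for (unsigned int k = 0; k < MAX_DEPTH; k++)
        t->level[k] = NULL;
    if (depth == 0 || depth > MAX_DEPTH)
        return -1;
    t->depth = depth;
    for (unsigned int k = 0; k < depth; k++, words *= WORD_BITS) {
        t->level[k] = calloc(words, sizeof(uint64_t));
        if (t->level[k] == NULL) {
            evtchn_tree_free(t);
            return -1;
        }
    }
    return 0;
}

/* Mark `port` pending: set its leaf bit, then one bit on each level above. */
static void evtchn_tree_set_pending(struct evtchn_tree *t, uint64_t port)
{
    uint64_t idx = port, cap = 1;      /* idx: bit index at current level */

    for (unsigned int k = 0; k < t->depth; k++)
        cap *= WORD_BITS;
    assert(port < cap);
    for (unsigned int k = t->depth; k-- > 0; ) {
        t->level[k][idx / WORD_BITS] |= UINT64_C(1) << (idx % WORD_BITS);
        idx /= WORD_BITS;              /* parent bit selecting this word */
    }
}

/* Index of the lowest set bit; `w` must be non-zero. */
static unsigned int lowest_bit(uint64_t w)
{
    unsigned int b = 0;

    while (!(w & 1)) {
        w >>= 1;
        b++;
    }
    return b;
}

/* Descend to the lowest pending port, clear it, and clear parent bits whose
 * word became empty. Returns -1 if nothing is pending. */
static int64_t evtchn_tree_pop(struct evtchn_tree *t)
{
    uint64_t idx = 0, port;            /* idx: word index at current level */

    if (t->level[0][0] == 0)
        return -1;
    for (unsigned int k = 0; k < t->depth; k++)
        idx = idx * WORD_BITS + lowest_bit(t->level[k][idx]);
    port = idx;                        /* a bit index at the leaf is a port */
    for (unsigned int k = t->depth; k-- > 0; ) {
        uint64_t *w = &t->level[k][idx / WORD_BITS];

        *w &= ~(UINT64_C(1) << (idx % WORD_BITS));
        if (*w != 0)
            break;                     /* other ports in this word pending */
        idx /= WORD_BITS;
    }
    return (int64_t)port;
}
```

With depth 2 the same code covers the existing 4096-port layout. The depth only affects the loop bounds and the allocation sizes.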

>> Of course, some details are yet to be decided, such as:
>> * How many pages should the new level have? We could start by populating
>> just one, for example.
> Just let the guest specify this (and error if the number is too large).

I agree.

>> * Who should really decide how many pages to allocate?
>> Likely the hypervisor should have a threshold, but in general we may
>> want a mechanism for the guest to ask the hypervisor beforehand,
>> so that its actual request can then be satisfied.
> Same here (this is really the same as the previous item, if you
> follow the earlier suggestions).

>> * How many bits in the third level should each bit in the second
>> level refer to? (That is a really minor factor, but still.)
> The tree should clearly be uniform (i.e. having a factor of
> BITS_PER_LONG per level), just like it is now. For 64-bit guests,
> this would mean 256k channels with 3 levels (32k for 32-bit
> guests).

> One aspect to also consider is migration - will the guest have to
> re-issue the extending hypercall, or will this be taken care of for
> it? If the former approach is chosen, would the guest be
> expected to deal with not being able to set up the extension
> again on the new host?

I think this could also be handled with some trickery, by switching the control bit off. I need to assess the races involved, because we are no longer in the "domain startup" case.

> And another important (but implementation only) aspect not to
> forget is making domain_dump_evtchn_info() scale with the
> much larger amount of dumping that may then need to be done (i.e.
> not just extend it to cope with the count, but also make sure it
> properly allows softirqs to be handled, which in turn requires
> not holding the event lock across the whole loop).


I haven't looked into it yet, but thanks for pointing it out.
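Here is a sketch of the batching described for domain_dump_evtchn_info(). It dumps a fixed number of channels per lock hold, then drops the lock and lets softirqs run. The lock and softirq helpers are hypothetical stand-ins for `d->event_lock` and `process_pending_softirqs()` that only record what happened. The batch size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define DUMP_BATCH 64u   /* hypothetical: channels dumped per lock hold */

/* Records lock and softirq activity so the batching can be checked. */
struct dump_ctx {
    bool locked;
    unsigned int lock_holds;     /* times the lock was taken */
    unsigned int softirq_calls;  /* times softirqs were allowed to run */
    unsigned int dumped;         /* channels "printed" */
};

static void ctx_lock(struct dump_ctx *c)
{
    assert(!c->locked);
    c->locked = true;
    c->lock_holds++;
}

static void ctx_unlock(struct dump_ctx *c)
{
    assert(c->locked);
    c->locked = false;
}

static void ctx_softirqs(struct dump_ctx *c)
{
    assert(!c->locked);          /* never run softirqs with the lock held */
    c->softirq_calls++;
}

static void dump_one(struct dump_ctx *c, unsigned int port)
{
    assert(c->locked);
    (void)port;                  /* real code would print the channel state */
    c->dumped++;
}

static void dump_evtchn_info_batched(struct dump_ctx *c, unsigned int nr_ports)
{
    for (unsigned int base = 0; base < nr_ports; base += DUMP_BATCH) {
        unsigned int end = nr_ports - base > DUMP_BATCH ? base + DUMP_BATCH
                                                        : nr_ports;

        ctx_lock(c);
        for (unsigned int port = base; port < end; port++)
            dump_one(c, port);
        ctx_unlock(c);
        ctx_softirqs(c);         /* lock dropped: let softirqs run */
    }
}
```

Because the lock is released between batches, the output is no longer a single atomic snapshot. Channels may change state while the dump is in progress.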

Attilio

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

