
Re: [Xen-devel] Scalable Event Channel ABI design (draft A)



On 04/02/13 21:07, Wei Liu wrote:
> 
> Where is the "priority" information stored? I think it should be
> somewhere inside Xen, right? I presume a field in struct evtchn?

I think so. I've not really thought too much about the internal design.

>> ### `EVTCHNOP_init`
>>
>> This call initializes a single VCPU's event channel data structures,
>> adding one page for the event array.
>>
> 
> I think the registration for new event channels should always be done in
> a batch. If not, then you need to provide another OP to roll back
> previously registered ones.

Hmmm. That's an interesting point.  I'd be inclined to have the guest
take the VCPU offline if it cannot initialize it fully.

>> Each page of the event array has space for 1024 events ($E_P$) so a
>> regular domU will only require a single page.  Since event channels
>> typically come in pairs, the upper bound on the total number of pages
>                                ^^^^^
>                                upper or lower?

I meant upper, but I am playing fast-and-loose with the maths here since
I was aiming for an estimate rather than a real upper bound.

>> is $2 \times\text{number of VMs}$.
>>
>> If the guests are further restricted in the number of event channels
>> ($E_V$) then this upper bound can be reduced further.
>>
> 
> Can this bound really be reduced? Can you map memory at non-page
> granularity?

The reasoning here is that event channels come in pairs (or rather they
have two ends).  One end is in the domU, the other in dom0.  The events
in dom0 are packed into pages, whereas the domU events use one page no
matter how few events there are.

I wasn't entirely happy with this way of doing the estimate, which is
why I did the second method; it gave a similar figure.

>> The number of VMs ($V$) with a limit of $P$ total event array pages is:
>> \begin{equation*}
>> V = P \div \left(1 + \frac{E_V}{E_P}\right)
>> \end{equation*}

The "1" here is the page used in the domU; the $E_V / E_P$ term is the
fraction of a page used in dom0.
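
As a worked example: with $E_P = 1024$ and guests limited to (say)
$E_V = 128$ event channels, a budget of $P = 512$ event array pages
gives $V = 512 \div (1 + 128/1024) = 512 \div 1.125 \approx 455$ VMs.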


>> Raising an Event
>> ----------------
>>
>> When Xen raises an event it marks it pending and (if it is not masked)
>> adds it to the tail of the event queue.
>>
>>     E[p].pending = 1
>>     if not E[p].linked and not E[n].masked
>>         E[p].linked = 1
>>         E[p].link = 0
>>         mb()
>>         if H == 0
>>             H = p
>>         else
>>             E[T].link = p
>>         T = p
>>
>> Concurrent access by Xen to the event queue must be protected by a
>> per-event queue spin lock.
>>
> 
> I presume "E[n]" in the pseudo code is "E[p]"?

Yes.

> Is this spin lock really a good idea? How many threads / CPUs will spin
> on this lock? As [0] shows, contention on a spin lock incurs a heavy
> performance penalty.

In addition to Keir's comment, the spinlock itself won't reside in the
same cache line as the control block or event array so this will reduce
cache line bouncing.
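
To make that a bit more concrete, here is a minimal userspace model of
the raise path.  It is only a sketch: a pthread spinlock stands in for
the per-event-queue lock, plain fields stand in for whatever bit layout
the final ABI uses, the names are illustrative, and the E[n] in the
pseudo code is taken as E[p].

    /* Userspace model of the raise path; not Xen code.  Index 0 means
     * "no event", as in the pseudo code. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define NR_EVENTS 1024

    struct event_word {
        bool pending, masked, linked;
        uint32_t link;              /* next event in the queue, 0 = none */
    };

    struct event_queue {
        uint32_t head, tail;
        pthread_spinlock_t lock;    /* per-queue, and not in the same
                                       cache lines as the event array */
    };

    static struct event_word E[NR_EVENTS];

    static struct event_queue *new_queue(void)
    {
        struct event_queue *q = calloc(1, sizeof(*q));

        if (q)
            pthread_spin_init(&q->lock, PTHREAD_PROCESS_PRIVATE);
        return q;
    }

    /* Append event p to the tail of queue q.  Caller holds q->lock. */
    static void link_event(struct event_queue *q, uint32_t p)
    {
        E[p].linked = true;
        E[p].link = 0;
        __sync_synchronize();       /* the mb() in the pseudo code */
        if (q->head == 0)
            q->head = p;
        else
            E[q->tail].link = p;
        q->tail = p;
    }

    static void raise_event(struct event_queue *q, uint32_t p)
    {
        pthread_spin_lock(&q->lock);
        E[p].pending = true;
        if (!E[p].linked && !E[p].masked)
            link_event(q, p);
        pthread_spin_unlock(&q->lock);
    }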

> [0] https://lwn.net/Articles/530458/
> 
>> Consuming Events
>> ----------------
>>
>> The guest consumes events starting at the head until it reaches the
>> tail.  Events in the queue that are not pending or are masked are
>> consumed but not handled.
>>
>>     while H != 0
>>         p = H
>>         H = E[p].link
>>         if H == 0
>>             mb()
>>             H = E[p].link
>>         E[p].linked = 0
>>         if not E[p].masked
>>             handle(p)
>>
>> handle() clears `E[p].pending` and EOIs level-triggered PIRQs.
>>
> 
> How do you synchronize access to the array and control blocks between
> Xen and the guest? I'm afraid I have no knowledge of a Xen-guest spin
> lock...

It's a lockless data structure on the consumer side (provided there is
only one consumer).
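
Continuing the little model from the raise sketch above, the consume
loop is just the pseudo code: no lock on this side, only a barrier
before re-reading the link of what looks like the last element.

    /* Consumer side of the same model: single consumer per queue, so no
     * lock is taken here.  handle() stands in for clearing the pending
     * bit and EOIing level-triggered PIRQs. */
    static void handle(uint32_t p)
    {
        E[p].pending = false;
    }

    static void consume_events(struct event_queue *q)
    {
        while (q->head != 0) {
            uint32_t p = q->head;

            q->head = E[p].link;
            if (q->head == 0) {
                /* Xen may have appended more events after we read the
                 * link; re-read it after a barrier. */
                __sync_synchronize();
                q->head = E[p].link;
            }
            E[p].linked = false;
            if (!E[p].masked)
                handle(p);
        }
    }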

>> Unmasking Events
>> ----------------
>>
>> Events are unmasked by the guest by clearing the masked bit.  If the
>> event is pending the guest must call the event channel unmask
>> hypercall so Xen can link the event into the correct event queue.
>>
>>     E[p].masked = 0
>>     if E[p].pending
>>         hypercall(EVTCHN_unmask)
>>
>> The expectation here is that unmasking a pending event will be rare,
>> so the performance hit of the hypercall is minimal.
>>
> 
> Currently unmasking a "remote" port requires issuing a hypercall as
> well, so if unmasking is not very frequent, this is not a big problem.
> 
> But please take some interrupt-intensive workloads into consideration.
> For example, a 1G NIC (e1000), even with NAPI enabled, can generate 27k+
> interrupts per second under high packet load [1]; a 10G NIC can surely
> generate more. Can you give an estimate of the performance hit on the
> context switch?

I'm not sure how I would give an estimate; this is something that would
need to be measured, I think.

Also, whilst the upcall itself might be reentrant, the processing of
each queue cannot be, so the mask/unmask done by the irq_chip callbacks
isn't needed.  mask/unmask is then only needed occasionally (e.g., for
irq migration) and thus isn't so performance critical.

>>> Note that after clearing the mask bit, the event may be raised and
>>> thus it may already be linked by the time the hypercall is done.
> 
> Even if the event has already been linked before you finish the
> hypercall, you would still need to get hold of a lock to serialize
> access to the event structure for checking, right? Or is a test_bit on
> the linked field sufficient? I think you need to write some pseudo code
> for this as well.

Xen will need to take the per-queue lock for this, yes.
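
In terms of the same model as before, something like:

    /* Xen-side handling of the unmask hypercall: take the per-queue
     * lock so the linked test cannot race with raise_event(), then
     * link the event unless a racing raise has already done so. */
    static void evtchn_unmask(struct event_queue *q, uint32_t p)
    {
        pthread_spin_lock(&q->lock);
        if (E[p].pending && !E[p].linked && !E[p].masked)
            link_event(q, p);
        pthread_spin_unlock(&q->lock);
    }

So a plain test of the linked field is enough, but only while holding
the per-queue lock.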

David
