[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] grant_table: convert grant table rwlock to percpu rwlock

On 18/11/15 07:45, Jan Beulich wrote:
>>>> On 17.11.15 at 18:53, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 17/11/15 17:39, Jan Beulich wrote:
>>>>>> On 17.11.15 at 18:30, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 17/11/15 17:04, Jan Beulich wrote:
>>>>>>>> On 03.11.15 at 18:58, <malcolm.crossley@xxxxxxxxxx> wrote:
>>>>>> --- a/xen/common/grant_table.c
>>>>>> +++ b/xen/common/grant_table.c
>>>>>> @@ -178,6 +178,10 @@ struct active_grant_entry {
>>>>>>  #define _active_entry(t, e) \
>>>>>>      ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
>>>>>> +bool_t grant_rwlock_barrier;
>>>>>> +
>>>>>> +DEFINE_PER_CPU(rwlock_t *, grant_rwlock);
>>>>> Shouldn't these be per grant table? And wouldn't doing so eliminate
>>>>> the main limitation of the per-CPU rwlocks?
>>>> The grant rwlock is per grant table.
>>> That's understood, but I don't see why the above items aren't, too.
>> Ah - because there is never any circumstance where two grant tables are
>> locked on the same pcpu.
> double_gt_lock()? Those are write locks, and iiuc the limitation is just
> on read locks, but anway.

Those are grant entry locks, not the grant table lock.  The grant table
write lock is only taken to switch grant table version.

>> Nor is there any such need.
> I'm sorry, but no. If the lock flavor is to be tailored to gnttab use,
> then it shouldn't be added outside of grant_table.c.

This mechanism is not specific to the grant table, nor are the grant
tables the only lock intended to be converted in this way.  Doing this
to the p2m lock also gives decent improvements as well, although is
rather more fiddly to get right.

>>>> The entire point of this series is to reduce the cmpxchg storm which
>>>> happens when many pcpus attempt to grap the same domains grant read lock.
>>>> As identified in the commit message, reducing the cmpxchg pressure on
>>>> the cache coherency fabric increases intra-vm network through from
>>>> 10Gbps to 50Gbps when running iperf between two 16-vcpu guests.
>>>> Or in other words, 80% of cpu time is wasted with waiting on an atomic
>>>> read/modify/write operation against a remote hot cache line.
>>> All of this is pretty nice, but again unrelated to the question I
>>> raised.
>>> The whole interface would likely become quite a bit easier to use
>>> if there was a percpu_rwlock_t comprising all three elements (the
>>> per-CPU item obviously would need to become a per-CPU pointer,
>>> with allocation of per-CPU data needing introduction).
>> Runtime per-CPU data allocation is incompatible with our current scheme
>> (which relies on the linker to do some of the heavy lifting).
> Well, there are ways to deal with that. One would be for components
> to declare how much they might need to use in the worst case (along
> the lines of x86 Linux'es .brk mechanism). Another (a variant thereof,
> much easier to implement right away) would be for grant_table.c to
> simply create a per-CPU array with DOMID_FIRST_RESERVED entries,
> using the one corresponding to the domain in question. And of course
> a scheme not as advanced as current Linux'es might do too - after all
> it did for Linux for many years (before the current one got invented).

Or better yet, not use dynamic allocation.

Coming up with a new per-cpu scheme is all fine and well if someone
wants to do so, but it is a substantial task and not necessary here.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.