Re: [Xen-devel] [PATCH 2/2] grant_table: convert grant table rwlock to percpu rwlock
>>> On 18.11.15 at 11:06, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 18/11/15 07:45, Jan Beulich wrote:
>>>>> On 17.11.15 at 18:53, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 17/11/15 17:39, Jan Beulich wrote:
>>>>>>> On 17.11.15 at 18:30, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> On 17/11/15 17:04, Jan Beulich wrote:
>>>>>>>>> On 03.11.15 at 18:58, <malcolm.crossley@xxxxxxxxxx> wrote:
>>>>>>> --- a/xen/common/grant_table.c
>>>>>>> +++ b/xen/common/grant_table.c
>>>>>>> @@ -178,6 +178,10 @@ struct active_grant_entry {
>>>>>>> #define _active_entry(t, e) \
>>>>>>> ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
>>>>>>>
>>>>>>> +bool_t grant_rwlock_barrier;
>>>>>>> +
>>>>>>> +DEFINE_PER_CPU(rwlock_t *, grant_rwlock);
>>>>>> Shouldn't these be per grant table? And wouldn't doing so eliminate
>>>>>> the main limitation of the per-CPU rwlocks?
>>>>> The grant rwlock is per grant table.
>>>> That's understood, but I don't see why the above items aren't, too.
>>> Ah - because there is never any circumstance where two grant tables are
>>> locked on the same pcpu.
>> double_gt_lock()? Those are write locks, and iiuc the limitation is just
>> on read locks, but anyway.
>
> Those are grant entry locks, not the grant table lock. The grant table
> write lock is only taken to switch grant table version.
static inline void
double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
{
    /*
     * See mapkind() for why the write lock is also required for the
     * remote domain.
     */
    if ( lgt < rgt )
    {
        write_lock(&lgt->lock);
        write_lock(&rgt->lock);
    }
    else
    {
        if ( lgt != rgt )
            write_lock(&rgt->lock);
        write_lock(&lgt->lock);
    }
}
looks very much like taking two grant table locks to me.
>>> Nor is there any such need.
>> I'm sorry, but no. If the lock flavor is to be tailored to gnttab use,
>> then it shouldn't be added outside of grant_table.c.
>
> This mechanism is not specific to the grant table, nor are the grant
> tables the only lock intended to be converted in this way. Doing this
> to the p2m lock gives decent improvements as well, although it is
> rather more fiddly to get right.
And there are no nested uses of p2m locks? I recall a PVH issue
needing special tweaking because nested locking could occur for
both a PVH Dom0's lock and a DomU's controlled by it.
As long as there's not going to be a percpu_rwlock_t, with all pieces
grouped together, this construct seems too special-purpose to me.
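For illustration, such a grouped percpu_rwlock_t might look roughly like
the sketch below (names invented here, not taken from the posted patch):
the barrier flag becomes a per-lock field, and the per-CPU item becomes a
pointer recording which lock a CPU currently holds for reading.

/* Hypothetical sketch only - not the code under review. */
typedef struct percpu_rwlock {
    rwlock_t rwlock;            /* slow path / writer serialisation */
    bool_t   writer_activating; /* per-lock replacement for the global
                                 * grant_rwlock_barrier */
} percpu_rwlock_t;

/* Which percpu_rwlock_t (if any) this CPU currently holds for reading. */
DECLARE_PER_CPU(percpu_rwlock_t *, percpu_rwlock_owner);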
>>>>> The entire point of this series is to reduce the cmpxchg storm which
>>>>> happens when many pcpus attempt to grab the same domain's grant read lock.
>>>>>
>>>>> As identified in the commit message, reducing the cmpxchg pressure on
>>>>> the cache coherency fabric increases intra-vm network throughput from
>>>>> 10Gbps to 50Gbps when running iperf between two 16-vcpu guests.
>>>>>
>>>>> Or in other words, 80% of cpu time is wasted waiting on an atomic
>>>>> read/modify/write operation against a remote hot cache line.
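(For reference, a rough paraphrase of the reader fast path being described
here - not the exact code from the patch - shows where the shared atomic
disappears: an uncontended reader only stores to its own per-CPU slot and
reads the normally clear barrier flag, instead of doing an atomic RMW on a
cache line shared by all CPUs.)

/* Sketch paraphrasing the posted patch's reader fast path. */
static inline void percpu_read_lock_sketch(rwlock_t *rwlock)
{
    this_cpu(grant_rwlock) = rwlock;      /* CPU-local store, no shared RMW */
    smp_mb();                             /* order the store vs. the check  */
    if ( unlikely(grant_rwlock_barrier) )
    {
        /* A writer is pending: abandon the fast path and queue normally. */
        this_cpu(grant_rwlock) = NULL;
        read_lock(rwlock);
    }
}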
>>>> All of this is pretty nice, but again unrelated to the question I
>>>> raised.
>>>>
>>>> The whole interface would likely become quite a bit easier to use
>>>> if there was a percpu_rwlock_t comprising all three elements (the
>>>> per-CPU item obviously would need to become a per-CPU pointer,
>>>> with allocation of per-CPU data needing introduction).
>>> Runtime per-CPU data allocation is incompatible with our current scheme
>>> (which relies on the linker to do some of the heavy lifting).
>> Well, there are ways to deal with that. One would be for components
>> to declare how much they might need to use in the worst case (along
>> the lines of x86 Linux's .brk mechanism). Another (a variant thereof,
>> much easier to implement right away) would be for grant_table.c to
>> simply create a per-CPU array with DOMID_FIRST_RESERVED entries,
>> using the one corresponding to the domain in question. And of course
>> a scheme not as advanced as current Linux's might do too - after all
>> it did for Linux for many years (before the current one got invented).
>
> Or better yet, not use dynamic allocation.
Which is what the middle of the three suggestions represents...
> Coming up with a new per-cpu scheme is all fine and well if someone
> wants to do so, but it is a substantial task and not necessary here.
... to keep things simple for now.
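For concreteness, that middle suggestion could be as simple as the sketch
below (assuming DEFINE_PER_CPU() is happy with an array-typed entry;
whether DOMID_FIRST_RESERVED pointers per CPU are an acceptable cost is a
separate question):

/* Sketch only: one tracking slot per CPU per possible domain. */
DEFINE_PER_CPU(rwlock_t *, grant_rwlock[DOMID_FIRST_RESERVED]);

/* The reader fast path would then use the owning domain's slot, e.g.
 * this_cpu(grant_rwlock)[d->domain_id] = &d->grant_table->lock;
 */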
Jan