[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] dom_ids array implementation.



>>> On 27.04.17 at 04:38, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> On 17-04-26 04:04:15, Jan Beulich wrote:
>> >>> On 20.04.17 at 07:38, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
>> > @@ -221,12 +210,17 @@ static void free_socket_resources(unsigned int 
>> > socket)
>> >       */
>> >      for ( i = 0; i < PSR_SOCKET_MAX_FEAT; i++ )
>> >      {
>> > -        if ( !info->features[i] )
>> > -            continue;
>> > -
>> >          xfree(info->features[i]);
>> >          info->features[i] = NULL;
>> >      }
>> > +
>> > +    spin_lock(&info->ref_lock);
>> > +    memset(info->cos_ref, 0, MAX_COS_REG_CNT * sizeof(unsigned int));
>> > +    spin_unlock(&info->ref_lock);
>> > +
>> > +    spin_lock_irqsave(&info->dom_ids_lock, flag);
>> > +    memset(info->dom_ids, 0, PSR_DOM_IDS_NUM * sizeof(uint32_t));
>> 
>> bitmap_clear()
>> 
>> I'm also not convinced you need to acquire either of the two locks
>> here - you're cleaning up the socket after all, so nothing can be
>> running on it anymore.
>> 
> Can domain destroy happens at the same time when a socket is offline?

Well, yes and no - it depends on what path exactly you sit here.
Large parts of CPU onlining/offlining happen in stop-machine
context, which would exclude domain destruction going on in
parallel.

>> > +        /*
>> > +         * If old_bit is 0, that means this is the first time the domain 
>> > is
>> > +         * switched to this socket or domain's COS ID has not been set 
>> > since
>> > +         * the socket is online. So, the domain's COS ID on this socket 
>> > should
>> > +         * be default value, 0. If not, that means this socket has been 
>> > offline
>> > +         * and the domain's COS ID has been set when the socket was 
>> > online. So,
>> > +         * this COS ID is invalid and we have to restore it to 0.
>> > +         */
>> > +        if ( d->arch.psr_cos_ids &&
>> > +             old_bit == 0 &&
>> > +             d->arch.psr_cos_ids[socket] != 0 )
>> 
>> Why don't you replicate the other two conditions in the if() trying to
>> avoid taking the lock? (Especially if above you indeed intend to use
>> a helper function, abstracting the full condition into another one
>> would be very desirable.)
>> 
> Ok, will move the two conditions to above 'if()', like below.
> 
> if ( likely(test_bit(d->domain_id, info->dom_ids)) ||
>      !d->arch.psr_cos_ids ||
>      !d->arch.psr_cos_ids[socket] )
> 
> Accordingly, the later codes should be:
> 
> spin_lock(&info->dom_ids_lock);
> set_bit(d->domain_id, info->dom_ids);
> d->arch.psr_cos_ids[socket] = 0;
> spin_unlock(&info->dom_ids_lock);

Then you didn't fully understand: The test_and_ portion _cannot_
be moved out of the locked region, but a simple test_bit() can be
replicated prior to taking the lock.

>> > +            d->arch.psr_cos_ids[socket] = 0;
>> > +
>> > +        spin_unlock(&info->dom_ids_lock);
>> 
>> And then, as a whole: As indicated before, ideally you'd keep this
>> out of the context switch path altogether. What are the alternatives?
>> 
> To restore domains' "psr_cos_ids[socket]" to default when socket offline
> happens, we have three time windows:
> 1. When socket is offline, in "free_socket_resources()";
> 2. When socket is online, in "psr_cpu_init()";
> 3. When context switch happens, in "psr_ctxt_switch_to()".
> 
> Option 1 and 2 have same effect and option 1 is more natural than 2. So, we 
> can
> do this restore action at "1" or "3".
> 
> I have two alternatives below. Please help to check which you think is better:
> 1. The first version of the patch iterates valid domain list to restore them 
> one
> by one. Per your comments, it may take much time. That is the reason I 
> submitted
> this patch to spread out the restore action of all domains. If you think
> "psr_cos_ids[socket]" restore action happens in context switch path is not 
> good,
> can we use a tasklet in "free_socket_resources()" to iterate the domain list 
> and
> restore their "psr_cos_ids"?

If that tasklet (a) doesn't again take overly long and (b) is
guaranteed to finish before the same socket may come back online
again, then yes. Otherwise both the iterate-over-all-domains and
the in-context-switch approaches have downsides, but the latter
would then seem preferable (because it only affects performance
without risking the system's health). The question is whether some
3rd method can't be found.

> 2. Or, can we use a tasklet in "psr_ctxt_switch_to()" to do above work? The 
> side
> effect is that the domain's COS ID used in this switch is not right. The valid
> COS ID may be set in next context switch.

I think this would complicate things while at the same time making
them worse.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.