[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] 答复: [PATCH v2] x86/hpet: Fix possible ASSERT(cpu < nr_cpu_ids)



>>> On 19.04.18 at 06:50, <DavidWang@xxxxxxxxxxx> wrote:
> Thank you for reply. Maybe My description is not clear. Please allow me to 
> explain it again.
> Multiple CPU may be share an channel(according to  hpet_get_channel(), it is 
> possible ). When  one of  them get the lock of 
> channel(hpet_broadcast_exit()->hpet_detach_channel()->spin_lock_irq(&ch->lock))
>  
> , others shouldn't rewrite/clear the ch->cpumask in hpet_broadcast_exit(). 
> This lead to errors.
> For example:
> CPU zero and CPU one share an channel by executing hpet_get_channel() 
> respectively and  ch->cpumask of channel be set to 0x3.
> Next, CPU zero  execute hpet_broadcast_exit()->cpumask_clear_cpu() and the 
> ch->cpumask is 0x2.  
> CPU zero execute 
> hpet_broadcast_exit()->hpet_detach_channel()->cpumask_empty() 
> and it get a false.  
> After that the next moment, CPU one  execute 
> hpet_broadcast_exit()->cpumask_clear_cpu(). That set the ch->cpumask to 0.
> When CPU zero execute hpet_detach_channel()->cpumask_first(), ch->cpu would 
> be 
> set to nr_cpu_ids for ch->cpumask being 0.
> An assertion would happen through hpet_detach_channel() 
> ->set_channel_irq_affinity() -> cpumask_of() ->cpumask_check();
>  
> I think the cause leading to assertion is that  cpu rewrite  shared zone 
> when other is reading. I've tried two ways.  The CPU must get the lock of 
> channel before  executing cpumask_clear_cpu() as patch v1.  Another way of 
> resolving it is "a variable hold the value of  ch->cpumask at the beginning 
> of 
> hpet_detach_channel() as patch v2" .

Following your earlier description I had been able to work out what you
describe above. But that doesn't mean the description of the patch now
can remain unclear: You need to describe the issue in an understandable
way. This doesn't, however, necessarily mean to make the description
much longer (i.e. I don't think the above would be a suitable replacement).
See below.

> "how about eliminating the cpumask_empty() call in favor of just the 
> cpumask_first()"  
> Do you mean to delete  the cpumask_empty() and leave the cpumask_first()? 

FAOD "leave" isn't the right term: I did suggest to _move_ cpumask_first(),
such that you can use its result both in place of the current cpumask_empty()
one _and_ where it is being used currently.

>>>> On 18.04.18 at 11:25, <Davidwang@xxxxxxxxxxx> wrote:
>> From: David Wang <davidwang@xxxxxxxxxxx>
>> 
>> For the ch->cpumask be cleared by other cpu, cpumask_first() called by
>> hpet_detach_channel() return nr_cpu_ids. That lead an assertion in
>> set_channel_irq_affinity() when cpumask_of() check cpu.
>> Fix this by using a local variable.

I think the description in v1 came closer to something understandable.
Taking that, how about:

"CPUs may share an in-use channel. Hence clearing of a bit from the
 cpumask (in hpet_broadcast_exit()) as well as setting one (in
 hpet_broadcast_enter()) must not race evaluation of that same
 cpumask. Therefore avoid evaluating the cpumask twice in
 hpet_detach_channel(). Otherwise cpumask_empty() may e.g. return
 false while the subsequent cpumask_first() could return nr_cpu_ids,
 which then triggers the assertion in cpumask_of() reached through
 set_channel_irq_affinity()."

?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.