
Re: [Xen-devel] Xen4.2 S3 regression?



>>> On 21.09.12 at 20:42, Keir Fraser <keir@xxxxxxx> wrote:
> On 21/09/2012 19:20, "Ben Guthro" <ben@xxxxxxxxxx> wrote:
> 
>> 
>> 
>> On Fri, Sep 21, 2012 at 2:47 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> 
>>> That's because CPU1 is stuck in cpu_init() (in the infinite loop after
>>> printing "CPU#1 already initialized!"), as Keir pointed out yesterday.
>>> 
>> 
>> I've done some more tracing on this, and instrumented cpu_init(),
>> cpu_uninit() - and found something I cannot quite explain.
>> I was most interested in the cpu_initialized mask, set just above these two
>> functions (and only used in those two functions)
>> 
>> I convert cpu_initialized to a string, using cpumask_scnprintf - and print
>> it out when it is read, or written in these two functions.
>> 
>> When CPU1 is being torn down, the cpumask bit gets cleared for CPU1, and I
>> am able to print this to the console to verify.
>> However, when the machine is returning from S3, and going through cpu_init -
>> the bit is set again.
>> 
>> Could this be an issue of caches not being flushed?
>> 
>> I see that the last thing done before acpi_enter_sleep_state actually
>> writes PM1A_CONTROL / PM1B_CONTROL to enter S3 is an ACPI_FLUSH_CPU_CACHE()
>> 
>> This analysis seems unlikely, at this point...but I'm not sure what to make
>> of the data other than a cache issue.
>> 
>> Am I "barking up the wrong tree" here?
> 
> Perhaps not. Try dumping it immediately before and after the actual S3
> sleep. Since you probably can't print to serial line at that point, you
> could just take a copy of the bitmap and print them both shortly after S3
> resume. Then if it still looks bad, or the problem magically resolves with
> the extra printing, you can suspect cache flush a bit more strongly.
> However, WBINVD (which is what ACPI_FLUSH_CPU_CACHE() is) should be enough.
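
The snapshot-and-compare approach suggested above could be sketched roughly
as follows. This is a standalone model, not Xen code: model_mask_t stands in
for cpumask_t, model_cpu_initialized for cpu_initialized, and every function
name here is hypothetical. The point is only that the two copies are taken
without any console output, and both are printed once the serial line works
again.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_NR_CPUS 8   /* hypothetical small CPU count for this sketch */

/* Stand-in for Xen's cpumask_t: one bit per CPU. */
typedef struct { unsigned long bits; } model_mask_t;

model_mask_t model_cpu_initialized;          /* stand-in for cpu_initialized */
model_mask_t snap_before_s3, snap_after_s3;  /* copies taken around the sleep */

void model_set_cpu(model_mask_t *m, unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    m->bits |= 1UL << cpu;
}

void model_clear_cpu(model_mask_t *m, unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    m->bits &= ~(1UL << cpu);
}

int model_test_cpu(const model_mask_t *m, unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    return (m->bits >> cpu) & 1UL;
}

/* Safe to call right around the sleep: it only copies, it never prints. */
void snapshot_mask(model_mask_t *dst, const model_mask_t *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/*
 * Call shortly after resume, once printing is possible again.
 * Returns nonzero if the mask changed across the sleep.
 */
int report_snapshots(void)
{
    printf("cpu_initialized before S3: %#lx\n", snap_before_s3.bits);
    printf("cpu_initialized after  S3: %#lx\n", snap_after_s3.bits);
    return snap_before_s3.bits != snap_after_s3.bits;
}
```

In the real code the first snapshot_mask() call would sit immediately before
the sleep is entered and the second immediately after it returns, with
report_snapshots() deferred until later in the resume path.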

CPU0 issuing WBINVD might not be enough; other CPUs should
probably also do so unconditionally (currently they do this only
when using one of the advanced halt forms in acpi_dead_idle()).

While one would think that a halted CPU would not only continue
to keep its cache up-to-date, but also eventually write back its
dirty cache lines, I don't think the latter is actually guaranteed,
so if the CPU ends up getting the INIT before the line was written
back, the modification could get lost.
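
To make the suspected sequence concrete, here is a toy model in C. It
simulates nothing about real hardware: struct toy_line and all toy_*
functions are hypothetical, and the behaviour of toy_init_without_writeback()
is exactly the unverified assumption described above, namely that a dirty
line can be dropped when the CPU is reset.

```c
#include <assert.h>
#include <stdbool.h>

/* One cache line holding the word with the cpu_initialized bits. */
struct toy_line {
    unsigned long memory;  /* value in RAM, which survives S3 */
    unsigned long cached;  /* value in the CPU's write-back cache */
    bool dirty;            /* cached value not yet written to RAM */
};

/* A store hits the cache only and marks the line dirty. */
void toy_store(struct toy_line *l, unsigned long v)
{
    l->cached = v;
    l->dirty = true;
}

/* Model of WBINVD: write back the dirty line, then invalidate it. */
void toy_wbinvd(struct toy_line *l)
{
    if (l->dirty)
        l->memory = l->cached;
    l->cached = l->memory;
    l->dirty = false;
}

/* ASSUMPTION (the hypothesis): the dirty line is dropped, not written back. */
void toy_init_without_writeback(struct toy_line *l)
{
    l->cached = l->memory;
    l->dirty = false;
}

unsigned long toy_load(const struct toy_line *l)
{
    return l->dirty ? l->cached : l->memory;
}

/* Clear one CPU's bit, halt (with or without flush), reset, then re-read. */
unsigned long toy_offline_then_resume(unsigned long mask, unsigned int cpu,
                                      bool flush_before_halt)
{
    struct toy_line line = { mask, mask, false };

    assert(cpu < 8 * sizeof(unsigned long));
    toy_store(&line, mask & ~(1UL << cpu));   /* bit cleared on teardown */
    if (flush_before_halt)
        toy_wbinvd(&line);                    /* the unconditional flush */
    toy_init_without_writeback(&line);
    return toy_load(&line);                   /* what cpu_init() would see */
}
```

With a starting mask of 0x3 and CPU1 going offline, the model returns 0x3
(bit 1 still set) when the flush is skipped and 0x1 when it is performed,
which matches the symptom only under the assumption stated above.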

But of course this theory depends on Ben's system actually using
the default halt mechanism rather than one of the advanced ones.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

