[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/S3: Fix cpu pool scheduling after suspend/resume



On Tue, Apr 9, 2013 at 2:04 PM, Ben Guthro <benjamin.guthro@xxxxxxxxxx> wrote:
> How would you suggest this be solved then - because we are seeing that
> all non-boot cpus are not being put in a pool at all - which is
> clearly worse than being put into pool0

I don't think this is what you're actually seeing -- or if you are,
then there's a bug somewhere else that this is just triggering.

For the benefit of others: What they're seeing sometimes is that after
a suspend/resume cycle, certain pcpus don't generate any trace
records.  However, it is clear from the traces that *are* generated,
that it's not the case that nothing is happening on those invisible
pcpus; there is clear evidence of the vcpus running on visible pcpus
communicating with vcpus on the invisible pcpus.

But they are also sometimes seeing performance degradation that would
be consistent with pcpus not coming back up; but it's hard to be sure
because sometimes it's just tracing not working.

In any case, the "system state" patch doesn't work for dom0 affinity,
and the recent patch to generically save/restore vcpu affinity is one
step towards making it not needed anymore.  We should either:
* Revert the patch and save/restore stuff properly
* Special-case dom0 to save and re-pin on suspend/resume, since it's
still running
* Make sure that dom0 is handled some other way (e.g., by not
migrating vcpus which are paused)?

Looking at this patch -- it looks like the sched-state patch makes it
so that the scheduler cpu initialization stuff is not called on resume
-- e.g., tick_resume.  Am I reading that correctly?  That could be
part of the problem...

 -George




>
>
>
> On Tue, Apr 9, 2013 at 8:57 AM, Juergen Gross
> <juergen.gross@xxxxxxxxxxxxxx> wrote:
>> On 09.04.2013 14:46, Ben Guthro wrote:
>>>
>>> This review is another S3 scheduler problem with the system_state variable
>>> introduced with the following changeset:
>>>
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=269f543ea750ed567d18f2e819e5d5ce58eda5c5
>>>
>>> Specifically, the cpu_callback function that takes the CPU down during
>>> suspend, and back up during resume.
>>> We were seeing situations where, after S3, only CPU0 was in cpupool0.
>>> Guest performance suffered greatly, since all vcpus were only on a single
>>> pcpu. Guests under high CPU load showed the problem much more quickly than
>>> an idle guest.
>>>
>>> Removing this if condition forces the CPUs to go through the expected
>>> online/offline state, and be properly scheduled after S3.
>>>
>>> This also includes a necessary partial change proposed earlier by Tomasz
>>> Wroblewski here:
>>> http://lists.xen.org/archives/html/xen-devel/2013-01/msg02206.html
>>>
>>> It should also resolve the issues discussed in this thread:
>>> http://lists.xen.org/archives/html/xen-devel/2012-11/msg01801.html
>>>
>>> Signed-off-by: Ben Guthro<benjamin.guthro@xxxxxxxxxx>
>>> ---
>>>   xen/common/cpu.c     |    3 +++
>>>   xen/common/cpupool.c |    5 -----
>>>   2 files changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>>> index 630881e..e20868c 100644
>>> --- a/xen/common/cpu.c
>>> +++ b/xen/common/cpu.c
>>> @@ -5,6 +5,7 @@
>>>   #include<xen/init.h>
>>>   #include<xen/sched.h>
>>>   #include<xen/stop_machine.h>
>>> +#include<xen/sched-if.h>
>>>
>>>   unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>>   #ifndef nr_cpumask_bits
>>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>>               BUG_ON(error == -EBUSY);
>>>               printk("Error taking CPU%d up: %d\n", cpu, error);
>>>           }
>>> +        if (system_state == SYS_STATE_resume)
>>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>
>>
>> This might solve YOUR problem, but reintroduces the problem why the original
>> change was done: ALL cpus will be in cpupool0 after resume!
>>
>> So: NAK
>>
>>
>> Juergen
>>
>> --
>> Juergen Gross                 Principal Developer Operating Systems
>> PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
>> Fujitsu Technology Solutions              e-mail:
>> juergen.gross@xxxxxxxxxxxxxx
>> Domagkstr. 28                           Internet: ts.fujitsu.com
>> D-80807 Muenchen                 Company details:
>> ts.fujitsu.com/imprint.html
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.