
Re: [Xen-devel] pvops: Does PVOPS guest os support online "suspend/resume"



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> Sent: Wednesday, August 14, 2013 12:35 AM
> To: Gonglei (Arei)
> Cc: rshriram@xxxxxxxxx; xen-devel@xxxxxxxxxxxxx; Zhangbo (Oscar);
> Luonengjun; ian.campbell@xxxxxxxxxx; stefano.stabellini@xxxxxxxxxxxxx;
> rjw@xxxxxxx; Yanqiangjun; Jinjian (Ken)
> Subject: Re: [Xen-devel] pvops: Does PVOPS guest os support online
> "suspend/resume"
> 
> On Tue, Aug 13, 2013 at 02:38:18PM +0000, Gonglei (Arei) wrote:
> > Hi,
> > I rechecked the different kernels today and found that I made a mistake before. Sorry for misleading you all :)
> >
> > All in all, the problems come down to the 2 items below:
> > 1 the kernel 2.6.32 PVOPS guest os (I tested RHEL6.1 and RHEL6.3) does have bugs in ONLINE suspend/resume (checkpoint), which were,
> > as Shriram mentioned, fixed in:
> > http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/xen/manage.c?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> > 2 the kernels above 3.0 (I tested Ubuntu12.10 with kernel 3.5 and Ubuntu13.04 with kernel 3.8) seem to have another "bug":
> >   1) if we set MULTI VCPUS for the guest os, it has problems in resuming (to be precise, it's thaw).
> >      In detail:
> >          <1>set the guest os with 4 vcpus
> >              in dom1.cfg: vcpus=4
> >          <2>xl create dom1.cfg
> >              execute command "top -d 1" in guest dom1's vnc window
> >          <3>xl save -c dom1 /opt/dom1.save
> >          <4>after step <3>, we check the guest dom1's vnc window, and find that:
> >              kernel threads migration/1, migration/2, migration/3 got their cpu usage up to 100%
> >              the guest os couldn't respond to any request such as mouse movement or keyboard input.
> >              no "thaw" messages were printed in dom1's serial output.
> >
> >   2) if we set only 1 vcpu for the guest os, it thaws back and works fine.
> >   3) another odd thing is that if we use the saved file generated in 2-1) to restore the guest, and then do online suspend/resume (xl save -c, checkpoint),
> > it is fine, no problems occur.
> >
> > This problem occurs on guest os with kernel 3.5/3.8 (maybe other kernels as well, not tested). I hope that the steps I followed were correct.
> 
> Please do check with the upstream kernel. There were some CPU hotplug issues in older kernels
> and just to make sure that this is not one of them it would be good to eliminate this.
> 
> Please do test with v3.11-rc5.
> 
> > Have you ever encountered such a "suspend/resume checkpoint on multi-vcpu guest os" problem?
> >
> > -------
> > PS: BTW, I'm wondering why using freeze/thaw instead of suspend/resume would solve the problem with kernels below 3.0?
> >  It seems that blkfront_resume is still called if we use the thaw method here, because blkfront has no available pm_op.
> >
> >     static int device_resume(struct device *dev, pm_message_t state, bool async)
> >     {
> >          ...
> >          if (dev->bus) {
> >                    if (dev->bus->pm) {
> >                             info = "bus ";
> >                             callback = pm_op(dev->bus->pm, state);
> >                    } else if (dev->bus->resume) {
> >                             info = "legacy bus ";
> >                             callback = dev->bus->resume; /* is blkfront_resume called here? */
> >                             goto End;
> 
> One easy way to figure this out is to stick printks in here to see if that blkfront code
> is indeed called. You can also use 'dump_stack()' to get a nice stack-trace.

Hi,
1 I tried kernel 3.11-rc6, and it has the same problem:
    after the checkpoint, the multi-vcpu guest os can't respond to anything,
because its kernel threads migration/1, migration/2, etc. go up to 100% cpu
usage.

2 kernel 3.0 doesn't have this problem.

So it seems that a bug was introduced somewhere between v3.0 and v3.5, possibly
concerning vcpu freeze/thaw? Thanks!
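
For the device_resume() question quoted above, the callback selection can be
modelled outside the kernel before adding printks. What follows is only a rough
user-space sketch, not kernel code: every model_* name is made up for
illustration, only two events are modelled, and printf() stands in for
printk(). It assumes the selection works the way the quoted excerpt reads: a
bus with a pm table goes through a pm_op()-style lookup, and the legacy bus
resume hook is used only when the bus has no pm table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
enum model_event { MODEL_EVENT_RESUME, MODEL_EVENT_THAW };

struct model_device;
typedef int (*model_cb)(struct model_device *dev);

struct model_pm_ops {
    model_cb resume;
    model_cb thaw;
};

struct model_bus {
    const struct model_pm_ops *pm; /* NULL if the bus has no pm table */
    model_cb resume;               /* legacy bus resume hook */
};

struct model_device {
    const char *name;
    const struct model_bus *bus;
};

/* Same idea as pm_op(): pick the slot that matches the event. */
static model_cb model_pm_op(const struct model_pm_ops *ops, enum model_event ev)
{
    switch (ev) {
    case MODEL_EVENT_RESUME: return ops->resume;
    case MODEL_EVENT_THAW:   return ops->thaw;
    }
    return NULL;
}

/* Returns "bus ", "legacy bus " or "none"; stores the chosen callback in *out. */
static const char *model_select_callback(const struct model_device *dev,
                                         enum model_event ev, model_cb *out)
{
    const char *info = "none";

    assert(dev != NULL && out != NULL);
    *out = NULL;
    if (dev->bus) {
        if (dev->bus->pm) {
            info = "bus ";
            *out = model_pm_op(dev->bus->pm, ev);
        } else if (dev->bus->resume) {
            info = "legacy bus ";
            *out = dev->bus->resume;
        }
    }
    /* Trace line playing the role of a printk() in the real function. */
    printf("%s: event=%d path=%s callback=%s\n",
           dev->name, (int)ev, info, *out ? "set" : "NULL");
    return info;
}

/* Hypothetical driver hooks, used only for the demonstration. */
static int model_frontend_resume(struct model_device *dev)
{
    printf("%s: resume hook\n", dev->name);
    return 0;
}

static int model_frontend_thaw(struct model_device *dev)
{
    printf("%s: thaw hook\n", dev->name);
    return 0;
}
```

Calling model_select_callback() with MODEL_EVENT_THAW on a device whose bus has
a pm table reports the "bus " path and picks the thaw slot, while a bus without
a pm table reports "legacy bus " and picks the resume hook. Which of the two
cases xenbus/blkfront falls into on a given kernel is what the printk or
dump_stack() check would have to confirm.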

-Gonglei
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

