Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
On 19/05/2010 15:30, "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx> wrote:

> 2) The way I narrowed down the problem to these lines of code was by inserting a
> "while(1);" loop at different points in the code. When it didn't reboot, I
> knew it had gotten to my while loop. I just kept moving the while loop until
> I found the lines I highlighted in my previous msg. Below is what my debug
> code looks like:

Your system seems to hobble along just fine if you remove the BUG_ON()s, so
why not convert them into printk() warnings? Or if it's too early for printk,
stash some info in memory and printk() it at the very end of S3 resume.

> 3) You can see above that the vmx_vmexit_control check was the point at which
> the crash/reboot was being triggered. However, if I commented out just that
> line, I would still see a reboot. Only when I commented the whole block out
> did it finally work. Is something overwriting the location of these
> variables such that when I commented out a line of code, it moved the data
> segment causing a different variable to be overwritten? I need to be able
> to explain this behavior. So I will be working towards that today.

I would assume that more than one of the BUG_ON()s is triggering. So if you
just comment out the first offending one that you find, you instead fall foul
of a second one.

> 4) My initial thoughts were that the BIOS was overwriting some of these
> locations, so I performed an experiment that I believe rules out the BIOS. I
> commented out the code in power.c that puts the CPU into the sleep mode. This
> had the effect of going through most of the sleep and wakeup code in power.c
> (it does not go through all the wakeup.S initialization as well). When I did
> this, it still failed to resume from sleep as long as an HVM domain was
> present. Here is the diff on power.c

Yep, that patch should do the expected thing and do everything except the
actual BIOS S3 transition.

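The "stash some info in memory and printk() it at the very end of S3 resume"
idea above could be sketched roughly as follows. This is a standalone
illustration, not Xen code: the names check_or_stash(), dump_deferred() and
MAX_DEFERRED are hypothetical, and printf() stands in for printk() so the
sketch can be compiled outside the hypervisor.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical capacity; one slot per check that might fail. */
#define MAX_DEFERRED 8

/* One record per failed comparison, kept until the console is usable. */
struct deferred_mismatch {
    const char *name;   /* which global was being re-checked */
    uint32_t stored;    /* value saved when the globals were initialised */
    uint32_t fresh;     /* value recomputed on this CPU after resume */
};

static struct deferred_mismatch deferred_log[MAX_DEFERRED];
static unsigned int deferred_count;

/*
 * Replacement for a BUG_ON(stored != fresh) check: instead of crashing,
 * remember the mismatch. Returns 1 if the values differ, 0 otherwise.
 */
int check_or_stash(const char *name, uint32_t stored, uint32_t fresh)
{
    if (stored == fresh)
        return 0;
    if (deferred_count < MAX_DEFERRED) {
        deferred_log[deferred_count].name = name;
        deferred_log[deferred_count].stored = stored;
        deferred_log[deferred_count].fresh = fresh;
        deferred_count++;
    }
    return 1;
}

/*
 * Print and clear everything recorded so far. Intended to be called once
 * printing is safe again. Returns the number of entries printed.
 */
unsigned int dump_deferred(void)
{
    unsigned int i, n = deferred_count;

    for (i = 0; i < n; i++)
        printf("VMX check failed: %s stored=0x%08x fresh=0x%08x\n",
               deferred_log[i].name,
               (unsigned int)deferred_log[i].stored,
               (unsigned int)deferred_log[i].fresh);
    deferred_count = 0;
    return n;
}
```

Under these assumptions, each re-check would become a call such as
check_or_stash("vmx_vmexit_control", vmx_vmexit_control, _vmx_vmexit_control).
Because every failing check is recorded rather than only the first, this
would also show whether more than one of them is triggering.
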
Well, overall this does sound like a memory corruption issue, not a BIOS or
platform issue. You need to printk out the contents of variables contributing
to your failing BUG_ON()s and see what's written there, I think.

 -- Keir

> 5) The problem occurs even when Xen is run in uni-processor mode. I achieved
> this by adding "nosmp=1 maxcpus=1" to the grub command line that boots xen. I
> confirmed that Xen only reported one physical CPU, namely CPU0. This should
> have avoided any issues with waking up other non-boot processors.
>
> 6) Finally, I narrowed down the type of domain and condition of the domain
> that would exhibit the problem, by using python to create a domain with me
> being able to control its definition. If I set "flags" to 0, the problem
> does not show up. If I set it to "1" (hvm) and do NOT execute the
> "xc.domain_max_vcpus" call, the problem does not show up. However, once I add
> one VCPU to this domain, the problem occurs.
>
> #! /usr/bin/python
> import sys
> sys.path.append('/usr/lib/python2.6/site-packages')
> import xen.lowlevel.xc
> from xen.xend import uuid
> xc = xen.lowlevel.xc.xc()
> domid=xc.domain_create(domid=0,ssidref=0,handle=uuid.fromString("bad0beef-dead-beef-dead-beefdeadbeef"), flags=1)
>
> print domid
> xc.domain_max_vcpus(domid, 1)
>
>
> Roger R. Cruz
>
>
>
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: Wed 5/19/2010 3:25 AM
> To: Roger Cruz; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
>
> On 18/05/2010 23:34, "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx> wrote:
>
>> A little more info. I am now able to wake up the Dell Inspiron 1764 after I
>> put it to sleep. I found that the code commented out below would cause the
>> problems in my system. I have yet to understand why these variables don't
>> end up with the expected values.
>> If anyone has any thoughts that they would like to share on how this code
>> works and why it is comparing to stored variables, I would very much like
>> to hear them.
>
> The BUG_ONs are to detect VMX versioning inconsistencies between processors.
> The weird thing here is that you presumably brought all CPUs online during
> initial system boot with no problem. So somehow something has changed only
> after resume from S3. I think you will need to add tracing to discover which
> BUG_ON is failing, and why.
>
> Incidentally, in my CPU hotplug cleanup I will be making it so that CPUs
> that fail the checks will fail to come online, rather than crash the system.
> Which is a bit of an improvement, but obviously something is buggy
> underlying this (possibly in BIOS code).
>
>  -- Keir
>
>> Thank you
>> Roger R. Cruz
>>
>>
>> diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
>> --- a/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
>>
>> @@ -191,19 +192,25 @@
>>          cpu_has_vmx_ins_outs_instr_info = !!(vmx_basic_msr_high & (1U<<22));
>>          vmx_display_features();
>>      }
>> +#if 0
>>      else
>>      {
>>          /* Globals are already initialised: re-check them. */
>>          BUG_ON(vmcs_revision_id != vmx_basic_msr_low);
>>          BUG_ON(vmx_pin_based_exec_control != _vmx_pin_based_exec_control);
>>          BUG_ON(vmx_cpu_based_exec_control != _vmx_cpu_based_exec_control);
>>          BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
>>          BUG_ON(vmx_vmexit_control != _vmx_vmexit_control);
>>          BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);
>>          BUG_ON(cpu_has_vmx_ins_outs_instr_info !=
>>                 !!(vmx_basic_msr_high & (1U<<22)));
>>      }
>>
>> +#endif
>>      /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
>>      BUG_ON((vmx_basic_msr_high & 0x1fff) > PAGE_SIZE);
>>
>>
>> -----Original Message-----
>> From: Roger Cruz
>> Sent: Wed 5/12/2010 2:38 PM
>> To: Roger Cruz; xen-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
>>
>>
>> We have made some progress in getting the Inspiron laptops to work under Xen.
>> We tried xenunstable and xen-4.0.0 and discovered that xenunstable can resume
>> whereas xen-4.0.0 cannot. Through trial and error, we have been able to
>> narrow down the actual changes that allowed it to work. It looks like moving
>> the trampoline code down from its 0x8c000 location allowed it to resume.
>>
>> So we took the change below and applied it to our 3.4.2 tree. However, we
>> still have a problem in our 3.4.2 tree with this patch applied. If an HVM
>> guest is running, the resume will fail with the exact same behavior as
>> before. Due to our environment setup, we have not been able to test
>> xenunstable with an HVM guest, so we can't say if this problem is fixed in
>> xenunstable or not. Can someone familiar with these changes provide a clue
>> as to what is going on? How does having an HVM guest running affect the
>> resume functionality? Running PV linux guests does not affect resume, only
>> HVM guests do.
>>
>>
>> --- old/xen-3.4.2/xen/include/asm-x86/config.h 2010-05-12 11:44:35.243564976 -0400
>> +++ new/xen-3.4.2/xen/include/asm-x86/config.h 2010-05-12 11:44:35.026578602 -0400
>> @@ -96,7 +96,7 @@
>>  /* Primary stack is restricted to 8kB by guard pages. */
>>  #define PRIMARY_STACK_SIZE 8192
>>
>> -#define BOOT_TRAMPOLINE 0x8c000
>> +#define BOOT_TRAMPOLINE 0x7c000
>>  #define bootsym_phys(sym) \
>>      (((unsigned long)&(sym)-(unsigned long)&trampoline_start)+BOOT_TRAMPOLINE)
>>  #define bootsym(sym) \
>>
>> -------
>>
>> Hello fellow Xen developers,
>>
>> I'm about to start debugging why Dell Inspirons running Xen 3.4.2 fail to
>> resume after a suspend operation. A colleague has also found that the
>> problem exists on bare-metal Linux
>> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/571422) and an upstream
>> patch has been created
>> (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commitdiff;h=29c60ccc1a408371885d79d8f8c081fbcb9b10be).
>>
>> I would like to find out if anyone in the Xen community has encountered this
>> problem and if a fix is in the works. Otherwise, I will attempt to provide a
>> similar solution to Linux's patch.
>>
>> thanks
>> Roger

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
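
For reference, the arithmetic behind the bootsym_phys() macro in the quoted
config.h change can be written out as a small standalone function. This is
only an illustration of the formula in the diff: trampoline_phys() is a
hypothetical name, and the addresses are passed in as plain integers rather
than taken from real symbols.

```c
/*
 * Same computation as the quoted bootsym_phys() macro: the symbol's offset
 * from the start of the trampoline, added to the low-memory base
 * (BOOT_TRAMPOLINE in the diff) that the trampoline is placed at.
 */
unsigned long trampoline_phys(unsigned long sym_addr,
                              unsigned long trampoline_start_addr,
                              unsigned long base)
{
    return (sym_addr - trampoline_start_addr) + base;
}
```

Changing the base from 0x8c000 to 0x7c000 therefore moves every trampoline
symbol down by the same 0x10000 bytes (64kB) while leaving its offset within
the trampoline unchanged.
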