
Re: [Xen-devel] [PATCH] Introduce an s3 test



On May 2, 2013, at 11:06 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
 wrote:

> Ben Guthro writes ("Re: [PATCH] Introduce an s3 test"):
>> On 05/01/2013 06:56 AM, Ian Jackson wrote:
>>>> +# check log for resume message
>>>> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
>>>> +  target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
>>>> +          "grep -n 'Finishing wakeup from S3 state'"));
>>> 
>>> Why does this need a poll loop ?  Surely after the machine comes out
>>> of suspend it should be up right away ?
>> 
>> This is a bit of a "first pass" in a test environment I've never used 
>> before. I modeled this after other tests I found in the same dir. If 
>> this is inappropriate, then I suspect you are correct.
> 
> Maybe you should be using guest_check_up ?

I'll own up to the fact that I wasn't really able to test the infrastructure 
portions of this script.
I was unsuccessful in getting them to run, even using the "standalone" branch.

It would really help if someone who has access to the test infrastructure could 
take my script as a starting point, and adapt it to whatever is necessary for 
that test environment.

> 
>> I put it in the loop for the case of networking taking some time to come 
>> back online, so if the ssh command failed it would be retried. 
> 
> How long is it supposed to take to come back online ?  "4*$timeout"
> seems (a) a bit arbitrary (b) rather long with your existing value of
> $timeout.

For all devices to come back online, it can sometimes take up to 20s.

This value was arbitrary, but chosen to allow for the RTC variance plus the 
time for devices to come back online.
This should probably be a tunable value.
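For what it's worth, a tunable poll loop could be sketched in plain shell 
roughly as below. This is only a sketch; `poll_for` and `S3_RESUME_TIMEOUT` 
are hypothetical names for illustration, not part of osstest.

```shell
#!/bin/sh
# poll_for TIMEOUT INTERVAL CMD...: run CMD every INTERVAL seconds
# until it succeeds or TIMEOUT seconds have elapsed.
poll_for() {
    timeout=$1; interval=$2; shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@"; then return 0; fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Hypothetical use for the resume check, with the timeout taken from
# a site-tunable variable instead of a fixed 4*$timeout:
#   poll_for "${S3_RESUME_TIMEOUT:-120}" 2 \
#       sh -c "xl dmesg | grep -q 'Finishing wakeup from S3 state'"
```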

> 
>> Additionally, I have found that the RTC wakeup mechanism is not very 
>> accurate in its timing.
> 
> How unfortunate.

Indeed. We frequently see that putting a machine to sleep for 1m sometimes 
results in it waking up 30s later, and other times 3m later.

> 
>>> I'm not sure I follow this.  Wouldn't messed up timer queues cause
>>> other trouble in the guest ?
>> 
>> Yes, but it has been a common point of failure / problems after S3. I 
>> put this here as a placeholder to verify that everything is still as it 
>> should be.
> 
> Err, OK.
> 

I see automated testing as a resource for confirming that problems that 
occurred in the past do not re-emerge from new development, rather than as 
strictly functional testing.
If you disagree with this, feel free to remove it. I don't feel strongly about 
this particular point.


>>>> +# - Check for kernel Oops
>>>> +# - Check for Xen WARN
>>> 
>>> These are a good idea but should perhaps be a separate test step.
>> 
>> Wouldn't you want a warning/oops that was provoked by S3 to be 
>> associated with that test?
> 
> Hrm.  Well in principle this is surely true of any test.
> 
> Can we make warnings/oopses fatal ?
> 

That seems like it would be prudent, if possible.
As I mentioned above, I had difficulty configuring this test environment, so 
it may well be trivial; I am just not familiar enough with the environment.
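As a rough illustration of what a fatal check might look like, the log scan 
could be something like the following. Again, this is only a sketch under 
assumptions: `check_logs_clean` is a hypothetical helper, and a real harness 
step would first capture "xl dmesg" and the guest's dmesg into a file.

```shell
#!/bin/sh
# check_logs_clean LOGFILE: fail (nonzero exit) if the captured log
# contains a kernel Oops or a Xen WARN, so the test step aborts.
check_logs_clean() {
    logfile=$1
    if grep -Eq 'Oops|WARN' "$logfile"; then
        echo "fatal: Oops/WARN found in $logfile" >&2
        return 1
    fi
    return 0
}
```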


Ben


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

