
Re: [Wg-test-framework] baroque1 hardware problem



Ian, Don,
I heard from Intel that they will be replacing the motherboard, hopefully
early next week.  Monday is a holiday, so we may not get the server back in
the rack until Wednesday or Thursday.

It was pretty hard to dispute, since the only components in the rack are the
case, power supply, motherboard, chip, fan, DVD and hard drive.

Paul

-----Original Message-----
From: Don Koch [mailto:dkoch@xxxxxxxxxxx] 
Sent: Wednesday, May 20, 2015 12:39 PM
To: Ian Jackson
Cc: wg-test-framework@xxxxxxxxxxxxxxxxxxxx; Paul L. George; don@xxxxxxx
Subject: Re: baroque1 hardware problem

On Wed, 20 May 2015 11:54:51 -0400
Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:

> Ian Jackson writes ("Minutes All-Net synchronisation call, 20th May"):
> > ACTION: Ian to send technical details, so that All-net can raise
> >      with supplier (Intel).
> 
> So, the problem is as follows:
> 
> 
> Summary:
> --------
> 
> Sometimes, when powered on, baroque1 does not come up.
> 
> Symptoms are that the serial control lines do change (visible in 
> sympathy log), but no text is printed on the serial console.  Waiting 
> a long time (up to at least ten minutes) has no effect.  Sending 
> "return" on the serial console elicits no response.
> 
> After the problem has occurred, often more than one further attempt to 
> power cycle the machine is required to get it to work again.  After 
> that it works normally until the fault recurs.
> 
> 
> Repro method:
> -------------
> 
>  * Write a pxeboot file which refers to a stock Wheezy amd64
>    debian-installer netboot image, and specifies a preseed file.
> 
>  * Power off (via the PDU).
> 
>  * Wait 30s.
> 
>  * Power on (via the PDU).
> 
>  * Monitor the preseed file http server access log waiting for the
>    preseed file to be downloaded.
> 
>  * When the preseed file has been fetched, declare "success".
> 
>    Then run round for the next repetition.  (Generally, this means
>    that the server is powered off in the middle of one of
>    debian-installer's software-fetching steps.)
> 
>    Alternatively, after 350s, declare "failure" and stop.
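
The repro loop above could be sketched roughly as follows. This is only an illustration, not the script actually used: `power_off`, `power_on` and `preseed_fetched` are hypothetical callables standing in for the PDU control and the access-log check, and `max_cycles` is an added safety limit. The 30s and 350s values are the ones given above.

```python
import time
from typing import Callable


def run_power_cycle_test(
    power_off: Callable[[], None],       # hypothetical: switch the PDU outlet off
    power_on: Callable[[], None],        # hypothetical: switch the PDU outlet on
    preseed_fetched: Callable[[], bool], # hypothetical: True once the access log
                                         # shows a fetch since the last power-on
    off_wait_s: float = 30.0,            # "Wait 30s"
    timeout_s: float = 350.0,            # "after 350s, declare failure"
    poll_s: float = 1.0,                 # assumed polling interval
    max_cycles: int = 400,               # assumed upper bound on repetitions
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Return the number of successful cycles before the first failure."""
    successes = 0
    while successes < max_cycles:
        power_off()
        sleep(off_wait_s)
        power_on()
        deadline = clock() + timeout_s
        while not preseed_fetched():
            if clock() >= deadline:
                return successes  # failure: stop so the machine can be inspected
            sleep(poll_s)
        successes += 1  # success: go round for the next repetition
    return successes
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with a fake clock, without touching real hardware.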
> 
> 
> Statistical information:
> ------------------------
> 
> * My records show failures after the following number of repetitions:
>    96 (not quite sure about this - data collection was affected by an
>        unrelated network problem on my workstation)
>    29, 25, 26.
> 
> * My records show the following number of attempts needed to get the
>   machine to work at all, again:
>    1, 3, 2, 3
> 
> * I have run the same test on baroque0.  It has managed (at least) 400
>   consecutive power cycle restarts without problem.
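
As a rough sanity check on these figures, the sketch below estimates a per-cycle failure rate for baroque1 and asks how likely 400 clean cycles would be at that rate. It assumes each power cycle fails independently with the same probability and treats each listed count as the length of one run ending in a failure; neither assumption is stated above, and the 96 figure is flagged as uncertain.

```python
# Repetition counts at which baroque1 failed (from the records above;
# the first value is uncertain).
failure_runs = [96, 29, 25, 26]
# Attempts needed to get the machine working again after each failure.
recovery_attempts = [1, 3, 2, 3]

total_cycles = sum(failure_runs)
failures = len(failure_runs)

# Assumption: independent, identically distributed failures per power cycle.
p_fail = failures / total_cycles

# Probability of 400 consecutive clean cycles at that failure rate.
p_clean_400 = (1 - p_fail) ** 400

mean_recovery = sum(recovery_attempts) / len(recovery_attempts)

print(f"estimated per-cycle failure probability: {p_fail:.4f}")
print(f"chance of 400 clean cycles at that rate: {p_clean_400:.2e}")
print(f"mean power cycles needed to recover:     {mean_recovery:.2f}")
```

Under those assumptions, 400 clean cycles on a machine with baroque1's failure rate would be very unlikely, which fits the view that the fault is specific to baroque1 rather than to the setup.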
> 
> 
> Handover:
> ---------
> 
> I hereby hand both baroque0 and baroque1 over to you.  (It seems most 
> sensible to give you the working machine too, for comparison.)

Noted.

> The current setup in the colo is the PXE configuration as described 
> above.
> 
> So I think it should be possible to reproduce the problem as follows:
> 
>   - power baroque1 off
>   - wait 30s
>   - power baroque1 on
> 
>   - wait for it to show life on the serial console
> 
>   - wait for it to show entry into debian-installer
>      (eg wait for "Setting up the clock" to be printed on the
>       serial console), then declare success and go round again

I should be able to modify the oseleta test script to do this.
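
For what it's worth, the serial-console check could look something like the sketch below. It is not the oseleta script: `read_chunk` is a hypothetical callable that returns whatever bytes arrived on the console (empty if none), and the 350s timeout is borrowed from Ian's original repro method rather than specified for this variant. It separates "no life on the console at all" from "some output, but never reached debian-installer".

```python
import time
from typing import Callable


def wait_for_marker(
    read_chunk: Callable[[], bytes],          # hypothetical console reader
    marker: bytes = b"Setting up the clock",  # string suggested above
    timeout_s: float = 350.0,                 # assumed, from the earlier repro
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return "success", "stalled" (output but no marker) or "no-output"."""
    deadline = clock() + timeout_s
    seen_any = False
    tail = b""                # end of earlier data, in case the marker
    keep = len(marker) - 1    # is split across two reads
    while clock() < deadline:
        chunk = read_chunk()
        if not chunk:
            continue
        seen_any = True
        window = tail + chunk
        if marker in window:
            return "success"
        tail = window[-keep:] if keep else b""
    return "stalled" if seen_any else "no-output"
```

With pyserial, `read_chunk` might be `lambda: port.read(256)` on a `serial.Serial` object opened on one of the ttyRP devices with a short read timeout, so that the loop does not spin; the baud rate would need to match the console setup.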

> I have disconnected the serial consoles of both machines from 
> sympathy, so you should be able to connect to them with picocom or 
> expect on /dev/ttyRP5 and /dev/ttyRP6.

Thanks.

> NB that I am now going to be away until next Wednesday morning.

NB'ed.

-d

> Ian.
> 

