
Re: [Wg-test-framework] baroque1 hardware problem



On Wed, 20 May 2015 11:54:51 -0400
Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:

> Ian Jackson writes ("Minutes All-Net synchronisation call, 20th May"):
> > ACTION: Ian to send technical details, so that All-net can raise
> >      with supplier (Intel).
> 
> So, the problem is as follows:
> 
> 
> Summary:
> --------
> 
> Sometimes, when powered on, baroque1 does not come up.
> 
> Symptoms are that the serial control lines do change (visible in
> sympathy log), but no text is printed on the serial console.  Waiting
> a long time (up to at least ten minutes) has no effect.  Sending
> "return" on the serial console elicits no response.
> 
> After the problem has occurred, often more than one further attempt to
> power cycle the machine is required to get it to work again.  After
> that it works normally until the fault recurs.
> 
> 
> Repro method:
> -------------
> 
>  * Write a pxeboot file which refers to a stock Wheezy amd64
>    debian-installer netboot image, and specifies a preseed file.
> 
>  * Power off (via the PDU).
> 
>  * Wait 30s.
> 
>  * Power on (via the PDU).
> 
>  * Monitor the preseed file http server access log waiting for the
>    preseed file to be downloaded.
> 
>  * When the preseed file has been fetched, declare "success".
> 
>    Then run round for the next repetition.  (Generally, this means
>    that the server is powered off in the middle of one of
>    debian-installer's software-fetching steps.)
> 
>    Alternatively, after 350s, declare "failure" and stop.
> 
> 
> Statistical information:
> ------------------------
> 
> * My records show failures after the following number of repetitions:
>    96 (not quite sure about this - data collection was affected by an
>        unrelated network problem on my workstation)
>    29, 25, 26.
> 
> * My records show the following number of attempts needed to get the
>   machine to work at all, again:
>    1, 3, 2, 3
> 
> * I have run the same test on baroque0.  It has managed (at least) 400
>   consecutive power cycle restarts without problem.
> 
> 
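
For a rough sense of how strongly these numbers separate the two machines,
here is a small back-of-envelope sketch. It assumes each count above is the
length of one run of power cycles ending in a single failure, and that
cycles fail independently with a fixed probability; neither assumption is
confirmed by the records. The function names are made up for illustration.

```python
def per_cycle_failure_rate(cycles_to_failure):
    """Failures per power cycle, treating each count as one run
    that ended in exactly one failure (an assumption)."""
    return len(cycles_to_failure) / sum(cycles_to_failure)


def prob_clean_run(p_fail, n_cycles):
    """Probability of n_cycles consecutive successes, assuming
    independent cycles with a fixed failure probability."""
    return (1.0 - p_fail) ** n_cycles


all_runs = [96, 29, 25, 26]
trusted_runs = [29, 25, 26]  # dropping the 96, which was flagged as uncertain

for label, runs in (("all runs", all_runs), ("without the 96", trusted_runs)):
    p = per_cycle_failure_rate(runs)
    print(f"{label}: p_fail ~ {p:.4f}, "
          f"P(400 clean cycles) ~ {prob_clean_run(p, 400):.1e}")
```

Under those assumptions, a machine with baroque1's failure rate would get
through 400 clean cycles with probability on the order of 1e-4 or smaller,
so the baroque0 result looks like a real difference rather than luck.
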
> Handover:
> ---------
> 
> I hereby hand both baroque0 and baroque1 over to you.  (It seems most
> sensible to give you the working machine too, for comparison.)

Noted.

> The current setup in the colo is the PXE configuration as described
> above.
> 
> So I think it should be possible to reproduce the problem as follows:
> 
>   - power baroque1 off
>   - wait 30s
>   - power baroque1 on
> 
>   - wait for it to show life on the serial console
> 
>   - wait for it to show entry into debian-installer
>      (eg wait for "Setting up the clock" to be printed on the
>       serial console), then declare success and go round again

I should be able to modify the oseleta test script to do this.
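
A minimal sketch of what that loop might look like, not the actual oseleta
script. The PDU control and the serial read are passed in as callables
because their real interfaces are not described here; `power_off`,
`power_on` and `read_line` are hypothetical names. `read_line` is assumed to
return an empty string when nothing arrives within its own short timeout, so
a silent console cannot block the loop. The 350s limit is borrowed from the
original repro method and the 400-cycle cap from the baroque0 run.

```python
import time


def wait_for_marker(read_line, marker, timeout_s, clock=time.monotonic):
    """Poll read_line() until marker appears in a line or timeout_s elapses."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        line = read_line()  # expected to return "" if no data arrived
        if marker in line:
            return True
    return False


def power_cycle_until_failure(power_off, power_on, read_line,
                              marker="Setting up the clock",
                              off_wait_s=30, boot_timeout_s=350,
                              max_cycles=400,
                              sleep=time.sleep, clock=time.monotonic):
    """Power cycle repeatedly; return the cycle number of the first
    boot that never shows the marker, or None if all cycles succeed."""
    for cycle in range(1, max_cycles + 1):
        power_off()
        sleep(off_wait_s)
        power_on()
        if not wait_for_marker(read_line, marker, boot_timeout_s, clock):
            return cycle  # declare failure and stop
    return None
```

Stopping at the first failure leaves the machine in the bad state for
inspection, which matches the "declare failure and stop" step in the
original method.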

> I have disconnected the serial consoles of both machines from
> sympathy, so you should be able to connect to them with picocom or
> expect on /dev/ttyRP5 and /dev/ttyRP6.

Thanks.

> NB that I am now going to be away until next Wednesday morning.

NB'ed.

-d

> Ian.
> 

