Re: [Wg-test-framework] baroque1 hardware problem
Ian, Don,

I heard from Intel that they will be replacing the motherboard, hopefully early next week. Monday is a holiday, so we may not get the server back in the rack until Wednesday or Thursday.

It was pretty hard to dispute, since the only components in the rack are the case, power supply, motherboard, chip, fan, DVD and hard drive.

Paul

-----Original Message-----
From: Don Koch [mailto:dkoch@xxxxxxxxxxx]
Sent: Wednesday, May 20, 2015 12:39 PM
To: Ian Jackson
Cc: wg-test-framework@xxxxxxxxxxxxxxxxxxxx; Paul L. George; don@xxxxxxx
Subject: Re: baroque1 hardware problem

On Wed, 20 May 2015 11:54:51 -0400
Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:

> Ian Jackson writes ("Minutes All-Net synchronisation call, 20th May"):
> > ACTION: Ian to send technical details, so that All-net can raise
> > with supplier (Intel).
>
> So, the problem is as follows:
>
>
> Summary:
> --------
>
> Sometimes, when powered on, baroque1 does not come up.
>
> Symptoms are that the serial control lines do change (visible in
> sympathy log), but no text is printed on the serial console.  Waiting
> a long time (up to at least ten minutes) has no effect.  Sending
> "return" on the serial console elicits no response.
>
> After the problem has occurred, often more than one further attempt to
> power cycle the machine is required to get it to work again.  After
> that it works normally until the fault recurs.
>
>
> Repro method:
> -------------
>
>  * Write a pxeboot file which refers to a stock Wheezy amd64
>    debian-installer netboot image, and specifies a preseed file.
>
>  * Power off (via the PDU).
>
>  * Wait 30s.
>
>  * Power on (via the PDU).
>
>  * Monitor the preseed file http server access log waiting for the
>    preseed file to be downloaded.
>
>  * When the preseed file has been fetched, declare "success".
>
>    Then run round for the next repetition.  (Generally, this means
>    that the server is powered off in the middle of one of
>    debian-installer's software-fetching steps.)
>
>    Alternatively, after 350s, declare "failure" and stop.
>
>
> Statistical information:
> ------------------------
>
>  * My records show failures after the following number of repetitions:
>      96 (not quite sure about this - data collection was affected by an
>          unrelated network problem on my workstation)
>      29, 25, 26.
>
>  * My records show the following number of attempts needed to get the
>    machine to work at all, again:
>      1, 3, 2, 3
>
>  * I have run the same test on baroque0.  It has managed (at least) 400
>    consecutive power cycle restarts without problem.
>
>
> Handover:
> ---------
>
> I hereby hand both baroque0 and baroque1 over to you.  (It seems most
> sensible to give you the working machine too, for comparison.)

Noted.

> The current setup in the colo is the PXE configuration as described
> above.
>
> So I think it should be possible to reproduce the problem as follows:
>
>   - power baroque1 off
>   - wait 30s
>   - power baroque1 on
>
>   - wait for it to show life on the serial console
>
>   - wait for it to show entry into debian-installer
>     (eg wait for "Setting up the clock" to be printed on the
>     serial console), then declare success and go round again

I should be able to modify the oseleta test script to do this.

> I have disconnected the serial consoles of both machines from
> sympathy, so you should be able to connect to them with picocom or
> expect on /dev/ttyRP5 and /dev/ttyRP6.

Thanks.

> NB that I am now going to be away until next Wednesday morning.

NB'ed.

-d

> Ian.
>

_______________________________________________
Wg-test-framework mailing list
Wg-test-framework@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/wg-test-framework
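
The power-cycle loop from the "Repro method" section could be scripted roughly as below. This is only a minimal Python sketch under stated assumptions, not the script actually used on baroque1: the function name run_power_cycle_test and the callables power_off, power_on and booted are hypothetical placeholders. The caller would have to supply the real PDU control, and a check that reports only evidence seen since the most recent power-on (a fresh preseed fetch in the http access log, or "Setting up the clock" on the serial console). The 30s off time and the 350s timeout are taken from the message. The 400-cycle cap is an arbitrary default that echoes the baroque0 run.

```python
import time
from typing import Callable


def run_power_cycle_test(
    power_off: Callable[[], None],   # hypothetical: switch the PDU outlet off
    power_on: Callable[[], None],    # hypothetical: switch the PDU outlet on
    booted: Callable[[], bool],      # hypothetical: True once this boot shows life
    max_cycles: int = 400,           # assumed cap, not specified in the message
    off_wait: float = 30.0,          # "Wait 30s"
    boot_timeout: float = 350.0,     # "after 350s, declare failure and stop"
    poll_interval: float = 1.0,      # assumed polling period
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Return how many cycles succeeded before the first failure.

    Returns max_cycles if no cycle failed.
    """
    for cycle in range(max_cycles):
        power_off()
        sleep(off_wait)
        power_on()
        deadline = clock() + boot_timeout
        # Poll until the machine shows life or the timeout expires.
        while not booted():
            if clock() >= deadline:
                return cycle  # failure: stop and leave the machine as it is
            sleep(poll_interval)
        # Success: go straight round again, cutting power mid-install.
    return max_cycles
```

The return value corresponds to the "failures after the following number of repetitions" figures in the statistics section. Passing sleep and clock as parameters is only a convenience so the loop can be exercised without real waiting.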