
Re: [Wg-test-framework] servers w/Hardware problem



Guys,

Quite frankly I don't have the slightest idea how to approach this problem.  At
first glance it would appear to be BIOS related but that becomes suspect
when Merlot1 appears to be working fine.  The BIOS revs on both of these
machines are identical. Just to humor me, please check the BIOS rev of
Merlot1 and let me know what it is.

Paul
-----Original Message-----
From: Ian Jackson [mailto:Ian.Jackson@xxxxxxxxxxxxx] 
Sent: Friday, December 18, 2015 10:21 AM
To: Paul L. George
Cc: Lars Kurth; wg-test-framework@xxxxxxxxxxxxxxxxxxxx
Subject: Re: servers w/Hardware problem

(Dropping Yogesh, adding Lars and
wg-test-framework@xxxxxxxxxxxxxxxxxxxx)

Paul L. George writes ("Re: servers w/Hardware problem"):
> We have a problem because these machines are warranted against 
> manufacturer's defects.  This sounds like a BIOS issue.  The Dells are 
> still under warranty but the HP isn't.

FAOD "the Dells" must refer to elbling0 and elbling1 which are Dell
PowerEdge R320s and "the HP" refers to merlot0 which is an HP DL385p.

AFAICT from the "PURCHASE AND SALE AGREEMENT.doc", the warranty (see Exhibit
E in that document) runs for 1 year from the date of acceptance, which
AFAICT happened no earlier than the 24th of June 2015.


> How are the systems rebooted? Reboot command or power cycled?  I would 
> like to try and reproduce the problem.

The machines are configured to netboot.  The usual installation cycle
is:

 * Power off
 * Adjust PXE configuration to prepare for autoinstall
 * Power on
 * Debian (wheezy, in most relevant cases here) autoinstallation
   takes place, which involves one reboot; during the installation
   the PXE configuration is changed to chainload the hard disk, and
   there is one software-initiated reboot.
 * A second software-initiated reboot into versions of Xen and Linux
   which are to be tested.
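
As a rough illustration of the cycle above (a sketch only; the class name,
field names and target labels are made up for this example and are not taken
from the test framework), each boot in one run can be written down together
with what the PXE configuration points at when it happens:

```python
from dataclasses import dataclass


@dataclass
class Boot:
    trigger: str     # "power-on" or "software-reboot"
    pxe_target: str  # hypothetical label for what PXE is configured to do
    result: str      # what ends up running after this boot


def installation_cycle() -> list[Boot]:
    """Boots in one test run, in order, as described in the list above."""
    return [
        # PXE is adjusted for autoinstall while the machine is powered off.
        Boot("power-on", "autoinstall", "Debian autoinstallation runs"),
        # During installation PXE is switched to chainload the hard disk.
        Boot("software-reboot", "chainload-hard-disk", "installed Debian boots"),
        # Second software-initiated reboot, into the software under test.
        Boot("software-reboot", "chainload-hard-disk", "Xen and Linux under test boot"),
    ]


for number, boot in enumerate(installation_cycle(), start=1):
    print(number, boot.trigger, boot.pxe_target, "->", boot.result)
```

In this reading the first boot of every run depends on the machine trying the
network before its hard disk.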

This cycle is repeated many times per day.  A typical test run would last
between 30 and 200 minutes.  So this power cycling might occur as much as a
few dozen times a day.
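
To put rough numbers on that (a back-of-the-envelope sketch, assuming runs
follow one another back to back on a single machine with no idle time, which
is a simplification), the run lengths above translate into cycles per day as
follows:

```python
MINUTES_PER_DAY = 24 * 60


def cycles_per_day(run_minutes: int) -> int:
    """Whole test runs that fit into one day if run back to back."""
    return MINUTES_PER_DAY // run_minutes


print(cycles_per_day(30))   # shortest typical run: 48 cycles
print(cycles_per_day(200))  # longest typical run: 7 cycles
```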

The symptom is that the BIOS boot settings lose the network option, or that
the network option ends up later than the hard disk in the boot order.  After
this, the PXE-based autoinstallation fails.
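
The failure condition can be stated as a small check (a sketch only; the
"network" and "disk" labels and the idea of having the boot order available
as a list are assumptions for illustration, since real BIOSes name and expose
these entries differently):

```python
def pxe_boot_ok(boot_order: list[str],
                network: str = "network",
                disk: str = "disk") -> bool:
    """True if a network entry exists and is tried before the hard disk."""
    if network not in boot_order:
        return False  # first symptom: the network option has been lost
    if disk not in boot_order:
        return True   # no hard disk entry, so nothing precedes the network
    # second symptom: the network option ends up later than the hard disk
    return boot_order.index(network) < boot_order.index(disk)


print(pxe_boot_ok(["network", "disk"]))  # healthy order
print(pxe_boot_ok(["disk", "network"]))  # network after hard disk
print(pxe_boot_ok(["disk"]))             # network option missing
```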

My first-instance remedy was to reset the boot order settings via the BIOS
serial console.  But we decided each machine was faulty after the same
problem occurred more than once on that machine within a fairly short
period (a matter of a few weeks).  If you need to know detailed timings I
can look up more detailed records.

> Am I to understand that the other Merlot system works fine?

We have not seen this problem with merlot1 and we don't currently suspect
merlot1 has any other hardware problems.

Thanks,
Ian.


_______________________________________________
Wg-test-framework mailing list
Wg-test-framework@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/wg-test-framework


 

