
Re: osstest down, PDU failure



Ian Jackson writes ("osstest down, PDU failure"):
> Currently, osstest is not working.  We have lost one of our PDUs,
> meaning that about half a rack is out of action, including one of the
> VM hosts.
> 
> There has been quite a bit of outstanding maintenance which has been
> deferred due to the pandemic.  I am trying to see if we can get
> someone on-site to the colo, in Massachusetts, soon.  A complication
> is that the replacement PDU is still in New York.  Again, due to the
> pandemic.

I managed to get an on-site look by the staff of the colo facility.  A
breaker had tripped, depriving our PDU of power.  They reset the
breaker.  The VM host has come back fully operational.  I have
verified that all the test boxes connected to that PDU (apart from one
known-dead box) are powered and responsive enough.  Initial reports
from a smoke flight were encouraging, so I have re-enabled everything.

It may trip again, of course.

A power trip in a colo is not a normal event, but we haven't
determined the root cause.  The colo facility are going to ask their
electrical supply technicians to investigate the trip.  I think the
breaker or associated equipment is probably "smart" and will have some
useful records.

Ian.



 

