
Re: osstest down, PDU failure

Ian Jackson writes ("osstest down, PDU failure"):
> Currently, osstest is not working.  We have lost one of our PDUs,
> meaning that about half a rack is out of action, including one of the
> VM hosts.
> There has been quite a bit of outstanding maintenance which has been
> deferred due to the pandemic.  I am trying to see if we can get
> someone on-site to the colo, in Massachusetts, soon.  A complication
> is that the replacement PDU is still in New York.  Again, due to the
> pandemic.

I managed to get an on-site look by the staff of the colo facility.  A
breaker had tripped, depriving our PDU of power.  They reset the
breaker.  The VM host has come back fully operational.  I have
verified that all the test boxes connected to that PDU (apart from one
known-dead box) are powered and responsive enough.  Initial reports
from a smoke flight were encouraging, so I have re-enabled everything.

It may trip again of course.

A power trip in a colo is not a normal event, but we haven't
determined the root cause.  The colo facility are going to ask their
electrical supply technicians to investigate the trip.  I think the
breaker or associated equipment is probably "smart" and will have some
useful records.



