Re: [Xen-devel] XCP: Crashes on dual Xeon HP ProLiant systems
On Friday 30 April 2010 11:20:07 am Pasi Kärkkäinen wrote:
> On Fri, Apr 30, 2010 at 09:32:37AM -0700, dwight at supercomputer.org wrote:
> > Is anyone else running the latest XCP on HP ProLiant DL380
> > systems? Or a similar dual Xeon 8-core system? I'm seeing
> > spontaneous reboots when under a load.
> > ...
>
> Uhm.. the compiler really shouldn't crash.
>
> Are you sure your hardware is OK? If the stock EL5.4 Xen also
> crashes, it could be broken hardware?
>
> Did you try running memtest86+ ?
>
> Is baremetal Linux stable, if you run for example
> "make -j8 bzImage && make -j8 modules && make clean" kernel build
> in a loop?
>
> -- Pasi

Thank you for your reply, Pasi.

I agree that the compiler shouldn't crash. That's definitely rude behavior.

It might well be broken hardware. I was thinking it was more likely an issue between the older CentOS Xen and this much newer Xeon hardware, so that the "hardware or OS problem" gcc was complaining about was really an issue with the virtualized hardware.

But yesterday I ran into a different issue, which leads me to believe that it is either a physical hardware problem or a Dom0 OS problem. On the machine that was running XCP, I tried installing 64-bit CentOS 5.4. The installation crashed, two separate times. The first time I didn't have a log file (since it was a video-based installation). The second time, though, I used the iLO virtualized serial port, and I could see that the installation crashed about halfway through. Again, it was a spontaneous reboot, just as XCP experienced.

I talked to one of the guys in the lab, who has done far more installations of these ProLiant (and Dell) boxes than I have, and he was quite familiar with this. He said that on some of these boxes (both HP and Dell), the 64-bit CentOS 5.4 install will crash, but the 32-bit installation supposedly works. He also said that CentOS 5.3, both 32- and 64-bit, works fine.
I realize that this is anecdotal, and I don't have any more information here (as to the CPUs and hardware), but I thought it was interesting.

At this point, I don't trust either the hardware or the OS, so I'm going to start a full diagnostics run using a suite that I've put together over the past 15 years, which has served me very well in qualifying boxes.

memtest86 is one of the tests in that suite. I mentioned earlier that I had started an overnight run of it on both boxes. I can now report that both have passed: after 12+ hours, each had gone successfully through two separate runs without error.

Next up is prime95, with the torture test. Nothing else comes close to exercising the CPU, as indicated by the heat given off during this test. It will also be a test of the thermal cooling.

If that passes, then I'm going to exercise the disk subsystem. One of those tests is very similar to what you suggested: multiple rebuilds of the kernel, but from scratch each time.

Frankly, though, I'm going to see if I can get a different ProLiant box. Nonetheless, I want the data on this one. I'm hoping that I can detect a box that will fail before I run XCP on it.

I'll post the results when I have them, hopefully in a couple of days.

-dwight-

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel