
Re: [Xen-devel] segfault in VM



Clearly there's some fairly random memory corruption going on, which
then causes segfaults (if the corruption hits code pages) and
filesystem corruption (if the corruption hits buffer-cache pages).

The "Bailing: not a -ve offset" and "GPF (0004):" messages are almost
certainly just symptoms of executing a corrupted block of code; i.e.,
the bug triggered some time earlier and has probably corrupted a
page of glibc or the kernel.

It would be interesting to see whether or not this is SMP-related.
It's also interesting that someone said they couldn't reproduce
corruption when using 2.6.7 for the non-privileged guest OSes.

 -- Keir

> that sounds like the same sort of errors i'm getting which appeared to be 
> filesystem corruption. First the corruption starts, then everything you do 
> causes a segfault, although i've only seen funny things happen in dom0.
> 
> In the limited testing i've done it looks like dom0 by itself is stable, but 
> crashes start occurring once I start up other domains and work dom0 hard 
> (other domains running under light load). I'm running this script in dom0:
> 
> #!/bin/sh
> while [ 1 = 1 ]
> do
>  diff file3 file4 && echo okay
> done
> 
> where file3 and file4 are files of around 300 MB each, and the VM has 128 MB
> of memory with no swap. This ensures that none of the file is cached, so
> there's lots of I/O.
> 
> It has crashed most readily when I'm running a few other domains and then
> start running dom0 out of memory, but nothing conclusive yet.
> 
> I'll let this test keep running for another hour (otherwise idle, no other 
> domains running) or so then start my running-out-of-memory program.
> 
> I wonder if it is coincidence that we both have smp boxes... each of the 
> domains only sees 1 cpu so I wouldn't have thought that would be a problem 
> unless there's a race in xen itself.
> 
> James
> 
> 
> From: Derek Glidden
> Sent: Mon 19/07/2004 3:22 PM
> To: xen-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: [Xen-devel] segfault in VM
> 
> 
> Maybe related or maybe not, but it was the same VM getting all the 
> scheduling time in my previous post.  (SMP Celeron box with 512M of 
> RAM, no himem enabled.)
> 
> At the time, four VMs were all compiling, with dom0 copying a linux 
> source tree from one place to another with rsync.  Everything copacetic 
> until I started the big rsync in dom0, where within a minute or so, vm2 
> bombed.  No messages on the dom0 console or in the VM other than the 
> "Segmentation Fault" in the VM during compilation.
> 
> However XEN (compiled with debug=y) console spits out:
> 
> (XEN) (file=x86_32/emulate.c, line=228) Bailing: not a -ve offset into 4GB segment.
> 
> at the time of the segmentation fault.
> 
> (and there are lots of these, pretty much any time there is heavy i/o 
> on the machine, all with the same values:)
> 
> (XEN) (file=traps.c, line=466) GPF (0004): fc5277a8 -> fc52a294
> 
> Any further activity inside vm2 results in more segmentation faults and 
> more "Bailing" messages.  The other VMs and dom0 seem to be ok.
> 
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> "We all enter this world in the    | Support Electronic Freedom
> same way: naked; screaming; soaked |        http://www.eff.org/
> in blood. But if you live your     |  http://www.anti-dmca.org/
> life right, that kind of thing     |---------------------------
> doesn't have to stop there." -- Dana Gould
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/xen-devel
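
For anyone trying to reproduce this, the comparison loop quoted above could be adapted to stop at the first mismatch and report which iteration failed, which may help correlate the corruption with other activity on the box. This is only a sketch, not the script used in the thread: the function name compare_loop and the optional iteration limit are made up for illustration, and it uses cmp -s in place of diff, since only a yes/no answer is needed.

```shell
#!/bin/sh
# Sketch only: repeat a byte-for-byte comparison of two files and stop
# at the first mismatch.  compare_loop is a hypothetical helper name.
#
# usage: compare_loop FILE_A FILE_B [MAX_ITERS]
#        MAX_ITERS defaults to 0, which means "loop until a mismatch".
compare_loop() {
    file_a=$1
    file_b=$2
    max_iters=${3:-0}
    i=0
    while [ "$max_iters" -eq 0 ] || [ "$i" -lt "$max_iters" ]
    do
        i=$((i + 1))
        # cmp -s prints nothing; a non-zero exit status means the files
        # differ or one of them could not be read.
        if ! cmp -s "$file_a" "$file_b"; then
            echo "mismatch on iteration $i at $(date)"
            return 1
        fi
    done
    echo "okay after $i iterations"
    return 0
}
```

Calling compare_loop file3 file4 at the end of the script would run it until the first failure. As in the original test, the files need to be larger than the domain's memory so that each pass is actually re-read from disk rather than served from the buffer cache.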




 

