
RE: [Xen-devel] segfault in VM



As a first test I have disabled networking via nics=0 in the config, and am running this script in dom1:
#!/bin/sh
# Read two regions of /dev/sda1 concurrently, forever, to stress the block interface.
while [ 1 = 1 ]
do
  dd if=/dev/sda1 of=/dev/null bs=1024 count=128K &
  dd if=/dev/sda1 of=/dev/null bs=1024 skip=256K count=256K
done
It tells me 'ioctl 801c6d02 not supported by XL blkif' but that doesn't seem to matter. Anyway, there have been no crashes so far, so I'm thinking at this stage that the block interface is probably fine and I should now concentrate on the network. Disabling the block devices would be a huge hassle at this stage, so I'll have to let that go for the moment.
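
For anyone repeating this, a bounded variant of the loop above may be easier to leave running unattended, since it stops after a fixed number of rounds and counts dd failures. This is only a sketch: the function name read_stress and its three parameters (device, rounds, base block count) are made up for illustration and are not part of the script I actually ran.

```shell
#!/bin/sh
# Hypothetical bounded variant of the read-stress loop.
# read_stress DEVICE ROUNDS [BLOCKS]
#   BLOCKS is the base count of 1 KiB blocks; the default 131072 matches count=128K.
read_stress() {
    dev=$1
    rounds=$2
    blocks=${3:-131072}
    failures=0
    i=0
    while [ "$i" -lt "$rounds" ]
    do
        # Background reader over the first BLOCKS blocks of the device.
        dd if="$dev" of=/dev/null bs=1024 count="$blocks" 2>/dev/null &
        bg=$!
        # Foreground reader over a later region, so the two reads overlap in time.
        dd if="$dev" of=/dev/null bs=1024 skip=$((blocks * 2)) count=$((blocks * 2)) 2>/dev/null \
            || failures=$((failures + 1))
        # Collect the background reader's exit status before the next round.
        wait "$bg" || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "rounds=$rounds failures=$failures"
    # Succeed only if every dd invocation succeeded.
    [ "$failures" -eq 0 ]
}
```

Something like `read_stress /dev/sda1 1000` would then run a thousand rounds and return non-zero if any read failed. Note that it only catches I/O errors reported by dd, not silent corruption.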
 
I think I need a crash course in how all this hangs together before I can understand what I'm testing... My understanding is as follows:
 
packets sent to dom0.vif1.0 appear at dom1.eth0.
packets sent to dom1.eth0 appear at dom0.vif1.0.
 
and that's about it. Are they symmetrical? Is the transmit code for dom0.vif1.0 the same as the transmit code for dom1.eth0? Ditto for receive?
 
James
 


From: Keir Fraser
Sent: Thu 22/07/2004 12:03 PM
To: James Harper
Cc: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] segfault in VM

> I'm building this now, and am just thinking about how to test it... I was using ping as my test mechanism. I guess I'll do lots of block device copies. I guess this lends weight to your thoughts that it probably is a net problem and not a block problem.
> 
> Instead of changing the source code to disable the net stuff, would it work if I just specified 'nics=0' or is some part of the net subsystem still activated? I'll test this too anyway.

I think the source will need to be changed. In any case, it's a
trivial change and then we can be certain that no device channel is
being set up.

> Testing with send or receive disabled might be a bit trickier than you first make out. Send-only should be easy enough: just start another domain and then ping it (a manual arp table entry should remove the need to broadcast). Receive-only will be trickier. How do you get a domain to send to it? This problem of course assumes that corruption is not limited to the domain... if it is limited to the domain then you should be able to have a send/receive domain and ignore crashes in there, and just focus on the crashes in the receive-only domain.

That's the reason for the broadcast ping. Unfortunately I'm not sure
how useful that will turn out to be -- e.g., we may just end up hosing
DOM0. 
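
The two ping setups under discussion can be sketched as a dry run that prints the commands rather than executing them. This is an illustration only: the function names and any addresses passed in are hypothetical, and the ping options are simply the ones quoted elsewhere in this thread, plus -b to permit a broadcast destination.

```shell
#!/bin/sh
# Dry-run sketch: print, rather than execute, the commands for the two ping tests.
# Function names and example addresses are hypothetical.

# Send-only test: pin the peer's MAC address so no ARP broadcast is needed,
# then ping the peer at a short interval.
send_only_plan() {
    peer_ip=$1
    peer_mac=$2
    echo "arp -s $peer_ip $peer_mac"
    echo "ping -q -i 0.01 -s 1400 $peer_ip"
}

# Receive-only test: ping the subnet broadcast address from another machine,
# so the domain under test receives traffic without needing to transmit first.
broadcast_plan() {
    bcast_ip=$1
    echo "ping -b -q -s 1400 $bcast_ip"
}
```

Printing the commands first makes it easy to check the addresses before running anything as root.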

> I'm almost confused, but am about to start testing - firstly with no network.

Stage 1 (isolating blkdev and network) shouldn't be too
hard. Basically we're ensuring the data paths in the backend drivers
do not get executed -- they will only ever execute if there is a
device channel set up to a frontend in another guest, so disabling the
frontend drivers ensures this.

 -- Keir


> James
> 
> 
> From: Keir Fraser
> Sent: Wed 21/07/2004 11:30 PM
> To: James Harper
> Cc: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] segfault in VM
> 
> 
> Could someone try to isolate this to either the network backend driver
> or the blkdev backend driver?
> 
> The best way to do this is to disable the frontend drivers so that
> they never try to connect to the backend driver...
> 
> To disable networking:
> Edit arch/xen/drivers/netif/frontend/main.c. Change netif_init() to
> always 'return 0;'.
> 
> To disable block devices:
> Edit arch/xen/drivers/blkif/frontend/main.c. Change xlblk_init() to
> always 'return 0;'.
> 
> Oh yes -- the 2.4 sparse tree no longer contains the net frontend
> driver - you'll find the build tree symlinks to
> linux-2.6.7-xen-sparse/drivers/xen/net/network.c. So you might want to
> edit that instead...
> 
> Obviously, if you disable blkdevs you'll need to boot off a ramdisk
> or via a networked mount. :-)
> 
>  Cheers,
>  Keir
> 
> 
> > I downloaded these (from a tgz that Keir had given me a link to as bk was down - I assume it's identical to his latest fixes) and started my tests running and went to bed, but it looks like I got errors within a very short time.
> > The tests I was running were my 'compare' script and pinging the two domains I had running with
> > ping -q -i 0.01 -s 1400 <ip address>
> > 
> > Lots of oopses in the logs, most are probably as a result of the corruption and not indicative of the cause. They look similar to Jody's dump so I won't bother sending them unless someone thinks they might be useful.
> > 
> > btw, can the install be modified to give us a System.map-2.4.26-xen[0U] in /boot? ksymoops would be much happier.
> > 
> > James


 

