
Re: [Xen-users] Strange issue: DomU not saving when using direct HW access, plus non-working after restart



> I do have a real strange problem here:
>
> My environment: Xen 3.02 on SuSE 10.1
>
> In dom0 I disable eth0 with the following lines in /etc/init.d/boot.local:
> /sbin/modprobe pciback
> /bin/echo -n 0000:01:00.0 > /sys/bus/pci/drivers/e1000/unbind
> /bin/echo -n 0000:01:00.0 > /sys/bus/pci/drivers/pciback/new_slot
> /bin/echo -n 0000:01:00.0 > /sys/bus/pci/drivers/pciback/bind
>
>
> Then I start my domU with the following parameters:
> [...]
> pci = [ '01:00.0' ]
> dhcp = 'dhcp'
> [...]
>
> Basically everything's fine so far. domU is booting and accessing the net
> via dhcp over its HW-assigned eth0.

Cool.

> But when I reboot dom0 it tries to save domU (which seems to be OK).
> After rebooting dom0 starts to restart domU which fails and results in a
> "cold boot" of domU (incl. file check etc on its boot).
>
> Now if I try to save and restore the domU manually it fails and I get
> this message:
> Error: pci: Invalid config setting bus: none
>
> Even stranger:
> If I then try to start domU manually with xm create domU -c, dhcp is
> just not working!
> domU finds the assigned HW (eth0) but is not able to set up the network
> at all! And I can't get domU back to work until I reboot the whole
> system (dom0) completely!

Suspend / resume isn't supported for domains that have direct access to PCI 
devices - I'm surprised the tools even allow it (they probably shouldn't!).

It's strange that subsequently starting the domain manually also fails - are 
you sure that the domain you attempted to restore wasn't still hanging around 
somewhere?  If it really is failing when there are no other domains fighting 
for that card, it could be that the state of the ethernet card (or, I guess, 
maybe that of the Xen PCI pciback driver) has been messed up by the failed 
operations and that's why you need a whole machine reboot.

The simple fix is to disable the automatic suspend/resume of that domain on 
reboot; have dom0 shut it down and boot it again instead.  Other domains that 
don't have direct hardware access can still be safely suspended and resumed.
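
A minimal sketch of the "shut it down instead of saving it" step, assuming it
is run in dom0 before the automatic save of the remaining domains.  The
function name and the XM variable are hypothetical (XM only exists so the
command can be stubbed out); how this is hooked into the reboot sequence
depends on the distribution's init scripts and is not shown.

```shell
#!/bin/sh
# Sketch: cleanly shut down a PCI-passthrough domU instead of saving it.
# XM is a hypothetical override for stubbing; by default the real xm is used.
XM="${XM:-xm}"

shutdown_passthrough_domain() {
    dom="$1"   # domain name as shown by `xm list`
    # --halt: power the guest off rather than reboot it
    # --wait: block until the guest has actually gone away
    $XM shutdown --halt --wait "$dom"
}
```

Calling `shutdown_passthrough_domain domU` before the generic save step means
the guest is already gone when the save runs, so it does a normal boot with
`xm create` after dom0 comes back up.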

Something I'd be interested in: once you've got to the wedged state that 
requires a dom0 reboot, can you bring up that ethernet device in dom0 (by 
rebinding it to the e1000 driver)?  This would tell us whether the device is 
wedged or pciback is.  Please note that trying this (or starting new driver 
domains once you're in the wedged state, or resuming a saved driver domain, 
either explicitly or at dom0 reboot) is quite possibly going to send weird 
commands to your NIC; I'd not expect this to actually harm modern hardware, 
but it's not impossible that you could get some instability / corruption on 
the host system (not just the domU).
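
The rebinding test could be sketched roughly as follows, reversing the
boot.local lines quoted above.  This assumes pciback exposes `unbind` and
`remove_slot` next to the `new_slot` and `bind` files already used there; the
function name and the SYSFS_ROOT variable are hypothetical (SYSFS_ROOT only
allows a dry run against a scratch directory).  The risks described above
apply.

```shell
#!/bin/sh
# Sketch: hand a PCI device back from pciback to its native dom0 driver.
# SYSFS_ROOT is a hypothetical override for dry runs; on a real dom0 it is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

rebind_to_native() {
    bdf="$1"      # full PCI address, e.g. 0000:01:00.0
    driver="$2"   # native dom0 driver, e.g. e1000
    drivers="$SYSFS_ROOT/bus/pci/drivers"

    # Detach the device from pciback, then drop its slot so pciback
    # no longer claims it.
    printf '%s' "$bdf" > "$drivers/pciback/unbind" || return 1
    printf '%s' "$bdf" > "$drivers/pciback/remove_slot" || return 1
    # Ask the native driver to take the device again.
    printf '%s' "$bdf" > "$drivers/$driver/bind" || return 1
}
```

After `rebind_to_native 0000:01:00.0 e1000`, try bringing the interface up in
dom0.  If it works there, the card itself is probably fine and pciback is the
more likely culprit; if it does not, the device state is suspect.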

So, if it's *not* an important / production box containing any useful data, 
I'd be interested if you could experiment a bit more - otherwise just disable 
the automatic suspend/resume on dom0 reboot for that domain and your problem 
will be solved.

Does that answer your question?  It's great to have users / testers of the 
driver domains functionality, so please let us know how you get on!

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicycle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

