
Re: [Xen-devel] Error in xen-unstable



Hello,

I've been doing some more research and disabled the execution of
scripts in xenbackendd, so that no action is performed to disconnect
or remove devices from xenstore. The trace from xenbackendd shows the
following when shutting down:

read from xen watch: /local/domain/0/backend/vbd/3/51713/state
status of /local/domain/0/backend/vbd/3/51713: 5
read from xen watch: /local/domain/0/backend/vbd/3/51713/state
status of /local/domain/0/backend/vbd/3/51713: 6
read from xen watch: /local/domain/0/backend/vbd/3/51714/state
status of /local/domain/0/backend/vbd/3/51714: 5
read from xen watch: /local/domain/0/backend/vbd/3/51714/state
status of /local/domain/0/backend/vbd/3/51714: 6
read from xen watch: /local/domain/0/backend/console/3/0/online
read from xen watch: /local/domain/0/backend/console/3/0/state
status of /local/domain/0/backend/console/3/0: 5
read from xen watch: /local/domain/0/backend/console/3/0
read from xen watch: /local/domain/0/backend/vbd/3/51714/online
read from xen watch: /local/domain/0/backend/vbd/3/51714/state
status of /local/domain/0/backend/vbd/3/51714: 5
read from xen watch: /local/domain/0/backend/vbd/3/51713/online
read from xen watch: /local/domain/0/backend/vbd/3/51713/state
status of /local/domain/0/backend/vbd/3/51713: 5

The devices seem to reach the closed state (6), but then something sets
them back to closing (5). That's where the problem comes from: when
xenbackendd reads state 6, it closes the device and deletes the
entry from xenstore, but another process then sets the state to 5, so
the device is added to xenstore again, which prevents the machine from
booting again (I can always delete the device with xenstore-rm and
boot again with no problem). The output from xenstore-ls (when no
scripts are executed to detach/remove the devices):

/local/domain/0/backend = ""   (r0)
/local/domain/0/backend/vbd = ""   (r0)
/local/domain/0/backend/vbd/3 = ""   (r0)
/local/domain/0/backend/vbd/3/51714 = ""   (n0,r3)
/local/domain/0/backend/vbd/3/51714/domain = "debian"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/frontend =
"/local/domain/3/device/vbd/51714"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/uuid =
"7cad66c9-8cc7-973d-5372-582ee255ec7b"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/bootable = "1"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/dev = "xvda2"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/state = "5"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/params =
"/home/xen/debian/disk.img"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/mode = "w"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/online = "0"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/frontend-id = "3"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/type = "file"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/vnd = "/dev/vnd0d"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/physical-device = "3587"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/hotplug-status = "connected"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/sectors = "20971520"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/info = "0"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/sector-size = "512"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/feature-flush-cache = "1"   (n0,r3)
/local/domain/0/backend/vbd/3/51713 = ""   (n0,r3)
/local/domain/0/backend/vbd/3/51713/domain = "debian"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/frontend =
"/local/domain/3/device/vbd/51713"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/uuid =
"6948ddea-0188-fc03-2e6d-3c321217ec31"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/bootable = "0"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/dev = "xvda1"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/state = "5"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/params =
"/home/xen/debian/swap.img"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/mode = "w"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/online = "0"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/frontend-id = "3"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/type = "file"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/vnd = "/dev/vnd1d"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/physical-device = "3603"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/hotplug-status = "connected"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/sectors = "262144"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/info = "0"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/sector-size = "512"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/feature-flush-cache = "1"   (n0,r3)
/local/domain/0/backend/console = ""   (r0)
/local/domain/0/backend/console/3 = ""   (r0)
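
To pick the leftover entries out of a longer listing, a minimal Python
sketch along these lines could help. It assumes the listing has the
path = "value" (perms) shape shown above, and it treats a backend as
stale when its state is "5" and its online node is "0". That criterion
and the name stale_backends are assumptions made for illustration, not
something the Xen tools define. The script only prints xenstore-rm
commands; it does not run them.

```python
import re

# Matches one unwrapped xenstore-ls line: /path = "value"   (perms)
# Lines wrapped by the mail client (path on one line, value on the next)
# do not match and are skipped; state and online are never wrapped here.
LINE_RE = re.compile(r'^(?P<path>/\S+) = "(?P<value>[^"]*)"')


def stale_backends(listing):
    """Return backend device paths with state "5" and online "0" (assumed criterion)."""
    nodes = {}
    for line in listing.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            nodes[m.group("path")] = m.group("value")

    stale = []
    for path, value in nodes.items():
        if path.endswith("/state") and value == "5":
            dev = path[: -len("/state")]
            if nodes.get(dev + "/online") == "0":
                stale.append(dev)
    return sorted(stale)


# A few lines copied from the listing above.
sample = '''
/local/domain/0/backend/vbd/3/51714 = ""   (n0,r3)
/local/domain/0/backend/vbd/3/51714/frontend =
"/local/domain/3/device/vbd/51714"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/state = "5"   (n0,r3)
/local/domain/0/backend/vbd/3/51714/online = "0"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/state = "5"   (n0,r3)
/local/domain/0/backend/vbd/3/51713/online = "0"   (n0,r3)
/local/domain/0/backend/console/3 = ""   (r0)
'''

for dev in stale_backends(sample):
    # Print the command for review instead of executing it.
    print("xenstore-rm", dev)
```

Removing the entries this way is only the manual workaround mentioned
above; it does not explain why the state goes back to 5.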

Does anyone have a clue which process would set the state of the device
to 5 (closing) after it has been set to 6 (closed)?
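
To spot this kind of transition in a longer trace, here is a minimal
Python sketch. It assumes the trace lines have the
"status of <path>: <n>" form shown earlier in this message. The state
names follow the XenbusState enum in Xen's public io/xenbus.h header;
the function name reopened_devices is hypothetical.

```python
import re

# XenbusState values as defined in Xen's public io/xenbus.h header.
XENBUS_STATES = {
    0: "Unknown", 1: "Initialising", 2: "InitWait", 3: "Initialised",
    4: "Connected", 5: "Closing", 6: "Closed",
    7: "Reconfiguring", 8: "Reconfigured",
}

STATUS_RE = re.compile(r"^status of (?P<dev>\S+): (?P<state>\d+)$")


def reopened_devices(trace):
    """Return (device, old, new) tuples for devices that left state Closed (6)."""
    last = {}    # last state seen per device path
    found = []
    for line in trace.splitlines():
        m = STATUS_RE.match(line.strip())
        if not m:
            continue  # "read from xen watch" lines carry no state
        dev, state = m.group("dev"), int(m.group("state"))
        if last.get(dev) == 6 and state != 6:
            found.append((dev, 6, state))
        last[dev] = state
    return found


# The vbd part of the trace from this message.
trace = '''
status of /local/domain/0/backend/vbd/3/51713: 5
status of /local/domain/0/backend/vbd/3/51713: 6
status of /local/domain/0/backend/vbd/3/51714: 5
status of /local/domain/0/backend/vbd/3/51714: 6
read from xen watch: /local/domain/0/backend/vbd/3/51714/state
status of /local/domain/0/backend/vbd/3/51714: 5
status of /local/domain/0/backend/vbd/3/51713: 5
'''

for dev, old, new in reopened_devices(trace):
    print(dev, XENBUS_STATES[old], "->", XENBUS_STATES[new])
```

This only shows when the state went backwards, not which process wrote
it.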

I've applied two patches to libxc, but they are related to grant
tables, so I don't think they have anything to do with this:

http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/xentools41/patches/patch-bb?only_with_tag=MAIN
http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/sysutils/xentools41/patches/patch-bc?only_with_tag=MAIN

Thanks and regards, Roger.

2011/7/14 Ian Campbell <Ian.Campbell@xxxxxxxxxxxxx>:
> On Thu, 2011-07-14 at 13:08 +0100, Roger Pau Monné wrote:
>> Hello,
>>
>> Thanks both for the help, I was able to boot PV machines using the old
>> toolstack (xm), but xl still shows the same odd behaviour. I'm able to boot
>> a PV machine only one time, after I shut it down I have to reboot the
>> system, I think because xenstore is not properly cleaned when the
>> machine is shut down. Here is my xenstore-ls before launching any
>> domu:
>>
>> /tool = ""   (n0)
>> /tool/xenstored = ""   (n0)
>> [...]
>> /vm/00000000-0000-0000-0000-000000000000/name = "Domain-0"   (n0)
>> [...]
>> /vm/00000000-0000-0000-0000-000000000000-1/name = "Domain-0"   (n0)
>> [...]
>> /vm/00000000-0000-0000-0000-000000000000-2/name = "Domain-0"   (n0)
>> [...]
>> /vm/00000000-0000-0000-0000-000000000000-17/name = "Domain-0"   (n0)
>
> The presence of all these entries for dom0 is a bit odd. I doubt it is
> related to your issue but it'd be worth clearing them out, or perhaps
> arranging for your xenstore tdb to be nuked on boot.
>
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51 = ""   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/uuid =
>> "f96bf0e3-19ae-e011-ae46-1803730a9a51"   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/name = "debian"   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/pool_name = "Pool-0"   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/image = ""   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/image/ostype = "linux"   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/image/kernel = ""   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/image/ramdisk = ""   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/image/cmdline = ""   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51/start_time = "1310648706.88"
>> (n0,r2)
>
> Seems to be domain 2.
>
>> [...]
>> /vm/00000000-0000-0000-0000-000000000000-18/name = "Domain-0"   (n0)
>
> Another dom 0.
>
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1 = ""   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/on_xend_stop = "ignore"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/pool_name = "Pool-0"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/shadow_memory = "0"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/uuid =
>> "f96bf0e3-19ae-e011-ae46-1803730a9a51"   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/on_reboot = "restart"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/image = "(linux (kernel '')
>> (superpages 0) (nomigrate 0) (tsc_mode 0))"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/image/ostype = "linux"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/image/kernel = ""   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/image/cmdline = ""   (n0,r2)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/image/ramdisk = ""   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/on_poweroff = "destroy"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/bootloader_args = ""   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/on_xend_start = "ignore"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/on_crash = "restart"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/xend = ""   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/xend/restart_count = "0"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/vcpus = "20"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/vcpu_avail = "1048575"   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/bootloader = ""   (n0)
>> /vm/f96bf0e3-19ae-e011-ae46-1803730a9a51-1/name = "Domain-Unnamed"   (n0)
>
> Apparently yet another dom2!
>
>> /vm/ebb38b4b-ad25-ba77-a594-dc0309cad702 = ""   (n0)
>> /vm/ebb38b4b-ad25-ba77-a594-dc0309cad702/image = ""   (n0)
>> /vm/ebb38b4b-ad25-ba77-a594-dc0309cad702/image/ostype = "linux"   (n0)
>> /vm/ebb38b4b-ad25-ba77-a594-dc0309cad702/image/kernel =
>> "/var/run/xend/boot/boot_kernel.YHU2Yq"   (n0)
>> /vm/ebb38b4b-ad25-ba77-a594-dc0309cad702/image/cmdline =
>> "root=/dev/xvda2 ro root=/dev/xvda2 ro "   (n0,r3)
>> /vm/ebb38b4b-ad25-ba77-a594-dc0309cad702/image/ramdisk =
>> "/var/run/xend/boot/boot_ramdisk.5_DepX"   (n0)
>
> A dom3... I think having a good clean out of xenstore on boot would be
> wise (or at least remove some of the woods so we can see the trees!)
>
> [...]
>> /vm/00000000-0000-0000-0000-000000000000-19/name = "Domain-0"   (n0)
>
> This appears to be the current actual dom0 entry.
>
> [...]
>> /local/domain = ""   (n0)
>> /local/domain/0 = ""   (r0)
>> /local/domain/0/vm = "/vm/00000000-0000-0000-0000-000000000000-19"   (r0)
>> /local/domain/0/device = ""   (n0)
>> /local/domain/0/control = ""   (n0)
>> /local/domain/0/control/platform-feature-multiprocessor-suspend = "1"   (n0)
>> /local/domain/0/error = ""   (n0)
>> /local/domain/0/memory = ""   (n0)
>> /local/domain/0/memory/target = "524288"   (n0)
>> /local/domain/0/guest = ""   (n0)
>> /local/domain/0/hvmpv = ""   (n0)
>> /local/domain/0/data = ""   (n0)
>> /local/domain/0/description = ""   (r0)
>> /local/domain/0/console = ""   (r0)
>> /local/domain/0/console/limit = "1048576"   (r0)
>> /local/domain/0/console/type = "xenconsoled"   (r0)
>> /local/domain/0/domid = "0"   (r0)
>> /local/domain/0/cpu = ""   (r0)
>> /local/domain/0/cpu/0 = ""   (r0)
>> /local/domain/0/cpu/0/availability = "online"   (r0)
>> /local/domain/0/name = "Domain-0"   (r0)
>
> All seems sane enough.
>
>> When I launch the domu I see the following messages in the console:
>>
>> xbd backend: attach device vnd0d (size 20971520) for domain 1
>> xbd backend: attach device vnd1d (size 262144) for domain 1
>> mapping kernel into physical memory
>> about to get started...
>>
>> And xenstore reports the following components:
>> [...]
>
> All seems sane enough.
>
>> The domu works correctly, and when I shut it down, I see the following
>> messages in the console:
>>
>> xbd backend: detach device vnd1d for domain 1
>> xbd backend: detach device vnd0d for domain 1
>> Jul 14 15:32:03 loki xenbackendd[771]: Failed to read 
>> /local/domain/0/backend/vbd/1/51714/state (No such file or directory)
>> xenbus: can't get state for backend/vbd/1/51714 (2)
>> xenbus: can't get state for backend/vbd/1/51713 (2)
>
> These messages are odd, in the light of:
> [...]
>> /local/domain/0/backend/vbd/1/51714/state = "5"   (r0)
>> [...]
>> /local/domain/0/backend/vbd/1/51713/state = "5"   (r0)
>
>> Note that devices 51714 and 51713 are still listed in the dom0, and
>> when I try to start the domu again, I don't see anything in the
>> console, and xm create freezes.
>
> I imagine that a simple xenstore-read can read those nodes just fine
> throughout?
>
> If you manually xenstore-rm the backend directories does it unwedge
> itself? (obviously not a solution but might provide an interesting data
> point).
>
>> I'm trying to debug this, but since
>> I'm new to xen, it's kind of complicated. If someone could point me in
>> the right direction I will try to do my best to fix this. Anyway, I
>> will keep on working on this as long as I have time to do so.
>
> FWIW I don't see anything like this on Linux dom0, but that doesn't
> necessarily rule out some sort of generic issue. I'm not all that
> familiar with how NetBSD dom0 is setup (for example I'm not sure what
> role xenbackendd plays)
>
> I think clearing out your xenstored db (/var/lib/xenstored/tdb on Linux,
> not sure about NetBSD) and rebooting would be worthwhile just to rule
> out any corruption or whatever. Note that on Linux we remove this in the
> initscript (apparently since 21552:de101fc39fc3 when the xencommons
> stuff landed), perhaps (most probably) doing the same on NetBSD makes
> sense?
>
> One other thing which might be useful to try would be to switch to the
> ocaml xenstored because it will log various xenstore interactions
> in /var/log/xenstored-access.log. It's a bit of a long shot but might
> help spot what's going on (or it might magically fix the problem :-D).
>
> It might also be worth checking the patches in NetBSD's ports, in case
> there is an interesting fix in there...
>
> Maybe you posted it before but can you repost your guest config?
>
> You are having a different issue with xl, is that right? xend is not
> exactly maintained these days.
>
> Ian.
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

