Re: [win-pv-devel] Problems with xenvbd
> -----Original Message-----
> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> Sent: 21 August 2015 14:14
> To: Rafał Wojdyła; Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [win-pv-devel] Problems with xenvbd
>
> On 21/08/2015 10:12, Fabio Fantoni wrote:
> > On 21/08/2015 00:03, Rafał Wojdyła wrote:
> >> On 2015-08-19 23:25, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx
> >>>> [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx]
> >>>> On Behalf Of Rafal Wojdyla
> >>>> Sent: 18 August 2015 14:33
> >>>> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> >>>> Subject: [win-pv-devel] Problems with xenvbd
> >>>>
> >>>> Hi,
> >>>>
> >>>> I've been testing the current pvdrivers code in preparation for
> >>>> creating upstream patches for my xeniface additions, and I noticed
> >>>> that xenvbd seems to be very unstable for me. I'm not sure whether
> >>>> the problem is in xenvbd itself or in my code, because it seemed to
> >>>> manifest only when the full suite of our guest tools was installed
> >>>> along with xenvbd. In short, most of the time the system crashed
> >>>> with kernel memory corruption in seemingly random processes shortly
> >>>> after start. Driver Verifier didn't seem to catch anything. You can
> >>>> see a log from one such crash in the attachment crash1.txt.
> >>>>
> >>>> Today I tried to perform some more tests, but this time without our
> >>>> guest tools (only pvdrivers and our shared libraries were
> >>>> installed). To my surprise, Driver Verifier now crashed the system
> >>>> in xenvbd every time (see crash2.txt). I don't know why it didn't
> >>>> catch that previously... If adding a timeout to the offending wait
> >>>> doesn't break anything, I'll try that to see whether I can
> >>>> reproduce the earlier memory corruptions.
> >>>>
> >>> Those crashes do look odd. I'm on PTO for the next week, but I'll
> >>> have a look when I get back to the office. I did run Verifier on all
> >>> the drivers a week or so back (while running vbd plug/unplug tests),
> >>> but there have been a couple of changes since then.
> >>>
> >>> Paul
> >>>
> >> No problem. I've attached some more logs. The last one was taken
> >> during system shutdown; after that the OS failed to boot (probably a
> >> corrupted filesystem, since the BSOD itself seemed to indicate that).
> >> I think every time there is a BLKIF_RSP_ERROR somewhere, but I'm not
> >> yet familiar with the Xen PV device interfaces, so I'm not sure what
> >> that means.
> >>
> >> In the meantime I've run more tests on my modified xeniface driver to
> >> make sure it's not contributing to these issues, but everything
> >> seemed to be fine there.
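For reference, since BLKIF_RSP_ERROR comes up above: in the Xen PV block
protocol the backend answers every request on the shared ring with a
blkif_response, and BLKIF_RSP_ERROR is its generic failure status, i.e.
the backend (blkback or qemu) failed the I/O. The definitions are in
xen/include/public/io/blkif.h:

    /* The backend answers every ring request with one of these. */
    struct blkif_response {
        uint64_t id;        /* copied from the matching request */
        uint8_t  operation; /* copied from the matching request */
        int16_t  status;    /* one of the BLKIF_RSP_* values    */
    };

    #define BLKIF_RSP_EOPNOTSUPP  -2  /* operation not supported   */
    #define BLKIF_RSP_ERROR       -1  /* I/O failed in the backend */
    #define BLKIF_RSP_OKAY         0  /* success                   */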
> > I also had disk corruption on Windows 10 Pro 64-bit with the PV
> > drivers build of 11 August, but I'm not sure it is related to the
> > winpv drivers: on the same domU I had also started testing snapshots
> > with a qcow2 disk overlay. For that case I don't have useful
> > information, because Windows didn't try to boot at all, but if it
> > happens again I'll try to gather more details.
>
> It has happened another time, and again I was unable to work out the
> exact cause. Windows rebooted and everything seemed OK, and it did a
> clean shutdown, but on the next boot SeaBIOS found no bootable disk and
> the qemu log showed nothing useful.
>
> qemu-img check shows errors:
>
> > /usr/lib/xen/bin/qemu-img check W10.disk1.cow-sn1
> > ERROR cluster 143 refcount=1 reference=2
> > Leaked cluster 1077 refcount=1 reference=0
> > ERROR cluster 1221 refcount=1 reference=2
> > Leaked cluster 2703 refcount=1 reference=0
> > Leaked cluster 5212 refcount=1 reference=0
> > Leaked cluster 13375 refcount=1 reference=0
> >
> > 2 errors were found on the image.
> > Data may be corrupted, or further writes to the image may corrupt it.
> >
> > 4 leaked clusters were found on the image.
> > This means waste of disk space, but no harm to data.
> > 27853/819200 = 3.40% allocated, 22.65% fragmented, 0.00% compressed
> > clusters
> > Image end offset: 1850736640
>
> I created the overlay with:
>
>   /usr/lib/xen/bin/qemu-img create -o \
>       backing_file=W10.disk1.xm,backing_fmt=raw -f qcow2 W10.disk1.cow-sn1
>
> and changed the xl domU configuration to:
>
>   disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw',...
>
> Dom0 runs Xen 4.6-rc1 and qemu 2.4.0; the domU is Windows 10 Pro 64-bit
> with the PV drivers build of 11 August.
>
> How can I know for sure whether this is a winpv problem, a qemu problem,
> or something else, and gather useful information to report?
>
> Thanks for any reply, and sorry for my bad English.

This sounds very much like a lack of synchronization somewhere. I recall
seeing other problems of this ilk when someone was messing around with
O_DIRECT for opening images. I wonder if we are missing a flush operation
on shutdown.

Paul
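Following up on the flush theory: in the blkif protocol a cache flush is
a zero-segment BLKIF_OP_FLUSH_DISKCACHE request, and it is only valid
when the backend advertises feature-flush-cache in xenstore. As a rough
sketch (not xenvbd's actual code: FLUSH_TAG and the submit_and_wait()
helper are hypothetical placeholders), the frontend would need to push
something like this before completing shutdown:

    /* Sketch: flush the disk cache over the blkif ring on shutdown.
     * Types and macros come from the public io/blkif.h and io/ring.h
     * headers; FLUSH_TAG and submit_and_wait() are invented for the
     * example. */
    #define FLUSH_TAG 0xF1u  /* id used to match the response */

    static int flush_disk_cache(blkif_front_ring_t *ring, blkif_vdev_t vdev)
    {
        blkif_request_t *req = RING_GET_REQUEST(ring, ring->req_prod_pvt);

        req->operation     = BLKIF_OP_FLUSH_DISKCACHE;
        req->nr_segments   = 0;  /* a flush carries no data segments */
        req->handle        = vdev;
        req->id            = FLUSH_TAG;
        req->sector_number = 0;

        ring->req_prod_pvt++;

        /* Push the producer index, notify the backend's event channel,
         * then wait for the blkif_response whose id == FLUSH_TAG.
         * BLKIF_RSP_OKAY means the data reached stable storage;
         * BLKIF_RSP_EOPNOTSUPP means the backend cannot flush. */
        return submit_and_wait(ring, FLUSH_TAG);
    }

For what it's worth, the leaked clusters in Fabio's image are repairable
with "qemu-img check -r leaks" (or "-r all" to also attempt the refcount
errors), though that only fixes the symptom, not whatever is dropping the
flushes.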