Re: [win-pv-devel] Problems with xenvbd
On 04/09/2015 11:30, Paul Durrant wrote:

-----Original Message-----
From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Paul Durrant
Sent: 02 September 2015 10:00
To: Fabio Fantoni; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Cc: Stefano Stabellini
Subject: Re: [win-pv-devel] Problems with xenvbd

-----Original Message-----
From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
Sent: 02 September 2015 09:54
To: Paul Durrant; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Cc: Stefano Stabellini
Subject: Re: [win-pv-devel] Problems with xenvbd

On 01/09/2015 16:41, Paul Durrant wrote:

-----Original Message-----
From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
Sent: 21 August 2015 14:14
To: Rafał Wojdyła; Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [win-pv-devel] Problems with xenvbd

On 21/08/2015 10:12, Fabio Fantoni wrote:
On 21/08/2015 00:03, Rafał Wojdyła wrote:
On 2015-08-19 23:25, Paul Durrant wrote:

-----Original Message-----
From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Rafal Wojdyla
Sent: 18 August 2015 14:33
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: [win-pv-devel] Problems with xenvbd

Hi,

I've been testing the current pvdrivers code in preparation for creating upstream patches for my xeniface additions, and I noticed that xenvbd seems to be very unstable for me. I'm not sure if it's a problem with xenvbd itself or my code, because it seemed to manifest only when the full suite of our guest tools was installed along with xenvbd. In short, most of the time the system crashed with kernel memory corruption in seemingly random processes shortly after start. Driver Verifier didn't seem to catch anything. You can see a log from one such crash in the attachment crash1.txt.
Today I tried to perform some more tests, but this time without our guest tools (only pvdrivers and our shared libraries were installed). To my surprise, Driver Verifier was now crashing the system every time in xenvbd (see crash2.txt). I don't know why it didn't catch that previously... If adding some timeout to the offending wait doesn't break anything, I'll try that to see if I can reproduce the previous memory corruptions.

Those crashes do look odd. I'm on PTO for the next week but I'll have a look when I get back to the office. I did run verifier on all the drivers a week or so back (while running vbd plug/unplug tests) but there have been a couple of changes since then.

Paul

No problem. I attached some more logs. The last one was during system shutdown; after that the OS failed to boot (probably a corrupted filesystem, since the BSOD itself seemed to indicate that). I think every time there is a BLKIF_RSP_ERROR somewhere, but I'm not yet familiar with Xen PV device interfaces so I'm not sure what that means. In the meantime I've run more tests on my modified xeniface driver to make sure it's not contributing to these issues, but everything seemed to be fine there.

I also had disk corruption on Windows 10 Pro 64-bit with the PV drivers build of 11 August, but I'm not sure it is related to the winpv drivers; on the same domU I had also started testing snapshots with a qcow2 disk overlay. For that case I have no useful information because it didn't try to boot Windows at all, but if it happens again I'll try to collect more useful information.

It happened again, but this time too I was unable to work out the exact cause. On the Windows reboot everything seemed OK and it did a clean shutdown, but on reboot SeaBIOS did not find a bootable disk and the QEMU log showed no useful information.
qemu-img check shows errors:

/usr/lib/xen/bin/qemu-img check W10.disk1.cow-sn1
ERROR cluster 143 refcount=1 reference=2
Leaked cluster 1077 refcount=1 reference=0
ERROR cluster 1221 refcount=1 reference=2
Leaked cluster 2703 refcount=1 reference=0
Leaked cluster 5212 refcount=1 reference=0
Leaked cluster 13375 refcount=1 reference=0

2 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

4 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
27853/819200 = 3.40% allocated, 22.65% fragmented, 0.00% compressed clusters
Image end offset: 1850736640

I created it with:

/usr/lib/xen/bin/qemu-img create -o backing_file=W10.disk1.xm,backing_fmt=raw -f qcow2 W10.disk1.cow-sn1

and changed the xl domU configuration:

disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw',...

Dom0 runs Xen 4.6-rc1 and QEMU 2.4.0.
The domU is Windows 10 Pro 64-bit with the PV drivers build of 11 August.

How can I know for sure whether this is a winpv, QEMU, or other problem, and what useful information should I collect to report it?

Thanks for any reply and sorry for my bad English.

This sounds very much like a lack of synchronization somewhere. I recall seeing other problems of this ilk when someone was messing around with O_DIRECT for opening images. I wonder if we are missing a flush operation on shutdown.

Paul

Thanks for the reply. I did a quick search but found no O_DIRECT when grepping libxl; I found it only in the QEMU code. Afterwards I tried with the patch that seems to have added the setting for it for Xen:

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=454ae734f1d9f591345fa78376435a8e74bb4edd

Checking libxl, it seems to be disabled by default, and from some old Xen posts it seems that O_DIRECT causes problems. Should I try enabling direct-io-safe on the domU's qcow2 disks?

I've also added Stefano Stabellini to Cc.

@Stefano Stabellini: What is the currently known status and result of direct-io-safe?
Sorry if the question is stupid or my English is too bad, but many posts from recent years are confusing and in some cases seem contradictory about stability, integrity, and performance with or without it. In particular it seems to crash with some kernels, but I don't understand exactly which versions and/or with which patches.

@Paul Durrant: have you seen my other mail, where I wrote that, based on my latest tests with Xen 4.6, Windows domUs with the new PV drivers don't boot without the udev file, and that to boot them correctly I must re-add the udev file? Could this cause unexpected behaviour related to this problem, or is it a different issue?

http://lists.xen.org/archives/html/win-pv-devel/2015-08/msg00033.html

I'm not sure why udev would be an issue here. The problem you have appears to be QEMU ignoring the request to unplug emulated disks. I've not seen this behaviour on my test box so I'll need to dig some more.

I notice you have 6 IDE channels? Are you using AHCI by any chance? If you are then it looks like QEMU is not honouring the unplug request... that would be where the bug is. I'll try to repro myself.

Paul

If I remember correctly, I already tried with IDE as well for both problems (udev and qcow), with the same result. I've also been using mainly AHCI on Windows domUs (with the new PV drivers) on the test system for some months. If needed, tell me and I'll do more tests.

Your recent patches seem to be fixes related to unplug, or am I wrong? I'll retry with them this afternoon without the udev file if a new PV test build is ready.

Thanks for any reply and sorry for my bad English.

_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel
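
For anyone comparing several such reports: below is a minimal sketch of how the per-cluster lines from the qemu-img check output quoted in this thread could be separated into the two classes it reports. The helper name summarize_check and the SAMPLE constant are hypothetical and exist only for this illustration; the sketch parses text in the format shown above and does not call QEMU. As far as I understand the output, "ERROR" lines are clusters whose stored refcount is lower than the number of references found (possible corruption), while "Leaked" lines are clusters whose stored refcount is higher (wasted space only), which matches the summary text printed by the tool.

```python
import re

# Per-cluster line format as it appears in the qemu-img check output quoted
# in this thread; other QEMU versions may print different text (assumption).
CLUSTER_LINE = re.compile(
    r"^(ERROR|Leaked) cluster (\d+) refcount=(\d+) reference=(\d+)$"
)


def summarize_check(output: str) -> dict:
    """Group cluster numbers by kind. Hypothetical helper, not part of QEMU."""
    summary = {"errors": [], "leaks": []}
    for line in output.splitlines():
        match = CLUSTER_LINE.match(line.strip())
        if match is None:
            continue  # skip the command line and the human-readable summary
        kind = match.group(1)
        cluster = int(match.group(2))
        key = "errors" if kind == "ERROR" else "leaks"
        summary[key].append(cluster)
    return summary


# The per-cluster lines copied from the report above.
SAMPLE = """\
ERROR cluster 143 refcount=1 reference=2
Leaked cluster 1077 refcount=1 reference=0
ERROR cluster 1221 refcount=1 reference=2
Leaked cluster 2703 refcount=1 reference=0
Leaked cluster 5212 refcount=1 reference=0
Leaked cluster 13375 refcount=1 reference=0
"""

if __name__ == "__main__":
    result = summarize_check(SAMPLE)
    print("clusters with refcount errors:", result["errors"])
    print("leaked clusters:", result["leaks"])
```

Only the clusters in the "errors" list point at possible data corruption; the leaked ones match the "no harm to data" note in the tool's own summary.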