Re: [win-pv-devel] Problems with xenvbd
On Fri, 4 Sep 2015, Paul Durrant wrote:
> > -----Original Message-----
> > From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Paul Durrant
> > Sent: 02 September 2015 10:00
> > To: Fabio Fantoni; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > Cc: Stefano Stabellini
> > Subject: Re: [win-pv-devel] Problems with xenvbd
> >
> > > -----Original Message-----
> > > From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> > > Sent: 02 September 2015 09:54
> > > To: Paul Durrant; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > > Cc: Stefano Stabellini
> > > Subject: Re: [win-pv-devel] Problems with xenvbd
> > >
> > > On 01/09/2015 16:41, Paul Durrant wrote:
> > > >> -----Original Message-----
> > > >> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> > > >> Sent: 21 August 2015 14:14
> > > >> To: Rafał Wojdyła; Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > > >> Subject: Re: [win-pv-devel] Problems with xenvbd
> > > >>
> > > >> On 21/08/2015 10:12, Fabio Fantoni wrote:
> > > >>> On 21/08/2015 00:03, Rafał Wojdyła wrote:
> > > >>>> On 2015-08-19 23:25, Paul Durrant wrote:
> > > >>>>>> -----Original Message-----
> > > >>>>>> From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Rafal Wojdyla
> > > >>>>>> Sent: 18 August 2015 14:33
> > > >>>>>> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > > >>>>>> Subject: [win-pv-devel] Problems with xenvbd
> > > >>>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> I've been testing the current pvdrivers code in preparation for creating upstream patches for my xeniface additions, and I noticed that xenvbd seems to be very unstable for me. I'm not sure whether it's a problem with xenvbd itself or with my code, because it seemed to manifest only when the full suite of our guest tools was installed along with xenvbd. In short, most of the time the system crashed with kernel memory corruption in seemingly random processes shortly after start. Driver Verifier didn't seem to catch anything. You can see a log from one such crash in the attachment crash1.txt.
> > > >>>>>>
> > > >>>>>> Today I tried to perform some more tests, this time without our guest tools (only pvdrivers and our shared libraries were installed). To my surprise, Driver Verifier now crashed the system every time in xenvbd (see crash2.txt). I don't know why it didn't catch that previously... If adding a timeout to the offending wait doesn't break anything, I'll try that to see if I can reproduce the previous memory corruptions.
> > > >>>>>>
> > > >>>>> Those crashes do look odd. I'm on PTO for the next week but I'll have a look when I get back to the office. I did run verifier on all the drivers a week or so back (while running vbd plug/unplug tests) but there have been a couple of changes since then.
> > > >>>>>
> > > >>>>> Paul
> > > >>>>>
> > > >>>> No problem. I attached some more logs. The last one was during system shutdown; after that the OS failed to boot (probably a corrupted filesystem, since the BSOD itself seemed to indicate that). I think every time there is a BLKIF_RSP_ERROR somewhere, but I'm not yet familiar with the Xen PV device interfaces so I'm not sure what that means.
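BLKIF_RSP_ERROR is just the generic failure status a backend puts in a blkif ring response. For reference, the response status values from Xen's public io/blkif.h are (quoted from memory, so check the header in your tree):

/* Response status values, as in xen/include/public/io/blkif.h */
#define BLKIF_RSP_EOPNOTSUPP  -2   /* operation not supported by the backend */
#define BLKIF_RSP_ERROR       -1   /* the backend failed the I/O request     */
#define BLKIF_RSP_OKAY         0   /* request completed successfully         */

So a BLKIF_RSP_ERROR in the log means the blkback/qdisk backend rejected or failed a request; it does not by itself say why.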
> > > >>>> In the meantime I've run more tests on my modified xeniface driver to make sure it's not contributing to these issues, but everything seemed to be fine there.
> > > >>>>
> > > >>>>
> > > >>> I also had disk corruption on Windows 10 Pro 64-bit with the PV drivers build of 11 August, but I'm not sure it is related to the winpv drivers; on the same domU I had also started testing snapshots with a qcow2 disk overlay. I don't have useful information for that case because it does not try to boot Windows at all, but if it happens again I'll try to collect more useful information.
> > > >> It happened another time, and again I was unable to work out the exact cause.
> > > >> Before the reboot Windows seemed fine and did a clean shutdown, but on the next boot SeaBIOS did not find a bootable disk and the QEMU log showed nothing useful.
> > > >> qemu-img check shows errors:
> > > >>> /usr/lib/xen/bin/qemu-img check W10.disk1.cow-sn1
> > > >>> ERROR cluster 143 refcount=1 reference=2
> > > >>> Leaked cluster 1077 refcount=1 reference=0
> > > >>> ERROR cluster 1221 refcount=1 reference=2
> > > >>> Leaked cluster 2703 refcount=1 reference=0
> > > >>> Leaked cluster 5212 refcount=1 reference=0
> > > >>> Leaked cluster 13375 refcount=1 reference=0
> > > >>>
> > > >>> 2 errors were found on the image.
> > > >>> Data may be corrupted, or further writes to the image may corrupt it.
> > > >>>
> > > >>> 4 leaked clusters were found on the image.
> > > >>> This means waste of disk space, but no harm to data.
> > > >>> 27853/819200 = 3.40% allocated, 22.65% fragmented, 0.00% compressed clusters
> > > >>> Image end offset: 1850736640
> > > >> I created it with:
> > > >> /usr/lib/xen/bin/qemu-img create -o backing_file=W10.disk1.xm,backing_fmt=raw -f qcow2 W10.disk1.cow-sn1
> > > >> and changed the xl domU configuration:
> > > >> disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw',...
> > > >> Dom0 is running Xen 4.6-rc1 and QEMU 2.4.0.
> > > >> The domU is Windows 10 Pro 64-bit with the PV drivers build of 11 August.
> > > >>
> > > >> How can I determine for sure whether this is a winpv problem, a QEMU problem, or something else, and collect useful information to report?
> > > >>
> > > >> Thanks for any reply, and sorry for my bad English.
> > > > This sounds very much like a lack of synchronization somewhere. I recall seeing other problems of this ilk when someone was messing around with O_DIRECT for opening images. I wonder if we are missing a flush operation on shutdown.
> > > >
> > > > Paul
> > > >
> > > Thanks for the reply.
> > > I did a quick search but did not find O_DIRECT when grepping libxl; I found it only in the QEMU code.
> > > I then looked at the patch that appears to add that setting for Xen:
> > > http://git.qemu.org/?p=qemu.git;a=commitdiff;h=454ae734f1d9f591345fa78376435a8e74bb4edd
> > > Checking in libxl it seems to be disabled by default, and from some old xen-devel posts it seems that O_DIRECT caused problems.
> > > Should I try enabling direct-io-safe on the domU's qcow2 disks?
> > > I have also added Stefano Stabellini as cc.
> > > @Stefano Stabellini: what is the currently known status of direct-io-safe?

O_DIRECT should be entirely safe to use, at least on ide and qdisk. I haven't done the analysis on ahci emulation in qemu to know whether that would be true for ahci disks, but that doesn't matter because unplug is not implemented for ahci disks.
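If you do want to experiment with it, direct-io-safe is a per-disk boolean in the xl disk specification. A minimal, untested example using the key=value syntax from xl-disk-configuration(5), with the path taken from your config:

disk = [ 'format=qcow2, vdev=hda, access=rw, direct-io-safe, target=/mnt/vm2/W10.disk1.cow-sn1' ]

For the image you already have, qemu-img can also try to repair the refcount errors and leaked clusters in place:

/usr/lib/xen/bin/qemu-img check -r all W10.disk1.cow-sn1

Taking a copy of the overlay first is a good idea, since a repair on an image with real errors can still lose data.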
> > > Sorry if the question is stupid, or if my English is too bad, but many posts from recent years are confusing and in some cases even contradictory about the stability/integrity/performance of using it or not.
> > > In particular it seems to crash with some kernels, but I could not work out exactly which versions and/or which patches.
> > >
> > > @Paul Durrant: did you see my other mail, where I wrote that in my latest tests with Xen 4.6 Windows domUs with the new PV drivers do not boot without the udev file, and I must re-add the udev file to make them boot correctly? Could that cause an unexpected interaction related to this problem, or is it a different issue?
> > > http://lists.xen.org/archives/html/win-pv-devel/2015-08/msg00033.html
> > >
> > I'm not sure why udev would be an issue here. The problem you have appears to be QEMU ignoring the request to unplug emulated disks. I've not seen this behaviour on my test box so I'll need to dig some more.
> >
> I notice you have 6 IDE channels? Are you using AHCI by any chance? If you are then it looks like QEMU is not honouring the unplug request... that would be where the bug is. I'll try to repro myself.

Unplug on ahci is actually unimplemented, see hw/i386/xen/xen_platform.c:

static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
{
    /* We have to ignore passthrough devices */
    if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
            PCI_CLASS_STORAGE_IDE
            && strcmp(d->name, "xen-pci-passthrough") != 0) {
        pci_piix3_xen_ide_unplug(DEVICE(d));
    }
}

The function specifically only unplugs IDE disks. I am not sure what to do about ahci unplug, given that we don't implement scsi disk unplug either. After all, if the goal is to unplug the disk, why choose a faster emulated protocol?
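For anyone unfamiliar with the mechanism being discussed: the guest asks QEMU to drop its emulated disks by writing to the xen-platform device's fixed ioport. Below is a minimal, untested userspace sketch of that handshake, using the port, magic number and bitmask described in docs/misc/hvm-emulated-unplug.markdown; the Windows PV drivers issue the equivalent request from kernel mode during boot, not from userspace.

/*
 * Untested sketch of the guest-side emulated-device unplug request.
 * Must run as root inside the HVM guest (needs raw port I/O access).
 */
#include <stdio.h>
#include <sys/io.h>

#define XEN_IOPORT_BASE       0x10    /* fixed ioport of the xen-platform device      */
#define XEN_IOPORT_MAGIC_VAL  0x49d2  /* magic read back when the interface is present */
#define UNPLUG_ALL_IDE_DISKS  0x01    /* bitmask value handled by unplug_disks() above */

int main(void)
{
    if (ioperm(XEN_IOPORT_BASE, 2, 1) != 0) {
        perror("ioperm");
        return 1;
    }

    /* Probe for the unplug interface before touching anything. */
    if (inw(XEN_IOPORT_BASE) != XEN_IOPORT_MAGIC_VAL) {
        fprintf(stderr, "xen-platform unplug interface not present\n");
        return 1;
    }

    /* Ask QEMU to unplug all emulated IDE disks; AHCI disks are ignored. */
    outw(UNPLUG_ALL_IDE_DISKS, XEN_IOPORT_BASE);
    return 0;
}

If QEMU never acts on that write for a particular controller, as with AHCI above, the emulated disk and the PV frontend can both remain attached to the same backing image, which would be consistent with the corruption reported earlier in the thread.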