
Re: [win-pv-devel] Problems with xenvbd



On Fri, 4 Sep 2015, Paul Durrant wrote:
> > -----Original Message-----
> > From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-
> > bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Paul Durrant
> > Sent: 02 September 2015 10:00
> > To: Fabio Fantoni; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > Cc: Stefano Stabellini
> > Subject: Re: [win-pv-devel] Problems with xenvbd
> >
> > > -----Original Message-----
> > > From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> > > Sent: 02 September 2015 09:54
> > > To: Paul Durrant; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > > Cc: Stefano Stabellini
> > > Subject: Re: [win-pv-devel] Problems with xenvbd
> > >
> > > On 01/09/2015 16:41, Paul Durrant wrote:
> > > >> -----Original Message-----
> > > >> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> > > >> Sent: 21 August 2015 14:14
> > > >> To: Rafał Wojdyła; Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > > >> Subject: Re: [win-pv-devel] Problems with xenvbd
> > > >>
> > > >> On 21/08/2015 10:12, Fabio Fantoni wrote:
> > > >>> On 21/08/2015 00:03, Rafał Wojdyła wrote:
> > > >>>> On 2015-08-19 23:25, Paul Durrant wrote:
> > > >>>>>> -----Original Message-----
> > > >>>>>> From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-
> > > >>>>>> bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Rafal Wojdyla
> > > >>>>>> Sent: 18 August 2015 14:33
> > > >>>>>> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> > > >>>>>> Subject: [win-pv-devel] Problems with xenvbd
> > > >>>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> I've been testing the current pvdrivers code in preparation for
> > > >>>>>> creating upstream patches for my xeniface additions and I noticed
> > > >>>>>> that xenvbd seems to be very unstable for me. I'm not sure if it's
> > > >>>>>> a problem with xenvbd itself or my code because it seemed to only
> > > >>>>>> manifest when the full suite of our guest tools was installed along
> > > >>>>>> with xenvbd. In short, most of the time the system crashed with
> > > >>>>>> kernel memory corruption in seemingly random processes shortly
> > > >>>>>> after start. Driver Verifier didn't seem to catch anything. You can
> > > >>>>>> see a log from one such crash in the attachment crash1.txt.
> > > >>>>>>
> > > >>>>>> Today I tried to perform some more tests but this time without our
> > > >>>>>> guest tools (only pvdrivers and our shared libraries were
> > > >>>>>> installed). To my surprise now Driver Verifier was crashing the
> > > >>>>>> system every time in xenvbd (see crash2.txt). I don't know why it
> > > >>>>>> didn't catch that previously... If adding some timeout to the
> > > >>>>>> offending wait doesn't break anything I'll try that to see if I can
> > > >>>>>> reproduce the previous memory corruptions.
> > > >>>>>>
> > > >>>>> Those crashes do look odd. I'm on PTO for the next week but I'll have
> > > >>>>> a look when I get back to the office. I did run verifier on all the
> > > >>>>> drivers a week or so back (while running vbd plug/unplug tests) but
> > > >>>>> there have been a couple of changes since then.
> > > >>>>>
> > > >>>>> Paul
> > > >>>>>
> > > >>>> No problem. I attached some more logs. The last one was during system
> > > >>>> shutdown, after that the OS failed to boot (probably corrupted
> > > >>>> filesystem since the BSOD itself seemed to indicate that). I think every
> > > >>>> time there is a BLKIF_RSP_ERROR somewhere but I'm not yet familiar with
> > > >>>> Xen PV device interfaces so not sure what that means.
> > > >>>>
> > > >>>> In the meantime I've run more tests on my modified xeniface driver to
> > > >>>> make sure it's not contributing to these issues but everything seemed
> > > >>>> to be fine there.
> > > >>>>
> > > >>>>
> > > >>> I also had disk corruption on a Windows 10 Pro 64-bit domU with the PV
> > > >>> drivers build of 11 August, but I'm not sure it is related to the winpv
> > > >>> drivers: on the same domU I had also started testing snapshots with a
> > > >>> qcow2 disk overlay. For this case I don't have useful information
> > > >>> because Windows didn't even try to boot, but if it happens again I'll
> > > >>> try to gather more details.
> > > >> It happened another time, and again I was unable to understand exactly
> > > >> what the cause was.
> > > >> In Windows everything seemed OK and it did a clean shutdown, but on
> > > >> reboot seabios didn't find a bootable disk and the qemu log doesn't
> > > >> show any useful information.
> > > >> qemu-img check shows errors:
> > > >>> /usr/lib/xen/bin/qemu-img check W10.disk1.cow-sn1
> > > >>> ERROR cluster 143 refcount=1 reference=2
> > > >>> Leaked cluster 1077 refcount=1 reference=0
> > > >>> ERROR cluster 1221 refcount=1 reference=2
> > > >>> Leaked cluster 2703 refcount=1 reference=0
> > > >>> Leaked cluster 5212 refcount=1 reference=0
> > > >>> Leaked cluster 13375 refcount=1 reference=0
> > > >>>
> > > >>> 2 errors were found on the image.
> > > >>> Data may be corrupted, or further writes to the image may corrupt it.
> > > >>>
> > > >>> 4 leaked clusters were found on the image.
> > > >>> This means waste of disk space, but no harm to data.
> > > >>> 27853/819200 = 3.40% allocated, 22.65% fragmented, 0.00% compressed
> > > >>> clusters
> > > >>> Image end offset: 1850736640
> > > >> I created it with:
> > > >> /usr/lib/xen/bin/qemu-img create -o
> > > >> backing_file=W10.disk1.xm,backing_fmt=raw -f qcow2 W10.disk1.cow-sn1
> > > >> and changed the xl domU configuration:
> > > >> disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw',...
> > > >> Dom0 is running xen 4.6-rc1 and qemu 2.4.0.
> > > >> DomU is Windows 10 Pro 64-bit with the PV drivers build of 11 August.
> > > >>
> > > >> How can I know for sure whether this is a winpv problem, a qemu problem,
> > > >> or something else, and gather useful information to report?
> > > >>
> > > >> Thanks for any reply and sorry for my bad english.
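
(Regarding the qemu-img check output above: the refcount errors are the
worrying part; leaked clusters only waste space. If the image needs
repairing, qemu-img can attempt it itself (take a backup copy first,
since repair rewrites metadata):

/usr/lib/xen/bin/qemu-img check -r leaks W10.disk1.cow-sn1
/usr/lib/xen/bin/qemu-img check -r all W10.disk1.cow-sn1

"-r leaks" only reclaims leaked clusters; "-r all" also tries to fix
refcount errors. Neither recovers guest data that was written
inconsistently; they only make the qcow2 metadata consistent again.)
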
> > > > This sounds very much like a lack of synchronization somewhere. I recall
> > > > seeing other problems of this ilk when someone was messing around with
> > > > O_DIRECT for opening images. I wonder if we are missing a flush operation
> > > > on shutdown.
> > > >
> > > >    Paul
> > > >
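
(To make the flush idea concrete: the operation in question would be a
BLKIF_OP_FLUSH_DISKCACHE request on the blkif ring, which a frontend may
only issue if the backend advertises feature-flush-cache in xenstore. A
minimal sketch of the frontend side, using the standard ring macros from
the public headers; blkfront_send_flush is a hypothetical name, not a
function in any of these drivers:

#include <xen/io/blkif.h>  /* blkif_request_t, BLKIF_OP_FLUSH_DISKCACHE */
#include <xen/io/ring.h>   /* RING_GET_REQUEST and friends */

/* Queue one flush request; "id" must be unique among in-flight requests. */
static void blkfront_send_flush(blkif_front_ring_t *ring, uint64_t id)
{
    blkif_request_t *req = RING_GET_REQUEST(ring, ring->req_prod_pvt);

    req->operation   = BLKIF_OP_FLUSH_DISKCACHE;
    req->nr_segments = 0;   /* a flush carries no data segments */
    req->handle      = 0;   /* virtual device handle */
    req->id          = id;  /* echoed back in the response */

    ring->req_prod_pvt++;
    /* followed by RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(), an event
     * channel kick, and a wait for the matching response */
}

Whether such a flush is actually missing somewhere on the shutdown path,
in xenvbd or on the backend side, is the thing to check.)
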
> > > Thanks for the reply.
> > > I did a quick search but did not find O_DIRECT grepping in libxl; I found
> > > it only in the qemu code. I then looked at the patch that seems to have
> > > added the setting of it for xen:
> > > http://git.qemu.org/?p=qemu.git;a=commitdiff;h=454ae734f1d9f591345fa78376435a8e74bb4edd
> > > Checking in libxl it seems disabled by default, and some old xen posts
> > > suggest that O_DIRECT creates problems.
> > > Should I try enabling direct-io-safe on the domU's qcow2 disks?
> > > I have also added Stefano Stabellini as cc.
> > > @Stefano Stabellini: what is the currently known status and behaviour of
> > > direct-io-safe?

O_DIRECT should be entirely safe to use, at least on ide and qdisk. I
haven't done the analysis on ahci emulation in qemu to know whether that
would be true for ahci disks, but that doesn't matter because unplug is
not implemented for ahci disks.
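
(If you do want to experiment with direct-io-safe: it is a per-disk
keyword in the xl disk configuration syntax, and positional parameters
may be followed by keyword parameters, so a line along these lines
should enable it; untested here, check docs/misc/xl-disk-configuration.txt
in your xen tree:

disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw,direct-io-safe',...

As I understand it the flag only declares that O_DIRECT is safe on the
storage backing this image, so the toolstack may drop its
O_DIRECT-avoiding workaround; it is not a general performance knob.)
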


> > > Sorry if the question is stupid, or if my English is too bad, but many
> > > posts from recent years are confusing and in some cases even contradictory
> > > about stability/integrity/performance when using it or not.
> > > In particular it seems to crash with some kernels, but I did not understand
> > > exactly which versions and/or with which patches.
> > >
> > > @Paul Durrant: did you see my other mail where I wrote that, based on my
> > > latest tests with xen 4.6 without the udev file, Windows domUs with the new
> > > PV drivers don't boot, and that to make them boot correctly I had to re-add
> > > the udev file? Could this cause unexpected behaviour related to this
> > > problem, or is it something different?
> > > http://lists.xen.org/archives/html/win-pv-devel/2015-08/msg00033.html
> > >
> >
> > I'm not sure why udev would be an issue here. The problem you have
> > appears to be QEMU ignoring the request to unplug emulated disks. I've not
> > seen this behaviour on my test box so I'll need to dig some more.
> >
>
> I notice you have 6 IDE channels? Are you using AHCI by any chance? If you 
> are then it looks like QEMU is not honouring the unplug request... that would 
> be where the bug is. I'll try to repro myself.

Unplug on ahci is actually unimplemented, see hw/i386/xen/xen_platform.c:

static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
{
    /* We have to ignore passthrough devices */
    if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
            PCI_CLASS_STORAGE_IDE
            && strcmp(d->name, "xen-pci-passthrough") != 0) {
        pci_piix3_xen_ide_unplug(DEVICE(d));
    }
}

The function specifically only unplugs IDE disks.
I am not sure what to do about ahci unplug, given that we don't
implement scsi disk unplug either. After all, if the goal is to unplug
the disk, why choose a faster emulated protocol?
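
If someone did want to implement it, the natural shape would be to match
the SATA class code in the same function and call an AHCI analogue of
pci_piix3_xen_ide_unplug(); that analogue is exactly the part that does
not exist today. A rough, untested sketch (PCI_CLASS_STORAGE_SATA is
already in include/hw/pci/pci_ids.h; everything else on the SATA branch
is hypothetical):

static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
{
    uint16_t class = pci_get_word(d->config + PCI_CLASS_DEVICE);

    /* We have to ignore passthrough devices */
    if (strcmp(d->name, "xen-pci-passthrough") == 0) {
        return;
    }

    if (class == PCI_CLASS_STORAGE_IDE) {
        pci_piix3_xen_ide_unplug(DEVICE(d));
    } else if (class == PCI_CLASS_STORAGE_SATA) {
        /* hypothetical: an AHCI unplug would have to quiesce the
         * HBA and detach its drives, as the IDE helper does */
        /* pci_ahci_xen_unplug(DEVICE(d)); */
    }
}
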
_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel

 

