
Re: [win-pv-devel] Problems with xenvbd



> -----Original Message-----
> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> Sent: 04 September 2015 12:08
> To: Paul Durrant; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Stefano Stabellini
> Subject: Re: [win-pv-devel] Problems with xenvbd
> 
> On 04/09/2015 11:30, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Paul Durrant
> >> Sent: 02 September 2015 10:00
> >> To: Fabio Fantoni; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> >> Cc: Stefano Stabellini
> >> Subject: Re: [win-pv-devel] Problems with xenvbd
> >>
> >>> -----Original Message-----
> >>> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> >>> Sent: 02 September 2015 09:54
> >>> To: Paul Durrant; Rafał Wojdyła; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> >>> Cc: Stefano Stabellini
> >>> Subject: Re: [win-pv-devel] Problems with xenvbd
> >>>
> >>> On 01/09/2015 16:41, Paul Durrant wrote:
> >>>>> -----Original Message-----
> >>>>> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> >>>>> Sent: 21 August 2015 14:14
> >>>>> To: Rafał Wojdyła; Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> >>>>> Subject: Re: [win-pv-devel] Problems with xenvbd
> >>>>>
> >>>>> On 21/08/2015 10:12, Fabio Fantoni wrote:
> >>>>>> On 21/08/2015 00:03, Rafał Wojdyła wrote:
> >>>>>>> On 2015-08-19 23:25, Paul Durrant wrote:
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Rafal Wojdyla
> >>>>>>>>> Sent: 18 August 2015 14:33
> >>>>>>>>> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> >>>>>>>>> Subject: [win-pv-devel] Problems with xenvbd
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I've been testing the current pvdrivers code in preparation for
> >>>>>>>>> creating upstream patches for my xeniface additions and I noticed
> >>>>>>>>> that xenvbd seems to be very unstable for me. I'm not sure if it's
> >>>>>>>>> a problem with xenvbd itself or with my code, because it seemed to
> >>>>>>>>> only manifest when the full suite of our guest tools was installed
> >>>>>>>>> along with xenvbd. In short, most of the time the system crashed
> >>>>>>>>> with kernel memory corruption in seemingly random processes shortly
> >>>>>>>>> after start. Driver Verifier didn't seem to catch anything. You can
> >>>>>>>>> see a log from one such crash in the attachment crash1.txt.
> >>>>>>>>>
> >>>>>>>>> Today I tried to perform some more tests, but this time without our
> >>>>>>>>> guest tools (only pvdrivers and our shared libraries were
> >>>>>>>>> installed). To my surprise, Driver Verifier now crashed the system
> >>>>>>>>> every time in xenvbd (see crash2.txt). I don't know why it didn't
> >>>>>>>>> catch that previously... If adding a timeout to the offending wait
> >>>>>>>>> doesn't break anything, I'll try that to see if I can reproduce the
> >>>>>>>>> previous memory corruptions.
> >>>>>>>>>
> >>>>>>>> Those crashes do look odd. I'm on PTO for the next week but I'll have
> >>>>>>>> a look when I get back to the office. I did run verifier on all the
> >>>>>>>> drivers a week or so back (while running vbd plug/unplug tests) but
> >>>>>>>> there have been a couple of changes since then.
> >>>>>>>>
> >>>>>>>> Paul
> >>>>>>>>
> >>>>>>> No problem. I attached some more logs. The last one was during system
> >>>>>>> shutdown; after that the OS failed to boot (probably a corrupted
> >>>>>>> filesystem, since the BSOD itself seemed to indicate that). I think
> >>>>>>> every time there is a BLKIF_RSP_ERROR somewhere, but I'm not yet
> >>>>>>> familiar with Xen PV device interfaces, so I'm not sure what that means.
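For context on the status Rafał mentions: in the Xen blkif protocol (xen/include/public/io/blkif.h), each response carries a status of BLKIF_RSP_OKAY (0) for success, BLKIF_RSP_ERROR (-1) when the operation failed for an unspecified reason, or BLKIF_RSP_EOPNOTSUPP (-2) when the operation is not supported. To confirm that BLKIF_RSP_ERROR really appears in every crash log, a minimal sketch like the following could be used. It assumes the logs are plain-text files like the crash1.txt/crash2.txt attachments; count_blkif_errors is a hypothetical helper, not part of any Xen tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Xen tool): print how many lines of
# each log mention BLKIF_RSP_ERROR. Returns 0 only if every log has a match.
count_blkif_errors() {
    missing=0
    for log in "$@"; do
        # grep -c prints the number of matching lines; it prints nothing
        # if the file cannot be read, so default to 0 in that case.
        n=$(grep -c 'BLKIF_RSP_ERROR' "$log" 2>/dev/null)
        [ -n "$n" ] || n=0
        printf '%s: %s\n' "$log" "$n"
        if [ "$n" -eq 0 ]; then
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status if at least one log had no BLKIF_RSP_ERROR.
    [ "$missing" -eq 0 ]
}
```

For example, `count_blkif_errors crash1.txt crash2.txt` prints one count per file and fails if any log lacks the status, which would weaken the "every time" hypothesis.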
> >>>>>>>
> >>>>>>> In the meantime I've run more tests on my modified xeniface driver to
> >>>>>>> make sure it's not contributing to these issues, but everything seemed
> >>>>>>> to be fine there.
> >>>>>>>
> >>>>>>>
> >>>>>> I also had disk corruption on Windows 10 Pro 64-bit with the PV drivers
> >>>>>> build of 11 August, but I'm not sure it is related to the winpv drivers:
> >>>>>> on the same domU I had also started testing snapshots with a qcow2 disk
> >>>>>> overlay. For this case I don't have useful information because it
> >>>>>> didn't even try to boot Windows, but if it happens again I'll try to
> >>>>>> collect more useful information.
> >>>>> It happened again, but this time too I was unable to understand exactly
> >>>>> what the cause was.
> >>>>> When Windows rebooted, everything seemed OK and it did a clean shutdown,
> >>>>> but on the next boot SeaBIOS didn't find a bootable disk and the qemu log
> >>>>> doesn't show useful information.
> >>>>> qemu-img check shows errors:
> >>>>>> /usr/lib/xen/bin/qemu-img check W10.disk1.cow-sn1
> >>>>>> ERROR cluster 143 refcount=1 reference=2
> >>>>>> Leaked cluster 1077 refcount=1 reference=0
> >>>>>> ERROR cluster 1221 refcount=1 reference=2
> >>>>>> Leaked cluster 2703 refcount=1 reference=0
> >>>>>> Leaked cluster 5212 refcount=1 reference=0
> >>>>>> Leaked cluster 13375 refcount=1 reference=0
> >>>>>>
> >>>>>> 2 errors were found on the image.
> >>>>>> Data may be corrupted, or further writes to the image may corrupt it.
> >>>>>>
> >>>>>> 4 leaked clusters were found on the image.
> >>>>>> This means waste of disk space, but no harm to data.
> >>>>>> 27853/819200 = 3.40% allocated, 22.65% fragmented, 0.00% compressed clusters
> >>>>>> Image end offset: 1850736640
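The report above contains two different kinds of finding: the "ERROR cluster ... refcount=1 reference=2" lines mean a cluster is referenced more often than its refcount records (real corruption), while leaked clusters only waste space. qemu-img check also reports this through its exit status, documented as 0 (consistent), 1 (check not completed), 2 (corrupted), 3 (leaked clusters only, no corruption) and 63 (checks not supported by the format). A minimal sketch that turns that status into a verdict, assuming the image is checked while the domU is shut down; QEMU_IMG, describe_check_status and check_image are hypothetical names.

```shell
#!/bin/sh
# Hypothetical variable: path taken from the thread, override if needed.
QEMU_IMG=${QEMU_IMG:-/usr/lib/xen/bin/qemu-img}

# Hypothetical helper: map a `qemu-img check` exit status to a verdict,
# following the exit codes listed in the qemu-img documentation.
describe_check_status() {
    case "$1" in
        0)  echo "consistent" ;;
        1)  echo "check not completed (internal error)" ;;
        2)  echo "corrupted" ;;
        3)  echo "leaked clusters only, no corruption" ;;
        63) echo "format does not support checks" ;;
        *)  echo "unknown status $1" ;;
    esac
}

# Hypothetical helper: run a read-only check (no -r option), print the
# verdict and return qemu-img's own exit status.
check_image() {
    "$QEMU_IMG" check "$1" >/dev/null 2>&1
    status=$?
    printf '%s: %s\n' "$1" "$(describe_check_status "$status")"
    return "$status"
}
```

Given the two ERROR lines above, `check_image W10.disk1.cow-sn1` should report the image as corrupted rather than merely leaking. The sketch deliberately never passes -r: `qemu-img check -r leaks` (or `-r all`) can modify the image, which is safer to try on a copy first.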
> >>>>> I created it with:
> >>>>> /usr/lib/xen/bin/qemu-img create -o backing_file=W10.disk1.xm,backing_fmt=raw -f qcow2 W10.disk1.cow-sn1
> >>>>> and changed the xl domU configuration:
> >>>>> disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw',...
> >>>>> Dom0 runs Xen 4.6-rc1 and QEMU 2.4.0.
> >>>>> DomU is Windows 10 Pro 64-bit with the PV drivers build of 11 August.
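When reporting a problem like this, it can help to show that the overlay really points at the intended base image. A minimal sketch wrapping the same qemu-img create command used above and then printing the chain with `qemu-img info --backing-chain`; make_overlay is a hypothetical helper and QEMU_IMG defaults to the path from the thread.

```shell
#!/bin/sh
# Hypothetical variable: path taken from the thread, override if needed.
QEMU_IMG=${QEMU_IMG:-/usr/lib/xen/bin/qemu-img}

# Hypothetical helper: create a qcow2 overlay on top of a raw base image,
# refusing to overwrite an existing overlay, then print the backing chain.
make_overlay() {
    base=$1
    overlay=$2
    if [ -e "$overlay" ]; then
        echo "refusing to overwrite existing $overlay" >&2
        return 1
    fi
    # Same options as the command quoted in the thread.
    "$QEMU_IMG" create -o "backing_file=$base,backing_fmt=raw" -f qcow2 "$overlay" || return 1
    # --backing-chain lists the overlay and every image beneath it, so the
    # output shows which base file the overlay actually references.
    "$QEMU_IMG" info --backing-chain "$overlay"
}
```

For example, `make_overlay W10.disk1.xm W10.disk1.cow-sn1` reproduces the setup above; the refusal to overwrite avoids silently discarding an existing snapshot.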
> >>>>>
> >>>>> How can I know for sure whether it is a winpv, QEMU or other problem,
> >>>>> and collect useful information to report?
> >>>>>
> >>>>> Thanks for any reply and sorry for my bad English.
> >>>> This sounds very much like a lack of synchronization somewhere. I recall
> >>>> seeing other problems of this ilk when someone was messing around with
> >>>> O_DIRECT for opening images. I wonder if we are missing a flush operation
> >>>> on shutdown.
> >>>>     Paul
> >>>>
> >>> Thanks for the reply.
> >>> I did a quick search but didn't find O_DIRECT by grepping libxl; I found
> >>> it only in the qemu code.
> >>> Then I tried the patch that seems to add setting it for Xen:
> >>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=454ae734f1d9f591345fa78376435a8e74bb4edd
> >>> Checking libxl, it seems to be disabled by default, and from some old Xen
> >>> posts it seems that O_DIRECT creates problems.
> >>> Should I try enabling direct-io-safe on the domUs' qcow2 disks?
> >>> I also added Stefano Stabellini in CC.
> >>> @Stefano Stabellini: What is the current known status of direct-io-safe,
> >>> and what have the results been?
> >>> Sorry if the question is stupid, but either my English is too bad or many
> >>> posts from recent years are confusing, and in some cases even
> >>> contradictory, about stability/integrity/performance with or without it.
> >>> In particular, it seems to crash with some kernels, but I don't understand
> >>> exactly which versions and/or with which patches.
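One prerequisite that can be checked locally before experimenting with direct-io-safe is whether the filesystem holding the images accepts O_DIRECT writes at all. This sketch does not answer the kernel-safety question above; it only probes the filesystem, assuming GNU dd (for `oflag=direct`). probe_odirect is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: probe whether the filesystem under a directory accepts
# O_DIRECT writes, using GNU dd's oflag=direct.
# Returns 0 if supported, 1 if the write failed, 2 if the directory is unusable.
probe_odirect() {
    dir=$1
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "cannot write to $dir" >&2
        return 2
    fi
    probe="$dir/.odirect-probe.$$"
    # O_DIRECT requires aligned I/O; a single 4096-byte block is a multiple
    # of common logical sector sizes (512 and 4096 bytes).
    if dd if=/dev/zero of="$probe" bs=4096 count=1 oflag=direct 2>/dev/null; then
        result=0
        echo "O_DIRECT supported in $dir"
    else
        result=1
        echo "O_DIRECT not supported in $dir"
    fi
    # Remove the probe file whether or not the write succeeded.
    rm -f "$probe"
    return "$result"
}
```

For example, `probe_odirect /mnt/vm2` would test the directory holding W10.disk1.cow-sn1. A write that fails for another reason, such as a full disk, is reported the same way, so a negative result is worth confirming by hand.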
> >>>
> >>> @Paul Durrant: have you seen my other mail, where I wrote that in my
> >>> latest test with Xen 4.6 without the udev file, Windows domUs with the new
> >>> PV drivers don't boot, and to boot them correctly I must re-add the udev
> >>> file? Could this cause unexpected behaviour related to this problem, or is
> >>> it something different?
> >>> http://lists.xen.org/archives/html/win-pv-devel/2015-08/msg00033.html
> >>>
> >> I'm not sure why udev would be an issue here. The problem you have
> >> appears to be QEMU ignoring the request to unplug emulated disks. I've not
> >> seen this behaviour on my test box so I'll need to dig some more.
> >>
> > I notice you have 6 IDE channels? Are you using AHCI by any chance? If you
> > are, then it looks like QEMU is not honouring the unplug request... that
> > would be where the bug is. I'll try to repro myself.
> >
> >    Paul
> 
> If I remember correctly, I had already tried with IDE too for both problems
> (udev and qcow), with the same result.
> I've also been using mainly AHCI on Windows domUs (with the new PV drivers)
> on a test system for some months.
> But if needed, tell me and I'll do more tests.
> Your recent patches seem to be fixes related to unplug, or am I wrong? I'll
> retry with them this afternoon without the udev file, if the new PV test
> build is ready.

My recent changes to xenvbd were to do with when unplug should be requested and
also cleaning up on driver removal. I don't think either of them affects your
case; I think you're experiencing a problem with QEMU.

  Paul

> 
> Thanks for any reply and sorry for my bad English.

_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel

 

