
Re: [win-pv-devel] Problems with xenvbd



On 01/09/2015 16:41, Paul Durrant wrote:
-----Original Message-----
From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
Sent: 21 August 2015 14:14
To: Rafał Wojdyła; Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [win-pv-devel] Problems with xenvbd

On 21/08/2015 10:12, Fabio Fantoni wrote:
On 21/08/2015 00:03, Rafał Wojdyła wrote:
On 2015-08-19 23:25, Paul Durrant wrote:
-----Original Message-----
From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Rafal Wojdyla
Sent: 18 August 2015 14:33
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: [win-pv-devel] Problems with xenvbd

Hi,

I've been testing the current pvdrivers code in preparation for
creating upstream patches for my xeniface additions, and I noticed
that xenvbd seems to be very unstable for me. I'm not sure whether
the problem is in xenvbd itself or in my code, because it seemed to
manifest only when the full suite of our guest tools was installed
along with xenvbd. In short, most of the time the system crashed with
kernel memory corruption in seemingly random processes shortly
after start. Driver Verifier didn't seem to catch anything. You can
see a log from one such crash in the attachment crash1.txt.

Today I tried to perform some more tests, but this time without our
guest tools (only pvdrivers and our shared libraries were
installed). To my surprise, Driver Verifier was now crashing the
system every time in xenvbd (see crash2.txt). I don't know why it
didn't catch that previously... If adding a timeout to the
offending wait doesn't break anything, I'll try that to see if I can
reproduce the previous memory corruptions.
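Something like this minimal sketch is what I have in mind, assuming
the offending wait is a plain KeWaitForSingleObject() on an event
inside a driver routine (the names here are hypothetical, not the
actual xenvbd code):

  /* Relative timeouts are negative, in 100ns units (ntddk.h). */
  LARGE_INTEGER Timeout;
  NTSTATUS status;

  Timeout.QuadPart = -10 * 1000 * 1000;   /* 1 second, relative */
  status = KeWaitForSingleObject(&Event, Executive, KernelMode,
                                 FALSE, &Timeout);
  if (status == STATUS_TIMEOUT) {
      /* Trace and bail out instead of blocking forever, so the
         culprit that failed to signal the event shows up in logs. */
  }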

Those crashes do look odd. I'm on PTO for the next week but I'll have
a look when I get back to the office. I did run verifier on all the
drivers a week or so back (while running vbd plug/unplug tests) but
there have been a couple of changes since then.

Paul

No problem. I attached some more logs. The last one was during system
shutdown; after that the OS failed to boot (probably a corrupted
filesystem, since the BSOD itself seemed to indicate that). I think every
time there is a BLKIF_RSP_ERROR somewhere, but I'm not yet familiar with
the Xen PV device interfaces, so I'm not sure what that means.

In the meantime I've run more tests on my modified xeniface driver to
make sure it's not contributing to these issues, but everything seemed
to be fine there.


I also had a disk corruption on Windows 10 Pro 64-bit with the PV
drivers build of 11 August, but I'm not sure it is related to the winpv
drivers: on the same domU I had also started testing snapshots with a
qcow2 disk overlay. For that first occurrence I don't have useful
information, because Windows would not boot at all, but I said I would
try to gather more details if it happened again.
It has now happened another time, and again I was unable to determine
the exact cause.
Windows seemed to be working and did a clean shutdown, but on reboot
SeaBIOS did not find a bootable disk, and the qemu log shows nothing
useful.
qemu-img check shows errors:
/usr/lib/xen/bin/qemu-img check W10.disk1.cow-sn1
ERROR cluster 143 refcount=1 reference=2
Leaked cluster 1077 refcount=1 reference=0
ERROR cluster 1221 refcount=1 reference=2
Leaked cluster 2703 refcount=1 reference=0
Leaked cluster 5212 refcount=1 reference=0
Leaked cluster 13375 refcount=1 reference=0

2 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.

4 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
27853/819200 = 3.40% allocated, 22.65% fragmented, 0.00% compressed clusters
Image end offset: 1850736640
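(For what it's worth, qemu-img can also attempt a repair of the image;
I have not tried that here, and since repairing refcount errors can
itself lose data, a backup first seems advisable:)
/usr/lib/xen/bin/qemu-img check -r leaks W10.disk1.cow-sn1
/usr/lib/xen/bin/qemu-img check -r all W10.disk1.cow-sn1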
I created it with:
/usr/lib/xen/bin/qemu-img create -o backing_file=W10.disk1.xm,backing_fmt=raw -f qcow2 W10.disk1.cow-sn1
and changed the xl domU configuration:
disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw',...
Dom0 runs Xen 4.6-rc1 and qemu 2.4.0.
The domU is Windows 10 Pro 64-bit with the PV drivers build of 11 August.

How can I know for sure whether this is a winpv problem, a qemu
problem, or something else, and gather useful information to report?

Thanks for any reply, and sorry for my bad English.
This sounds very much like a lack of synchronization somewhere. I recall seeing 
other problems of this ilk when someone was messing around with O_DIRECT for 
opening images. I wonder if we are missing a flush operation on shutdown.
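If that is the case, the fix would be to push an explicit cache flush
before final teardown. A minimal sketch, assuming the standard blkif
ring protocol from xen/io/blkif.h and xen/io/ring.h (the Ring, Handle
and Id names are hypothetical, not the actual xenvbd code):

  blkif_request_t *req;

  req = RING_GET_REQUEST(&Ring, Ring.req_prod_pvt);
  req->operation   = BLKIF_OP_FLUSH_DISKCACHE;  /* no data segments */
  req->nr_segments = 0;
  req->handle      = Handle;   /* virtual device handle */
  req->id          = Id;       /* cookie matched against the response */
  Ring.req_prod_pvt++;
  RING_PUSH_REQUESTS(&Ring);

  /* A backend that does not advertise "feature-flush-cache" completes
     this with BLKIF_RSP_EOPNOTSUPP rather than BLKIF_RSP_OKAY. */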

   Paul

Thanks for the reply.
I did a quick search but did not find O_DIRECT by grepping in libxl; I found it only in the qemu code.
Then I looked at the patch that seems to have added that setting for Xen:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=454ae734f1d9f591345fa78376435a8e74bb4edd
Checking in libxl, it seems disabled by default, and some old Xen posts suggest that O_DIRECT creates problems.
Should I try enabling direct-io-safe on the domUs' qcow2 disks?
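If I read docs/misc/xl-disk-configuration.txt correctly, it would be
enabled with the keyword form of the disk spec, something like this
(untested):
disk=['format=qcow2, vdev=hda, access=rw, direct-io-safe, target=/mnt/vm2/W10.disk1.cow-sn1']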
I have also added Stefano Stabellini in cc.
@Stefano Stabellini: what is the currently known status of direct-io-safe? Sorry if the question is stupid, or if my English is too bad, but many posts from recent years are confusing and in some cases even seem contradictory about stability/integrity/performance when using it or not. In particular it seems to crash with some kernels, but I did not understand exactly which versions and/or which patches.

@Paul Durrant: did you see my other mail, where I wrote that based on my latest tests with Xen 4.6 without the udev file, Windows domUs with the new PV drivers don't boot, and that to make them boot correctly I had to re-add the udev file? Could this cause something unexpected related to this problem, or is it a separate issue?
http://lists.xen.org/archives/html/win-pv-devel/2015-08/msg00033.html

Thanks for any reply, and sorry for my bad English.



_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel

 

