Xen project Mailing List

Re: [Xen-users] Strange failures of Xen 4.3.1, PVHVM storage VM, iSCSI and Windows+GPLPV VM combination

To: Kuba <kuba.0000@xxxxx>, <xen-users@xxxxxxxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Fri, 7 Feb 2014 10:25:53 +0100

Delivery-date: Fri, 07 Feb 2014 09:26:58 +0000

List-id: Xen user discussion <xen-users.lists.xen.org>

On 06/02/14 23:32, Kuba wrote:
> On 2014-02-06 09:22, Roger Pau Monné wrote:
>> On 05/02/14 22:54, Kuba wrote:
>>> On 2014-02-05 17:54, Roger Pau Monné wrote:
>>>> On 05/02/14 17:43, Kuba wrote:
>>>>> On 2014-02-05 17:29, Roger Pau Monné wrote:
>>>>>> On 05/02/14 17:13, Kuba wrote:
>>>>>>> On 2014-02-01 20:27, Kuba wrote:
>>>>>>>> On 2014-01-31 02:35, James Harper wrote:
>>>>>>>>>>
>>>>>>>>>> I am trying to set up the following configuration:
>>>>>>>>>> 1. very simple Linux-based dom0 (Debian 7.3) with Xen 4.3.1 compiled from sources,
>>>>>>>>>> 2. one storage VM (FreeBSD 10, HVM+PV) with SATA controller attached using VT-d, exporting block devices via iSCSI to other VMs and physical machines,
>>>>>>>>>> 3. one Windows 7 SP1 64 VM (HVM+GPLPV) with GPU passthrough (Quadro 4000) installed on a block device exported from the storage VM (target on the storage VM, initiator on dom0).
>>>>>>>>>>
>>>>>>>>>> Everything works perfectly (including PCI & GPU passthrough) until I install GPLPV drivers on the Windows VM. After driver installation, Windows needs to reboot, boots fine, displays a message that PV SCSI
>>>>>>>>>
>>>>>>>>> (a)
>>>>>>>>>
>>>>>>>>>> drivers were installed and needs to reboot again, and then cannot boot. Sometimes it gets stuck at "booting from harddrive" in SeaBIOS, sometimes BSODs with "unmountable boot volume" message. All of the following I tried without GPU passthrough to narrow down the problem.
>>>>>>>>>>
>>>>>>>>>> The intriguing part is this:
>>>>>>>>>>
>>>>>>>>>> 1. If the storage VM's OS is Linux - it fails with the above symptoms.
>>>>>>>>>> 2. If the block devices for the storage VM come directly from dom0 (not via pci-passthrough) - it fails.
>>>>>>>>>> 2. If the storage VM is an HVM without PV drivers (e.g. FreeBSD 9.2-GENERIC) - it all works.
>>>>>>>>>> 3. If the storage VM's OS is Linux with kernel compiled without Xen guest support - it works, but is unstable (see below).
>>>>>>>>>> 4. If the iSCSI target is on a different physical machine - it all works.
>>>>>>>>>> 5. If the iSCSI target is on dom0 itself - it works.
>>>>>>>>>> 6. If I attach the AHCI controller to the Windows VM and install directly on the hard drive - it works.
>>>>>>>>>> 7. If the block device for Windows VM is a disk, partition, file, LVM volume or even a ZoL's zvol (and it comes from a dom0 itself, without iSCSI) - it works.
>>>>>>>>>>
>>>>>>>>>> If I install Windows and the GPLPV drivers on a hard drive attached to dom0, Windows + GPLPV work perfectly. If I then give the same hard drive as a block device to the storage VM and re-export it through iSCSI,
>>>>>>>>>
>>>>>>>>> (b)
>>>>>>>>>
>>>>>>>>>> Windows usually boots fine, but works unstable. And by unstable I mean random read/write errors, sometimes programs won't start, ntdll.dll crashes, and after couple reboots Windows won't boot (just like mentioned above).
>>>>>>>>>>
>>>>>>>>>> The configurations I would like to achieve makes sense only with PV drivers on both storage and Windows VM. All of the "components" seem to work perfectly until all put together, so I am not really sure where the problem is.
>>>>>>>>>>
>>>>>>>>>> I would be very grateful for any suggestions or ideas that could possibly help to narrow down the problem. Maybe I am just doing something wrong (I hope so).
>>>>>>>>>> Or maybe there is a bug that shows itself only in such a particular configuration (hope not)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm curious about prompting for the pvscsi drivers to be installed. Is this definitely what it is asking for? Pvscsi for gplpv is removed in the latest versions and suffered varying degrees of bitrot in earlier versions. If you have the iscsi initiator in dom0 then exporting a block device to windows via the normal vbd channel should be just fine.
>>>>>>>>>
>>>>>>>>> You've gone to great lengths to explain the various things you've tried, but I think I'm a little confused on where the iscsi initiator is in the "doesn't work" scenarios. I'm having a bit of an off day today so it's probably just me, but above I have highlighted the two scenarios... could you fill me in on a few things:
>>>>>>>>>
>>>>>>>>> At (a) and (b), is the iscsi initiator in dom0, or are you actually booting windows directly via iscsi?
>>>>>>>>>
>>>>>>>>> At (b), with latest debug build of gplpv, can you run debugview from sysinternals.com and see if any interesting messages are displayed before things fall in a heap?
>>>>>>>>>
>>>>>>>>> Are any strange logs shown in any of Win DomU, Dom0, or storage DomU?
>>>>>>>>>
>>>>>>>>> How big are your disks?
>>>>>>>>>
>>>>>>>>> Can you reproduce with only one vcpu?
>>>>>>>>>
>>>>>>>>> What bridge are you using? Openvswitch or traditional linux bridge?
>>>>>>>>>
>>>>>>>>> What MTU are you using on your storage network? If you are using Jumbo frames can you go back to 1500 (or at least <= 4000)?
>>>>>>>>>
>>>>>>>>> Can you turn off scatter gather, Large Send Offload (GSO), and IP Checksum offload on all the iscsi endpoints?
>>>>>>>>>
>>>>>>>>> Can you turn on data digest/checksum on iscsi? If all endpoints support it then this would provide additional verification that none of the network packets are getting corrupted.
>>>>>>>>>
>>>>>>>>> Would driver domain work in your scenario? Then the disk could be attached directly from your storage DomU without accruing all the iscsi overhead. I'm not up with the status of HVM, vbd, and driver domain so I don't know if this is possible.
>>>>>>>>>
>>>>>>>>> More questions than answers. Sorry :)
>>>>>>>>>
>>>>>>>>> James
>>>>>>>>
>>>>>>>> Dear James,
>>>>>>>>
>>>>>>>> thank you for your questions - I really appreciate everything that may help me move closer to solving or isolating the problem.
>>>>>>>>
>>>>>>>> I'll check what type of driver is used exactly - up until now I always just installed all drivers included in the package, I thought all of them were necessary. I'll try installing them without XenScsi.
>>>>>>>>
>>>>>>>> Do you mean revisions > 1092:85b99b9795a6 by "the latest versions"? Which version should I use?
>>>>>>>>
>>>>>>>> Forgive me if the descriptions were unclear. The initiator was always in dom0. I only moved the target to dom0 or a separate physical machine in (4) and (5). I didn't boot Windows directly from iSCSI (in fact I tried couple times, but had some problems with it, so I didn't mention it).
>>>>>>>>
>>>>>>>> My "disks" (the block devices I dedicated to the Windows VM) were whole 120GB and 240GB SSDs, ~100GB ZVOLs and 50GB LVM volumes.
>>>>>>>>
>>>>>>>> I'm using traditional linux bridge. I didn't set MTUs explicitly, so I assume it's 1500, but I will verify this.
>>>>>>>>
>>>>>>>> I'd love to use a storage driver domain, but the wiki says "It is not possible to use driver domains with pygrub or HVM guests yet". But the page is a couple of months old, maybe it's an outdated info? It surely is worth checking out.
>>>>>>>>
>>>>>>>> I'll do my best to provide answers to the remaining questions as soon as possible. Thank you for so many ideas.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Kuba
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Xen-users mailing list
>>>>>>>> Xen-users@xxxxxxxxxxxxx
>>>>>>>> http://lists.xen.org/xen-users
>>>>>>>
>>>>>>> It seems the problems are not related to GPLPV. There is an easy way to reproduce the issues without Windows and without installing anything, using only livecds for two DomUs:
>>>>>>>
>>>>>>> 1) Set up a Linux Dom0 with Xen 4.3.1 and standard Linux bridge for Dom0 and DomUs
>>>>>>
>>>>>> Are you using a Xen build with debugging enabled? I think I might have a clue of what's happening, because I also saw it. Could you recompile Xen with debugging enabled and try the same test (iSCSI target on DomU and initiator on Dom0)?
>>>>>>
>>>>>> Roger.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xen-users mailing list
>>>>>> Xen-users@xxxxxxxxxxxxx
>>>>>> http://lists.xen.org/xen-users
>>>>>>
>>>>>
>>>>> Of course I could! Please point me to any relevant information on how to build Xen with debugging enabled and what to do next. I build Xen using standard ./configure && make world && make install.
>>>>
>>>> Just `make debug=y xen` and boot with the resulting xen.gz.
>>>>
>>>> Roger.
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-users mailing list
>>>> Xen-users@xxxxxxxxxxxxx
>>>> http://lists.xen.org/xen-users
>>>>
>>>
>>> I ran the test using debug build of Xen. This time I gave the name "tgt" to the DomU with iSCSI target, and the other domain was named simply "domu". Sorry for the inconsistency. After logging in to the iSCSI target from Dom0, I ran "mkfs.ext4 /dev/sdb" (still in Dom0). So far, so good. Then I launched the other DomU and as soon as I executed "fsck.ext4 /dev/xvda", some errors appeared in the output of "xl dmesg" (attached as "xl-dmesg.log"). Surprisingly, the first fsck succeeded. Unfortunately, executing fsck.ext4 for the second time showed serious file system errors. The fsck commands were the only things I ran that touched /dev/xvda. After shutting down "domu", when I tried to log out from the iSCSI target, an error came up in Dom0's dmesg ("dom0-dmesg.log"). Logs from /var/log/xen/ are also attached.
>>>
>>> I will happily run next tests - just tell me what can I do :)
>>
>> Hello,
>>
>> This is the same problem I've seen when using a similar setup. The root of the problem is that blkback maps a grant ref to a memory page in Dom0, then this memory page ends up in netback, and when netback tries to issue a GNTTABOP_copy using the mfn of this grant mapped page the operation fails because Xen detects that the mfn passed doesn't belong to the guest.
>>
>> The only way I can think of solving this is that netback detects that the page is not local and somehow we use its grant ref instead of mfn (this means we would need to store the grant ref somewhere in the page).
>>
>> Roger.
>
> As this is something far beyond my ability to solve, I couldn't resist to try something else - running FreeBSD 10 as a storage driver domain.
> I was able to provide a block device (zvol) from one FreeBSD DomU directly to another FreeBSD DomU just like described in the wiki (with Qemu traditional in the second DomU) and install the OS on it. Unfortunately the second DomU's bios was unable to detect this "disk" and boot from it.
>
> But with this command:
> xl block-attach Domain-0 "format=raw,backendtype=phy,backend=fbsd,vdev=xvds,target=/dev/zvol/zroot/vol1"
>
> I was able to attach a block device exported from a DomU to Dom0 without iSCSI and then use it as a disk for a second DomU (with disk=['phy:/dev/xvds,xvda,w']). Now, if the second DomU had no PV drivers (e.g. Windows without GPLPV), everything worked fine. But running an OS with PV drivers (Linux or Windows+GPLPV) in the second DomU resulted in errors in xl dmesg very similar to those in the previously attached logs (see the attachment).
>
> Do I understand correctly that solving the issue you are pointing out would also allow using OSes like FreeBSD as a storage driver domain for other PV-enabled DomUs? That would be something!
>
> And most importantly - is there anything I can do to help?

Hello,

Thanks for testing this use-case also. I have not debugged it closely, but I think what's happening here is that blkback in Dom0 grant maps a page passed from the DomU, and then the blkfront instance on Dom0 tries to use gnttab_grant_foreign_access_ref on that page and fails miserably (because the page doesn't belong to Dom0).

The right way to fix this would be to make blkfront and blkback in the respective guests connect directly instead of using Dom0 as a proxy. Mainly, the toolstack in Dom0 needs to know that you are trying to attach a disk from a driver domain and do the right thing: attach the disk locally to Dom0 for HVM access, but write the PV disk info in xenstore so that the guest connects directly to the blkback in the driver domain instead of Dom0.

Are you interested in submitting a patch for libxl to fix this?

Roger.
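
The failure described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Xen or Linux code: ToyHypervisor, Page, grant_access and map_grant are hypothetical names invented for the illustration, and the single rule it enforces (only the domain that owns a page may grant it to another domain) is a simplification of the ownership check described in the reply. Where exactly the real check fires inside Xen is not modelled.

```python
from dataclasses import dataclass, field


class GrantError(Exception):
    """Raised when the toy hypervisor refuses a grant operation."""


@dataclass(frozen=True)
class Page:
    frame: int  # stand-in for a machine frame number (mfn)
    owner: str  # name of the domain that owns this memory


@dataclass
class ToyHypervisor:
    """Hypothetical model: only a page's owner may grant it to another domain."""
    grants: dict = field(default_factory=dict)  # (granter, ref) -> (page, grantee)
    next_ref: int = 0

    def grant_access(self, granter: str, page: Page, grantee: str) -> int:
        # Assumed rule: a domain cannot grant memory it does not own.
        if page.owner != granter:
            raise GrantError(
                f"frame {page.frame:#x} belongs to {page.owner}, not {granter}")
        ref = self.next_ref
        self.next_ref += 1
        self.grants[(granter, ref)] = (page, grantee)
        return ref

    def map_grant(self, mapper: str, granter: str, ref: int) -> Page:
        page, grantee = self.grants[(granter, ref)]
        if grantee != mapper:
            raise GrantError(f"grant {ref} was not issued to {mapper}")
        return page


def attach_via_dom0_proxy(xen: ToyHypervisor) -> bool:
    """Path: domu blkfront -> Dom0 blkback -> Dom0 blkfront -> fbsd blkback."""
    page = Page(frame=0x1000, owner="domu")
    ref = xen.grant_access("domu", page, "dom0")  # guest grants its page to Dom0
    mapped = xen.map_grant("dom0", "domu", ref)   # Dom0 blkback maps that page
    try:
        # Dom0 blkfront tries to re-grant a page that Dom0 does not own.
        xen.grant_access("dom0", mapped, "fbsd")
    except GrantError:
        return False
    return True


def attach_directly(xen: ToyHypervisor) -> bool:
    """Path: domu blkfront -> fbsd blkback, with no Dom0 relay."""
    page = Page(frame=0x1000, owner="domu")
    ref = xen.grant_access("domu", page, "fbsd")
    return xen.map_grant("fbsd", "domu", ref) == page


if __name__ == "__main__":
    print("via Dom0 proxy:", attach_via_dom0_proxy(ToyHypervisor()))
    print("direct to driver domain:", attach_directly(ToyHypervisor()))
```

In this model the relayed path is rejected at the re-grant step, while the direct path succeeds because the guest grants its own page straight to the driver domain. That is the reason for connecting blkfront and blkback directly rather than through Dom0.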
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
