
Re: [Xen-users] Strange failures of Xen 4.3.1, PVHVM storage VM, iSCSI and Windows+GPLPV VM combination

  • To: <xen-users@xxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 5 Feb 2014 17:29:34 +0100
  • Delivery-date: Wed, 05 Feb 2014 16:29:54 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 05/02/14 17:13, Kuba wrote:
> W dniu 2014-02-01 20:27, Kuba pisze:
>> W dniu 2014-01-31 02:35, James Harper pisze:
>>>> I am trying to set up a following configuration:
>>>> 1. very simple Linux-based dom0 (Debian 7.3) with Xen 4.3.1 compiled
>>>> from sources,
>>>> 2. one storage VM (FreeBSD 10, HVM+PV) with SATA controller attached
>>>> using VT-d, exporting block devices via iSCSI to other VMs and physical
>>>> machines,
>>>> 3. one Windows 7 SP1 64 VM (HVM+GPLPV) with GPU passthrough (Quadro
>>>> 4000) installed on a block device exported from the storage VM (target
>>>> on the storage VM, initiator on dom0).
>>>> Everything works perfectly (including PCI & GPU passthrough) until I
>>>> install GPLPV drivers on the Windows VM. After driver installation,
>>>> Windows needs to reboot, boots fine, displays a message that PV SCSI
>>> (a)
>>>> drivers were installed and needs to reboot again, and then cannot boot.
>>>> Sometimes it gets stuck at "booting from harddrive" in SeaBIOS,
>>>> sometimes BSODs with "unmountable boot volume" message. All of the
>>>> following I tried without GPU passthrough to narrow down the problem.
>>>> The intriguing part is this:
>>>> 1. If the storage VM's OS is Linux - it fails with the above symptoms.
>>>> 2. If the block devices for the storage VM come directly from dom0 (not
>>>> via pci-passthrough) - it fails.
>>>> 3. If the storage VM is an HVM without PV drivers (e.g. FreeBSD
>>>> 9.2-GENERIC) - it all works.
>>>> 4. If the storage VM's OS is Linux with a kernel compiled without Xen
>>>> guest support - it works, but is unstable (see below).
>>>> 5. If the iSCSI target is on a different physical machine - it all
>>>> works.
>>>> 6. If the iSCSI target is on dom0 itself - it works.
>>>> 7. If I attach the AHCI controller to the Windows VM and install
>>>> directly on the hard drive - it works.
>>>> 8. If the block device for the Windows VM is a disk, partition, file,
>>>> LVM volume or even a ZoL zvol (and it comes from dom0 itself, without
>>>> iSCSI) - it works.
>>>> If I install Windows and the GPLPV drivers on a hard drive attached to
>>>> dom0, Windows + GPLPV work perfectly. If I then give the same hard
>>>> drive
>>>> as a block device to the storage VM and re-export it through iSCSI,
>>> (b)
>>>> Windows usually boots fine, but is unstable. And by unstable I mean
>>>> random read/write errors, sometimes programs won't start, ntdll.dll
>>>> crashes, and after a couple of reboots Windows won't boot (just like
>>>> mentioned above).
>>>> The configuration I would like to achieve makes sense only with PV
>>>> drivers on both the storage and Windows VMs. All of the "components"
>>>> seem to work perfectly until they are all put together, so I am not
>>>> really sure where the problem is.
>>>> I would be very grateful for any suggestions or ideas that could
>>>> possibly help to narrow down the problem. Maybe I am just doing
>>>> something wrong (I hope so). Or maybe there is a bug that shows itself
>>>> only in such a particular configuration (hope not)?
>>> I'm curious about the prompt for the pvscsi drivers to be installed.
>>> Is this definitely what it is asking for? Pvscsi support was removed
>>> from gplpv in the latest versions and had suffered varying degrees of
>>> bitrot in earlier versions. If you have the iscsi initiator in dom0
>>> then exporting a block device to windows via the normal vbd channel
>>> should be just fine.
>>> You've gone to great lengths to explain the various things you've
>>> tried, but I think I'm a little confused on where the iscsi initiator
>>> is in the "doesn't work" scenarios. I'm having a bit of an off day
>>> today so it's probably just me, but above I have highlighted the two
>>> scenarios... could you fill me in on a few things:
>>> At (a) and (b), is the iscsi initiator in dom0, or are you actually
>>> booting windows directly via iscsi?
>>> At (b), with latest debug build of gplpv, can you run debugview from
>>> sysinternals.com and see if any interesting messages are displayed
>>> before things fall in a heap?
>>> Are any strange logs shown in any of Win DomU, Dom0, or storage DomU?
>>> How big are your disks?
>>> Can you reproduce with only one vcpu?
>>> What bridge are you using? Open vSwitch or a traditional Linux bridge?
>>> What MTU are you using on your storage network? If you are using Jumbo
>>> frames can you go back to 1500 (or at least <= 4000)?
>>> Can you turn off scatter gather, Large Send Offload (GSO), and IP
>>> Checksum offload on all the iscsi endpoints?
>>> Can you turn on data digest/checksum on iscsi? If all endpoints
>>> support it then this would provide additional verification that none
>>> of the network packets are getting corrupted.
>>> Would driver domain work in your scenario? Then the disk could be
>>> attached directly from your storage DomU without accruing all the
>>> iscsi overhead. I'm not up with the status of HVM, vbd, and driver
>>> domain so I don't know if this is possible.
>>> More questions than answers. Sorry :)
>>> James
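James's offload, MTU, and digest suggestions can be sketched roughly as
follows, assuming a Linux storage path with open-iscsi on the initiator
side (the interface name eth0 is a placeholder):

```shell
# Turn off scatter-gather and the offloads James mentions on the
# storage-facing interface (eth0 is a placeholder).
ethtool -K eth0 sg off tso off gso off tx off rx off

# Check the MTU and, if jumbo frames are in use, drop back to 1500.
ip link show dev eth0
ip link set dev eth0 mtu 1500

# With open-iscsi, header/data digests (CRC32C) can be enabled in
# /etc/iscsi/iscsid.conf so corrupted iSCSI PDUs are detected:
#   node.conn[0].iscsi.HeaderDigest = CRC32C
#   node.conn[0].iscsi.DataDigest = CRC32C
```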
>> Dear James,
>> thank you for your questions - I really appreciate everything that may
>> help me move closer to solving or isolating the problem.
>> I'll check what type of driver is used exactly - up until now I always
>> just installed all drivers included in the package, I thought all of
>> them were necessary. I'll try installing them without XenScsi.
>> Do you mean revisions > 1092:85b99b9795a6 by "the latest versions"?
>> Which version should I use?
>> Forgive me if the descriptions were unclear. The initiator was always in
>> dom0. I only moved the target to dom0 or to a separate physical machine
>> in the scenarios listed above. I didn't boot Windows directly from iSCSI
>> (in fact I tried a couple of times, but had some problems with it, so I
>> didn't mention it).
>> My "disks" (the block devices I dedicated to the Windows VM) were whole
>> 120GB and 240GB SSDs, ~100GB ZVOLs and 50GB LVM volumes.
>> I'm using a traditional Linux bridge. I didn't set MTUs explicitly, so I
>> assume it's 1500, but I will verify this.
>> I'd love to use a storage driver domain, but the wiki says "It is not
>> possible to use driver domains with pygrub or HVM guests yet". The page
>> is a couple of months old, though, so maybe the information is outdated?
>> It's surely worth checking out.
>> I'll do my best to provide answers to the remaining questions as soon as
>> possible. Thank you for so many ideas.
>> Best regards,
>> Kuba
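For clarity, the "initiator in dom0" layout discussed above means the
Windows VM only ever sees an ordinary vbd. A minimal sketch of the dom0
side, with a made-up portal address and target IQN for illustration:

```shell
# dom0 discovers and logs into the target exported by the storage VM
# (the portal 192.0.2.10 and the IQN below are made up for illustration):
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -T iqn.2014-01.example:win7 -p 192.0.2.10 --login

# The resulting dom0 block device is then handed to the Windows VM as a
# plain vbd in its guest config file, e.g.:
#   disk = [ 'phy:/dev/disk/by-path/ip-192.0.2.10:3260-iscsi-iqn.2014-01.example:win7-lun-0,hda,w' ]
```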
>> _______________________________________________
>> Xen-users mailing list
>> Xen-users@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-users
> It seems the problems are not related to GPLPV. There is an easy way to
> reproduce the issues without Windows and without installing anything,
> using only livecds for two DomUs:
> 1) Set up a Linux Dom0 with Xen 4.3.1 and standard Linux bridge for Dom0
> and DomUs

Are you using a Xen build with debugging enabled? I think I might have a
clue about what's happening, because I have also seen it. Could you
recompile Xen with debugging enabled and try the same test (iSCSI target
on the DomU and initiator on Dom0)?
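For reference, one way to get a debug hypervisor from the 4.3 source
tree; an untested sketch assuming the usual in-tree build:

```shell
# In the xen-4.3.1 source tree: rebuild only the hypervisor with
# debugging/assertions enabled, reinstall, and reboot into it.
make debug=y xen
make install-xen
reboot
```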

