
Re: [Xen-users] Strange failures of Xen 4.3.1, PVHVM storage VM, iSCSI and Windows+GPLPV VM combination


  • To: <xen-users@xxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 5 Feb 2014 17:29:34 +0100
  • Delivery-date: Wed, 05 Feb 2014 16:29:54 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 05/02/14 17:13, Kuba wrote:
> On 2014-02-01 20:27, Kuba wrote:
>> On 2014-01-31 02:35, James Harper wrote:
>>>>
>>>> I am trying to set up a following configuration:
>>>> 1. very simple Linux-based dom0 (Debian 7.3) with Xen 4.3.1 compiled
>>>> from sources,
>>>> 2. one storage VM (FreeBSD 10, HVM+PV) with SATA controller attached
>>>> using VT-d, exporting block devices via iSCSI to other VMs and physical
>>>> machines,
>>>> 3. one Windows 7 SP1 64 VM (HVM+GPLPV) with GPU passthrough (Quadro
>>>> 4000) installed on a block device exported from the storage VM (target
>>>> on the storage VM, initiator on dom0).
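For reference, the intended layout could be sketched roughly like this in xl domain configs. This is illustrative only; the PCI BDFs, names, paths and IQN below are placeholders, not values taken from the report:

    # storage.cfg - FreeBSD 10 storage VM (HVM with PV drivers), sketch only
    builder = 'hvm'
    name    = 'storage'
    memory  = 4096
    vcpus   = 2
    vif     = ['bridge=xenbr0']
    # SATA controller handed through with VT-d (BDF is a placeholder)
    pci     = ['00:1f.2']
    disk    = ['/var/lib/xen/images/freebsd10.img,raw,xvda,rw']

    # win7.cfg - Windows 7 VM installed on a block device that dom0 obtains
    # as an iSCSI initiator (target lives inside the storage VM)
    builder = 'hvm'
    name    = 'win7'
    memory  = 4096
    vcpus   = 2
    vif     = ['bridge=xenbr0']
    # GPU passthrough (Quadro 4000); BDF again a placeholder
    pci     = ['01:00.0']
    disk    = ['/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2014-01.example:win7-lun-0,raw,hda,rw']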
>>>>
>>>> Everything works perfectly (including PCI & GPU passthrough) until I
>>>> install GPLPV drivers on the Windows VM. After driver installation,
>>>> Windows needs to reboot, boots fine, displays a message that PV SCSI
>>>
>>> (a)
>>>
>>>> drivers were installed and needs to reboot again, and then cannot boot.
>>>> Sometimes it gets stuck at "booting from harddrive" in SeaBIOS,
>>>> sometimes BSODs with "unmountable boot volume" message. All of the
>>>> following I tried without GPU passthrough to narrow down the problem.
>>>>
>>>> The intriguing part is this:
>>>>
>>>> 1. If the storage VM's OS is Linux - it fails with the above symptoms.
>>>> 2. If the block devices for the storage VM come directly from dom0 (not
>>>> via pci-passthrough) - it fails.
>>>> 3. If the storage VM is an HVM without PV drivers (e.g. FreeBSD
>>>> 9.2-GENERIC) - it all works.
>>>> 4. If the storage VM's OS is Linux with a kernel compiled without Xen
>>>> guest support - it works, but is unstable (see below).
>>>> 5. If the iSCSI target is on a different physical machine - it all
>>>> works.
>>>> 6. If the iSCSI target is on dom0 itself - it works.
>>>> 7. If I attach the AHCI controller to the Windows VM and install
>>>> directly on the hard drive - it works.
>>>> 8. If the block device for the Windows VM is a disk, partition, file,
>>>> LVM volume or even a ZoL zvol (and it comes from dom0 itself, without
>>>> iSCSI) - it works.
>>>>
>>>> If I install Windows and the GPLPV drivers on a hard drive attached to
>>>> dom0, Windows + GPLPV work perfectly. If I then give the same hard
>>>> drive
>>>> as a block device to the storage VM and re-export it through iSCSI,
>>>
>>> (b)
>>>
>>>> Windows usually boots fine, but is unstable. And by unstable I mean
>>>> random read/write errors, programs sometimes failing to start, ntdll.dll
>>>> crashes, and after a couple of reboots Windows won't boot (just like
>>>> mentioned above).
>>>>
>>>> The configuration I would like to achieve makes sense only with PV
>>>> drivers on both the storage and Windows VMs. All of the "components"
>>>> seem to work perfectly until they are all put together, so I am not
>>>> really sure where the problem is.
>>>>
>>>> I would be very grateful for any suggestions or ideas that could
>>>> possibly help to narrow down the problem. Maybe I am just doing
>>>> something wrong (I hope so). Or maybe there is a bug that shows itself
>>>> only in such a particular configuration (hope not)?
>>>>
>>>
>>> I'm curious about the prompt for the pvscsi drivers to be installed. Is
>>> this definitely what it is asking for? Pvscsi for GPLPV has been removed
>>> in the latest versions and suffered varying degrees of bitrot in earlier
>>> versions. If you have the iSCSI initiator in dom0 then exporting a
>>> block device to Windows via the normal vbd channel should be just fine.
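For what it's worth, that arrangement could look something like the following on the dom0 side (a sketch only; the portal address, IQN and device path are placeholders):

    # dom0 logs in to the target exported by the storage VM (open-iscsi)
    iscsiadm -m discovery -t sendtargets -p 192.168.0.10
    iscsiadm -m node -T iqn.2014-01.example:win7 -p 192.168.0.10 --login

    # the resulting block device is then handed to Windows as a plain vbd,
    # e.g. in the win7 domU config:
    #   disk = ['/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2014-01.example:win7-lun-0,raw,hda,rw']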
>>>
>>> You've gone to great lengths to explain the various things you've
>>> tried, but I think I'm a little confused about where the iSCSI initiator
>>> is in the "doesn't work" scenarios. I'm having a bit of an off day
>>> today so it's probably just me, but above I have highlighted the two
>>> scenarios... could you fill me in on a few things:
>>>
>>> At (a) and (b), is the iSCSI initiator in dom0, or are you actually
>>> booting Windows directly via iSCSI?
>>>
>>> At (b), with the latest debug build of GPLPV, can you run DebugView from
>>> sysinternals.com and see if any interesting messages are displayed
>>> before things fall in a heap?
>>>
>>> Are any strange logs shown in any of Win DomU, Dom0, or storage DomU?
>>>
>>> How big are your disks?
>>>
>>> Can you reproduce with only one vcpu?
>>>
>>> What bridge are you using? Open vSwitch or a traditional Linux bridge?
>>>
>>> What MTU are you using on your storage network? If you are using jumbo
>>> frames, can you go back to 1500 (or at least <= 4000)?
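For example, checking and (if needed) reverting the MTU on the dom0 side could be done like this (the interface name is a placeholder):

    ip link show dev eth0           # the 'mtu' field shows the current value
    ip link set dev eth0 mtu 1500   # drop back from jumbo frames for testing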
>>>
>>> Can you turn off scatter-gather, Large Send Offload (GSO), and IP
>>> checksum offload on all the iSCSI endpoints?
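On the Linux endpoints that could be done with ethtool roughly as follows (interface name is a placeholder; the FreeBSD storage VM would use the corresponding ifconfig flags such as -txcsum -rxcsum -tso instead):

    ethtool -k eth0          # show the current offload settings
    ethtool -K eth0 sg off tso off gso off tx off rx off
    # sg = scatter-gather, tso/gso = segmentation offload,
    # tx/rx = transmit/receive checksum offload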
>>>
>>> Can you turn on data digest/checksum on iSCSI? If all endpoints
>>> support it, then this would provide additional verification that none
>>> of the network packets are getting corrupted.
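With open-iscsi on the initiator side, digests can be requested in /etc/iscsi/iscsid.conf (or per node via iscsiadm); the target has to be configured to accept or require them as well. A sketch:

    # /etc/iscsi/iscsid.conf
    node.conn[0].iscsi.HeaderDigest = CRC32C
    node.conn[0].iscsi.DataDigest = CRC32C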
>>>
>>> Would a driver domain work in your scenario? Then the disk could be
>>> attached directly from your storage DomU without incurring all the
>>> iSCSI overhead. I'm not up to date with the status of HVM, vbd, and
>>> driver domains, so I don't know if this is possible.
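If driver domains did work for this case, the Windows VM's disk line would name the storage DomU as the backend instead of going through iSCSI, roughly like this (domain name and device path are placeholders):

    # in the win7 domU config
    disk = ['backend=storage,vdev=hda,format=raw,target=/dev/ada1']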
>>>
>>> More questions than answers. Sorry :)
>>>
>>> James
>>
>> Dear James,
>>
>> thank you for your questions - I really appreciate everything that may
>> help me move closer to solving or isolating the problem.
>>
>> I'll check exactly what type of driver is used - up until now I always
>> just installed all the drivers included in the package, as I thought all
>> of them were necessary. I'll try installing them without XenScsi.
>>
>> Do you mean revisions > 1092:85b99b9795a6 by "the latest versions"?
>> Which version should I use?
>>
>> Forgive me if the descriptions were unclear. The initiator was always in
>> dom0. I only moved the target to dom0 or a separate physical machine in
>> (5) and (6). I didn't boot Windows directly from iSCSI (in fact I tried
>> a couple of times, but had some problems with it, so I didn't mention it).
>>
>> My "disks" (the block devices I dedicated to the Windows VM) were whole
>> 120GB and 240GB SSDs, ~100GB ZVOLs and 50GB LVM volumes.
>>
>> I'm using a traditional Linux bridge. I didn't set MTUs explicitly, so I
>> assume they are 1500, but I will verify this.
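A quick way to confirm both, assuming the bridge is called xenbr0:

    brctl show                  # lists the traditional Linux bridges and their ports
    ip link show dev xenbr0     # the 'mtu' field confirms whether it is 1500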
>>
>> I'd love to use a storage driver domain, but the wiki says "It is not
>> possible to use driver domains with pygrub or HVM guests yet". The page
>> is a couple of months old, though, so maybe that info is outdated? It is
>> surely worth checking out.
>>
>> I'll do my best to provide answers to the remaining questions as soon as
>> possible. Thank you for so many ideas.
>>
>> Best regards,
>> Kuba
>>
> 
> It seems the problems are not related to GPLPV. There is an easy way to
> reproduce the issues without Windows and without installing anything,
> using only live CDs for two DomUs:
> 
> 1) Set up a Linux Dom0 with Xen 4.3.1 and a standard Linux bridge for Dom0
> and DomUs
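For step 1, a standard Linux bridge on a Debian dom0 might be configured roughly like this (interface names are placeholders):

    # /etc/network/interfaces
    auto xenbr0
    iface xenbr0 inet dhcp
        bridge_ports eth0
        bridge_stp off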

Are you using a Xen build with debugging enabled? I think I might have a
clue about what's happening, because I have also seen it. Could you
recompile Xen with debugging enabled and try the same test (iSCSI target
on the DomU and initiator on Dom0)?
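For reference, a debug hypervisor can be built from the 4.3.1 source tree roughly like this (a sketch; adjust to your own build and install procedure):

    # in the xen-4.3.1 source tree
    make xen debug=y
    make install-xen
    # after rebooting into the new hypervisor, the version line in
    # 'xl dmesg' should report debug=y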

Roger.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

