
Re: [Xen-users] Strange failures of Xen 4.3.1, PVHVM storage VM, iSCSI and Windows+GPLPV VM combination



On 05/02/14 17:43, Kuba wrote:
> On 2014-02-05 17:29, Roger Pau Monné wrote:
>> On 05/02/14 17:13, Kuba wrote:
>>> On 2014-02-01 20:27, Kuba wrote:
>>>> On 2014-01-31 02:35, James Harper wrote:
>>>>>>
>>>>>> I am trying to set up the following configuration (a rough sketch of
>>>>>> the guest configs follows the list):
>>>>>> 1. a very simple Linux-based dom0 (Debian 7.3) with Xen 4.3.1
>>>>>> compiled from source,
>>>>>> 2. one storage VM (FreeBSD 10, HVM+PV) with a SATA controller
>>>>>> attached using VT-d, exporting block devices via iSCSI to other VMs
>>>>>> and physical machines,
>>>>>> 3. one Windows 7 SP1 64-bit VM (HVM+GPLPV) with GPU passthrough
>>>>>> (Quadro 4000), installed on a block device exported from the storage
>>>>>> VM (target on the storage VM, initiator on dom0).
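>>>>>>
>>>>>> For illustration only, a rough sketch of what the two xl guest
>>>>>> configs look like; the names, PCI BDF, bridge, path and IQN below
>>>>>> are made-up placeholders, not the exact values from my setup:
>>>>>>
>>>>>>   # storage VM: FreeBSD 10 HVM guest with the SATA HBA passed through
>>>>>>   builder = 'hvm'
>>>>>>   name    = 'storage'
>>>>>>   memory  = 4096
>>>>>>   vcpus   = 2
>>>>>>   vif     = [ 'bridge=xenbr0' ]
>>>>>>   pci     = [ '00:1f.2' ]    # AHCI controller handed over via VT-d
>>>>>>
>>>>>>   # Windows 7 VM: installed on the iSCSI LUN that dom0's initiator
>>>>>>   # exposes as a local block device
>>>>>>   builder = 'hvm'
>>>>>>   name    = 'win7'
>>>>>>   memory  = 8192
>>>>>>   vcpus   = 2
>>>>>>   vif     = [ 'bridge=xenbr0' ]
>>>>>>   disk    = [ 'phy:/dev/disk/by-path/ip-10.0.0.2:3260-iscsi-iqn.2014-01.local.storage:win7-lun-0,hda,w' ]
>>>>>>   # plus the pci/gfx_passthru entries for the Quadro 4000 when GPU
>>>>>>   # passthrough is enabled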
>>>>>>
>>>>>> Everything works perfectly (including PCI & GPU passthrough) until I
>>>>>> install GPLPV drivers on the Windows VM. After driver installation,
>>>>>> Windows needs to reboot, boots fine, displays a message that PV SCSI
>>>>>
>>>>> (a)
>>>>>
>>>>>> drivers were installed and needs to reboot again, and then cannot
>>>>>> boot. Sometimes it gets stuck at "booting from hard drive" in
>>>>>> SeaBIOS, sometimes it BSODs with an "unmountable boot volume"
>>>>>> message. I tried all of the following without GPU passthrough to
>>>>>> narrow down the problem.
>>>>>>
>>>>>> The intriguing part is this:
>>>>>>
>>>>>> 1. If the storage VM's OS is Linux - it fails with the above
>>>>>> symptoms.
>>>>>> 2. If the block devices for the storage VM come directly from dom0
>>>>>> (not via PCI passthrough) - it fails.
>>>>>> 3. If the storage VM is an HVM without PV drivers (e.g. FreeBSD
>>>>>> 9.2-GENERIC) - it all works.
>>>>>> 4. If the storage VM's OS is Linux with a kernel compiled without
>>>>>> Xen guest support - it works, but is unstable (see below).
>>>>>> 5. If the iSCSI target is on a different physical machine - it all
>>>>>> works.
>>>>>> 6. If the iSCSI target is on dom0 itself - it works.
>>>>>> 7. If I attach the AHCI controller to the Windows VM and install
>>>>>> directly on the hard drive - it works.
>>>>>> 8. If the block device for the Windows VM is a disk, partition,
>>>>>> file, LVM volume or even a ZoL zvol (and it comes from dom0 itself,
>>>>>> without iSCSI) - it works.
>>>>>>
>>>>>> If I install Windows and the GPLPV drivers on a hard drive
>>>>>> attached to
>>>>>> dom0, Windows + GPLPV work perfectly. If I then give the same hard
>>>>>> drive
>>>>>> as a block device to the storage VM and re-export it through iSCSI,
>>>>>
>>>>> (b)
>>>>>
>>>>>> Windows usually boots fine, but is unstable. And by unstable I mean
>>>>>> random read/write errors, programs that sometimes won't start,
>>>>>> ntdll.dll crashes, and after a couple of reboots Windows won't boot
>>>>>> (just as described above).
>>>>>>
>>>>>> The configuration I would like to achieve makes sense only with PV
>>>>>> drivers on both the storage and Windows VMs. All of the "components"
>>>>>> seem to work perfectly until they are put together, so I am not
>>>>>> really sure where the problem is.
>>>>>>
>>>>>> I would be very grateful for any suggestions or ideas that could
>>>>>> possibly help narrow down the problem. Maybe I am just doing
>>>>>> something wrong (I hope so). Or maybe there is a bug that shows
>>>>>> itself only in this particular configuration (I hope not)?
>>>>>>
>>>>>
>>>>> I'm curious about the prompt saying the PV SCSI drivers were
>>>>> installed. Is this definitely what it is asking for? Pvscsi support
>>>>> in GPLPV has been removed in the latest versions and suffered varying
>>>>> degrees of bitrot in earlier versions. If you have the iSCSI
>>>>> initiator in dom0, then exporting a block device to Windows via the
>>>>> normal vbd channel should be just fine.
>>>>>
>>>>> You've gone to great lengths to explain the various things you've
>>>>> tried, but I think I'm a little confused about where the iSCSI
>>>>> initiator is in the "doesn't work" scenarios. I'm having a bit of an
>>>>> off day today so it's probably just me, but I have highlighted the
>>>>> two scenarios above... could you fill me in on a few things:
>>>>>
>>>>> At (a) and (b), is the iSCSI initiator in dom0, or are you actually
>>>>> booting Windows directly via iSCSI?
>>>>>
>>>>> At (b), with the latest debug build of GPLPV, can you run DebugView
>>>>> from sysinternals.com and see if any interesting messages are
>>>>> displayed before things fall in a heap?
>>>>>
>>>>> Are any strange log messages shown in the Windows DomU, Dom0, or the
>>>>> storage DomU?
>>>>>
>>>>> How big are your disks?
>>>>>
>>>>> Can you reproduce with only one vcpu?
>>>>>
>>>>> What bridge are you using? Open vSwitch or the traditional Linux
>>>>> bridge?
>>>>>
>>>>> What MTU are you using on your storage network? If you are using
>>>>> jumbo frames, can you go back to 1500 (or at least <= 4000)?
>>>>>
>>>>> Can you turn off scatter-gather, Large Send Offload (GSO), and IP
>>>>> checksum offload on all the iSCSI endpoints?
>>>>>
>>>>> Can you turn on the data digest/checksum on iSCSI? If all endpoints
>>>>> support it, this would provide additional verification that none of
>>>>> the network packets are getting corrupted.
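>>>>>
>>>>> (For illustration only -- assuming a Linux dom0 with open-iscsi, and
>>>>> a NIC name "eth0" that will differ on your machines -- the offloads
>>>>> and digests could be toggled roughly like this:
>>>>>
>>>>>   # on each Linux endpoint of the storage path
>>>>>   ethtool -K eth0 sg off tso off gso off gro off tx off rx off
>>>>>
>>>>>   # open-iscsi initiator side, in /etc/iscsi/iscsid.conf
>>>>>   node.conn[0].iscsi.HeaderDigest = CRC32C
>>>>>   node.conn[0].iscsi.DataDigest = CRC32C
>>>>>
>>>>> and the current MTU can be checked with "ip link show eth0". The
>>>>> FreeBSD target side has its own equivalents.)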
>>>>>
>>>>> Would a driver domain work in your scenario? Then the disk could be
>>>>> attached directly from your storage DomU without incurring all the
>>>>> iSCSI overhead. I'm not up to date on the status of HVM, vbd, and
>>>>> driver domains, so I don't know whether this is possible.
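>>>>>
>>>>> (If it does work, the disk line in the Windows VM config would look
>>>>> roughly like this -- the guest name "storage" and the device path are
>>>>> placeholders:
>>>>>
>>>>>   disk = [ 'backend=storage,vdev=hda,access=rw,target=/dev/ada0p2' ]
>>>>>
>>>>> i.e. the blkback instance would run in the storage DomU instead of
>>>>> dom0.)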
>>>>>
>>>>> More questions than answers. Sorry :)
>>>>>
>>>>> James
>>>>
>>>> Dear James,
>>>>
>>>> thank you for your questions - I really appreciate everything that may
>>>> help me move closer to solving or isolating the problem.
>>>>
>>>> I'll check exactly which driver type is used - up until now I have
>>>> always just installed all the drivers included in the package, since I
>>>> thought all of them were necessary. I'll try installing them without
>>>> XenScsi.
>>>>
>>>> Do you mean revisions > 1092:85b99b9795a6 by "the latest versions"?
>>>> Which version should I use?
>>>>
>>>> Forgive me if the descriptions were unclear. The initiator was always
>>>> in dom0. I only moved the target to dom0 or to a separate physical
>>>> machine in (5) and (6). I didn't boot Windows directly from iSCSI (in
>>>> fact I tried a couple of times, but had some problems with it, so I
>>>> didn't mention it).
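>>>>
>>>> (For concreteness, the dom0 side amounts to something like the
>>>> following -- open-iscsi assumed, and the portal address and IQN are
>>>> placeholders:
>>>>
>>>>   iscsiadm -m discovery -t sendtargets -p 10.0.0.2
>>>>   iscsiadm -m node -T iqn.2014-01.local.storage:win7 -p 10.0.0.2 --login
>>>>
>>>> and the resulting /dev/disk/by-path/...-iscsi-... device is then
>>>> handed to the Windows VM as a plain "phy:" disk.)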
>>>>
>>>> My "disks" (the block devices I dedicated to the Windows VM) were whole
>>>> 120GB and 240GB SSDs, ~100GB ZVOLs and 50GB LVM volumes.
>>>>
>>>> I'm using the traditional Linux bridge. I didn't set the MTU
>>>> explicitly, so I assume it's 1500, but I will verify this.
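>>>>
>>>> (A quick way to confirm this on dom0, assuming the usual bridge name
>>>> "xenbr0":
>>>>
>>>>   ip -o link show xenbr0 | grep -o 'mtu [0-9]*'
>>>>
>>>> 1500 is the default unless it was raised explicitly.)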
>>>>
>>>> I'd love to use a storage driver domain, but the wiki says "It is not
>>>> possible to use driver domains with pygrub or HVM guests yet". The
>>>> page is a couple of months old, though, so maybe that information is
>>>> outdated? It is surely worth checking out.
>>>>
>>>> I'll do my best to provide answers to the remaining questions as
>>>> soon as
>>>> possible. Thank you for so many ideas.
>>>>
>>>> Best regards,
>>>> Kuba
>>>>
>>>
>>> It seems the problems are not related to GPLPV. There is an easy way to
>>> reproduce the issues without Windows and without installing anything,
>>> using only live CDs for the two DomUs:
>>>
>>> 1) Set up a Linux Dom0 with Xen 4.3.1 and standard Linux bridge for Dom0
>>> and DomUs
>>
>> Are you using a Xen build with debugging enabled? I think I might have a
>> clue about what's happening, because I have also seen it. Could you
>> recompile Xen with debugging enabled and try the same test (iSCSI target
>> on the DomU and initiator on Dom0)?
>>
>> Roger.
>>
>>
> 
> Of course I could! Please point me to any relevant information on how to
> build Xen with debugging enabled and what to do next. I build Xen using
> the standard ./configure && make world && make install.

Just `make debug=y xen` and boot with the resulting xen.gz.
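
A minimal sketch of the whole sequence, assuming an in-tree build on a
Debian/GRUB2 dom0 (paths are only an example):

  cd xen-4.3.1
  make debug=y xen                      # rebuilds just the hypervisor
  cp xen/xen.gz /boot/xen-4.3.1-debug.gz
  update-grub                           # then select the debug hypervisor
                                        # entry at the next boot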

Roger.


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users

 

