
Re: [Xen-devel] System freeze with IGD passthrough



On Wed, Dec 19, 2012 at 2:20 PM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Adding Jean, the author of the opregion patch.
>
> Jean, I believe the warning is due to the offset within the page.
> To accommodate the offset, you would need to reserve another page for it.
> Will the extra page cause any unexpected problem?
>
> The original thread is about an instability issue that directly freezes the
> host.
> I believe the warning above should not have such an effect.
> What do you think? Any suggestions?
>

Jean appears to be no longer reachable.
The warning I found turns out not to be relevant.
According to the OpRegion spec, the tail part is reserved and should
never be touched by the guest.
Anyway, I have a local fix that gets rid of the warning by reserving
one more page and mapping it when the host OpRegion is not page aligned.
I'll send it in a separate thread.
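
To make the alignment point concrete, here is a small Python sketch (not
taken from the patch). It assumes 4 KiB pages and an 8 KiB OpRegion, a size
inferred from the range 0xfeff5018..0xfeff7017 in the guest warning quoted
below; the helper name pages_spanned is made up for illustration.

```python
PAGE_SIZE = 0x1000      # assumption: 4 KiB pages, as on x86
OPREGION_SIZE = 0x2000  # assumption: 8 KiB, inferred from the warning's range


def pages_spanned(base: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many pages the byte range [base, base + size) touches."""
    first_page = base // page_size
    last_page = (base + size - 1) // page_size
    return last_page - first_page + 1


# Guest address from the qemu-dm log (feff5018): offset 0x18 into the page.
unaligned = pages_spanned(0xFEFF5018, OPREGION_SIZE)
# The same region if it started exactly on a page boundary.
aligned = pages_spanned(0xFEFF5000, OPREGION_SIZE)

print(f"unaligned: {unaligned} pages, aligned: {aligned} pages")
```

With the 0x18 offset, the last byte lands at 0xfeff7017, which is in a third
page, so a two-page reservation does not cover the mapping.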

Back to the topic. I updated to Xen 4.2.1 and tried three times tonight.
Two of the runs led to a total freeze with no error log available, after
a couple of minutes of game playing.
The last try ended with a GPU hang after 10+ minutes of game playing.
This was a guest-only hang, but I still have no way to check the GPU error
state even though it has been collected:

[ 1553.588076] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1553.592112] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1582.004075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1597.220075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1613.220074] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

I'm wondering whether the two symptoms share the same underlying cause.
But I guess a GPU hang caused by a guest driver issue should not freeze
the host. Is that true?

I'm going to try more configurations: different kernel
versions, with / without PVOPS, native run vs. VM, etc.
But this is rather blind since I have no clue at all. If you have
anything you suspect, hearing about it would be highly appreciated.

Thanks,
Timothy

> Thanks,
> Timothy
>
> On Wed, Dec 19, 2012 at 1:28 AM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> 
> wrote:
>> Hi Stefano,
>>
>> I recently tried to play some 3D games on my Linux guest.
>> The game starts without problems but freezes the entire system after
>> some time (a minute or so?).
>> By this I mean that both the host and the domU become unresponsive.
>> SSH freezes and I had to shut down the machine using the power button
>> directly.
>>
>> I did not find anything obvious from the host log. But from the guest,
>> I can find this:
>>
>> Dec 18 20:28:38 debvm kernel: [    0.899860] resource map sanity check
>> conflict: 0xfeff5018 0xfeff7017 0xfeff7000 0xffffffff reserved
>> Dec 18 20:28:38 debvm kernel: [    0.899862] ------------[ cut here
>> ]------------
>> Dec 18 20:28:38 debvm kernel: [    0.899869] WARNING: at
>> arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2c4/0x33c()
>> Dec 18 20:28:38 debvm kernel: [    0.899870] Hardware name: HVM domU
>> Dec 18 20:28:38 debvm kernel: [    0.899872] Info: mapping multiple
>> BARs. Your kernel is fine.
>> Dec 18 20:28:38 debvm kernel: [    0.899873] Modules linked in:
>> Dec 18 20:28:38 debvm kernel: [    0.899878] Pid: 1, comm: swapper/0
>> Not tainted 3.6.9 #4
>> Dec 18 20:28:38 debvm kernel: [    0.899892] Call Trace:
>> Dec 18 20:28:38 debvm kernel: [    0.899896]  [<ffffffff8103d194>] ?
>> warn_slowpath_common+0x76/0x8a
>> Dec 18 20:28:38 debvm kernel: [    0.899898]  [<ffffffff8103d240>] ?
>> warn_slowpath_fmt+0x45/0x4a
>> Dec 18 20:28:38 debvm kernel: [    0.899900]  [<ffffffff81032a6c>] ?
>> __ioremap_caller+0x2c4/0x33c
>> Dec 18 20:28:38 debvm kernel: [    0.899902]  [<ffffffff812c3be3>] ?
>> intel_opregion_setup+0x9c/0x201
>> Dec 18 20:28:38 debvm kernel: [    0.899904]  [<ffffffff812bcb75>] ?
>> intel_setup_gmbus+0x175/0x19d
>> Dec 18 20:28:38 debvm kernel: [    0.899907]  [<ffffffff8128a37a>] ?
>> i915_driver_load+0x548/0x90d
>> Dec 18 20:28:38 debvm kernel: [    0.899910]  [<ffffffff812ff804>] ?
>> setup_hpet_msi_remapped+0x20/0x20
>> Dec 18 20:28:38 debvm kernel: [    0.899912]  [<ffffffff81272706>] ?
>> drm_get_pci_dev+0x152/0x259
>> Dec 18 20:28:38 debvm kernel: [    0.899915]  [<ffffffff813d4883>] ?
>> _raw_spin_lock_irqsave+0x21/0x45
>> Dec 18 20:28:38 debvm kernel: [    0.899918]  [<ffffffff811d9ecc>] ?
>> local_pci_probe+0x5a/0xa0
>> Dec 18 20:28:38 debvm kernel: [    0.899920]  [<ffffffff811d9fcf>] ?
>> pci_device_probe+0xbd/0xe7
>> Dec 18 20:28:38 debvm kernel: [    0.899922]  [<ffffffff812cd887>] ?
>> driver_probe_device+0x1b0/0x1b0
>> Dec 18 20:28:38 debvm kernel: [    0.899923]  [<ffffffff812cd887>] ?
>> driver_probe_device+0x1b0/0x1b0
>> Dec 18 20:28:38 debvm kernel: [    0.899925]  [<ffffffff812cd769>] ?
>> driver_probe_device+0x92/0x1b0
>> Dec 18 20:28:38 debvm kernel: [    0.899926]  [<ffffffff812cd8da>] ?
>> __driver_attach+0x53/0x73
>> Dec 18 20:28:38 debvm kernel: [    0.899928]  [<ffffffff812cc06f>] ?
>> bus_for_each_dev+0x46/0x77
>> Dec 18 20:28:38 debvm kernel: [    0.899930]  [<ffffffff812ccf8f>] ?
>> bus_add_driver+0xd5/0x1f4
>> Dec 18 20:28:38 debvm kernel: [    0.899931]  [<ffffffff812cde14>] ?
>> driver_register+0x89/0x101
>> Dec 18 20:28:38 debvm kernel: [    0.899933]  [<ffffffff811d9336>] ?
>> __pci_register_driver+0x49/0xa3
>> Dec 18 20:28:38 debvm kernel: [    0.899935]  [<ffffffff816d55c7>] ?
>> ttm_init+0x63/0x63
>> Dec 18 20:28:38 debvm kernel: [    0.899937]  [<ffffffff81002085>] ?
>> do_one_initcall+0x75/0x12c
>> Dec 18 20:28:38 debvm kernel: [    0.899940]  [<ffffffff816a6cc2>] ?
>> kernel_init+0x13c/0x1c0
>> Dec 18 20:28:38 debvm kernel: [    0.899941]  [<ffffffff816a6565>] ?
>> do_early_param+0x83/0x83
>> Dec 18 20:28:38 debvm kernel: [    0.899943]  [<ffffffff813d9f44>] ?
>> kernel_thread_helper+0x4/0x10
>> Dec 18 20:28:38 debvm kernel: [    0.899945]  [<ffffffff816a6b86>] ?
>> start_kernel+0x3e1/0x3e1
>> Dec 18 20:28:38 debvm kernel: [    0.899947]  [<ffffffff813d9f40>] ?
>> gs_change+0x13/0x13
>> Dec 18 20:28:38 debvm kernel: [    0.899950] ---[ end trace
>> db461543ce599b44 ]---
>>
>> I'm not sure if this has anything to do with the freeze. It seems to
>> show up on every boot since I upgraded to Xen version 4.2.1-rc2. Both
>> Debian kernels 3.2.32 and 3.6.9 produce the same log. But the whole-system
>> freeze happens only during gaming, which is much less frequent.
>> So I'm not sure if the two are related. Anyway, could you comment
>> on what this log means?
>>
>> I can find one of the mentioned addresses in the qemu_dm log:
>> pt_pci_write_config: [00:02:0] address=00fc val=0xfeff5000 len=4
>> igd_write_opregion: Map OpRegion: cd996018 -> feff5018
>> igd_write_opregion: [00:02:0] addr=fc len=2 val=feff5000
>>
>> PS: I also run XBMC on a domU and it plays back video with HW
>> acceleration (VAAPI) without any problem. XBMC by itself is also a
>> graphics-intensive program. But this runs on a pure HVM guest, while
>> the failing case is on PVHVM.
>>
>> PS2: I also suffered another instability yesterday. It happened while I
>> was compiling a kernel inside the domU. The host rebooted suddenly.
>> Since I was not using graphics at that time (the Xorg session was idle, I
>> was connected through SSH), this may be a different issue.
>>
>> Thanks,
>> Timothy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
