
RE: [Xen-devel] [xen-unstable test] 1959: FAIL [and 1 more messages]



Hi Jeremy,

Sorry for the late response; I have recently been tied up with other tasks.

The updated patch is attached. It explicitly disables the hrtimer when the
VM suspends.

Thanks,
Dongxiao

________________________________________
From: Jeremy Fitzhardinge [jeremy@xxxxxxxx]
Sent: Tuesday, August 10, 2010 10:38 AM
To: Xu, Dongxiao
Cc: Ian Campbell; Ian Jackson; Xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] [xen-unstable test] 1959: FAIL [and 1 more messages]

  On 07/30/2010 02:18 AM, Ian Campbell wrote:
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 328fe40..394bbc8 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -1340,6 +1340,10 @@ static enum hrtimer_restart smart_poll_function(struct hrtimer *timer)
>>      np = netdev_priv(dev);
>>
>>      spin_lock_irqsave(&np->tx_lock, flags);
>> +
>> +    if (!np->rx.sring)
>> +            goto end;
>> +
> Isn't there a period until the end of xennet_disconnect_backend is
> reached on resume where rx.sring will still point to the old shared
> ring? If so is it safe to drop through in that case?
>
> Would it be safer to add an explicit suspend handler which stopped the
> timer?

Dongxiao, do you have a comment/updated patch for this?  I'm going to
revert the smartpoll stuff in the meantime, because it's causing tests to
fail.

Thanks,
     J

> Ian.
>
>>      np->smart_poll.counter++;
>>
>>      if (likely(netif_carrier_ok(dev))) {
>> --
>> 1.6.3
>>
>>
>>
>> Jeremy Fitzhardinge wrote:
>>>    On 07/29/2010 09:33 AM, Ian Jackson wrote:
>>>> Jeremy Fitzhardinge writes ("Re: [Xen-devel] [xen-unstable test]
>>>> 1959: FAIL [and 1 more messages]"):
>>>>> On 07/29/2010 08:30 AM, Ian Jackson wrote:
>>>>>> Is save/restore supposed to work in pvops ?  (Using your kernel for
>>>>>> both dom0 and domU.)  That would seem to be the next thing to pick
>>>>>> off the list ...
>>>>> Yes.  IanC has been tracking down a bug where it fails after a few
>>>>> thousand iterations (now fixed?), but aside from that it's
>>>>> apparently OK.
>>>> Well, I was able to reproduce the failure that the automatic test was
>>>> getting.  There's a problem with it not getting a copy of the console
>>>> output for some reason but I was able to get this oops from the
>>>> guest.
>>>>
>>>> Just after the restore I was able to connect to the PV console and it
>>>> echoed a few of my CRs (before saving I had logged in on the
>>>> console),
>>>> and then it produced the oops.  Now it's apparently completely
>>>> wedged.
>>>>
>>> Ah, it looks like the netfront smartpoll stuff isn't coping with
>>> save/restore.
>>>
>>>       J
>>>
>>>> Ian.
>>>>
>>>>
>>>>
>>>> [   63.681260] BUG: unable to handle kernel NULL pointer dereference at 00000010
>>>> [   63.681293] IP: [<c1300381>] smart_poll_function+0xbb/0xf2
>>>> [   63.681320] *pdpt = 000000001fee5027 *pde = 0000000000000000
>>>> [   63.681344] Oops: 0000 [#1] SMP
>>>> [   63.681362] last sysfs file: /sys/kernel/uevent_seqnum
>>>> [   63.681376] Modules linked in: [last unloaded: scsi_wait_scan]
>>>> [   63.681398]
>>>> [   63.681410] Pid: 5, comm: events/0 Not tainted (2.6.32.16 #1)
>>>> [   63.681424] EIP: 0061:[<c1300381>] EFLAGS: 00010002 CPU: 0
>>>> [   63.681438] EIP is at smart_poll_function+0xbb/0xf2
>>>> [   63.681451] EAX: 00000000 EBX: dfea8320 ECX: 00000001 EDX: 00000062
>>>> [   63.681465] ESI: 00000064 EDI: 00000000 EBP: df849cfc ESP: df849cdc
>>>> [   63.681479]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
>>>> [   63.681493] Process events/0 (pid: 5, ti=df848000 task=df839480 task.ti=df848000)
>>>> [   63.681508] Stack:
>>>> [   63.681516]  dfeac828 00000002 dfeac828 dfea8368 dfea0068 0000002c dfeac828 c13002c6
>>>> [   63.681555]<0>   df849d1c c1079ab8 df849d48 c2386358 c2386328 0000002c 00000000 c2386328
>>>> [   63.681598]<0>   df849d5c c1079ce8 000028cd 88bbcad9 d3b2cae7 0000000e d3b2cae7 0000000e
>>>> [   63.681646] Call Trace:
>>>> [   63.681662]  [<c13002c6>] ? smart_poll_function+0x0/0xf2
>>>> [   63.681683]  [<c1079ab8>] ? __run_hrtimer+0xa9/0xf6
>>>> [   63.681701]  [<c1079ce8>] ? hrtimer_interrupt+0xcd/0x1c8
>>>> [   63.681719]  [<c102cef0>] ? xen_timer_interrupt+0x2b/0x224
>>>> [   63.681737]  [<c102cb3f>] ? xen_force_evtchn_callback+0xf/0x14
>>>> [   63.681755]  [<c102d2ac>] ? check_events+0x8/0xc
>>>> [   63.681776]  [<c102d2a3>] ? xen_restore_fl_direct_end+0x0/0x1
>>>> [   63.681795]  [<c14d1471>] ? _spin_unlock_irqrestore+0x2f/0x31
>>>> [   63.681814]  [<c105eb9e>] ? try_to_wake_up+0x2fa/0x304
>>>> [   63.681832]  [<c102cb3f>] ? xen_force_evtchn_callback+0xf/0x14
>>>> [   63.681850]  [<c10a0cc4>] ? handle_IRQ_event+0x5f/0x122
>>>> [   63.681867]  [<c10a233f>] ? handle_level_irq+0x58/0xa9
>>>> [   63.681886]  [<c121983d>] ? __xen_evtchn_do_upcall+0xab/0x131
>>>> [   63.681904]  [<c1219c71>] ? xen_evtchn_do_upcall+0x20/0x30
>>>> [   63.682179]  [<c102ffe7>] ? xen_do_upcall+0x7/0xc
>>>> [   63.682179]  [<c102007b>] ? apic_reg_write+0xa5/0x52f
>>>> [   63.682179]  [<c1002227>] ? hypercall_page+0x227/0x1005
>>>> [   63.682179]  [<c102cb3f>] ? xen_force_evtchn_callback+0xf/0x14
>>>> [   63.682179]  [<c102d2ac>] ? check_events+0x8/0xc
>>>> [   63.682179]  [<c102d26b>] ? xen_irq_enable_direct_end+0x0/0x1
>>>> [   63.682179]  [<c105cd0d>] ? finish_task_switch+0x4f/0xa6
>>>> [   63.682179]  [<c14d0186>] ? schedule+0x7dd/0x861
>>>> [   63.682179]  [<c106dd9d>] ? __mod_timer+0x135/0x140
>>>> [   63.682179]  [<c102d2ac>] ? check_events+0x8/0xc
>>>> [   63.682179]  [<c102d2a3>] ? xen_restore_fl_direct_end+0x0/0x1
>>>> [   63.682179]  [<c14d1471>] ? _spin_unlock_irqrestore+0x2f/0x31
>>>> [   63.682179]  [<c107750b>] ? prepare_to_wait+0x43/0x48
>>>> [   63.682179]  [<c10742a2>] ? worker_thread+0x94/0x1d2
>>>> [   63.682179]  [<c10ccd2d>] ? vmstat_update+0x0/0x2f
>>>> [   63.682179]  [<c1077357>] ? autoremove_wake_function+0x0/0x33
>>>> [   63.682179]  [<c107420e>] ? worker_thread+0x0/0x1d2
>>>> [   63.682179]  [<c1077120>] ? kthread+0x61/0x66
>>>> [   63.682179]  [<c10770bf>] ? kthread+0x0/0x66
>>>> [   63.682179]  [<c102ff97>] ? kernel_thread_helper+0x7/0x10
>>>> [   63.682179] Code: c6 89 d0 31 d2 f7 f6 85 d2 75 1a 85 c9 75 0c 8b 83 90 08 00 00 c6 40 10 00 eb 32 c7 83 54 45 00 00 00 00 00 00 8b 83 90 08 00 00<80>   78 10 00 74 1c 8b 4d e8 b8 00 ca 9a 3b 31 d2 f7 71 44 31 c9
>>>> [   63.682179] EIP: [<c1300381>] smart_poll_function+0xbb/0xf2 SS:ESP 0069:df849cdc
>>>> [   63.682179] CR2: 0000000000000010
>>>> [   63.682179] ---[ end trace 760037e75e5675c8 ]---
>>>> [   63.682179] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [   63.682179] Pid: 5, comm: events/0 Tainted: G      D    2.6.32.16 #1
>>>> [   63.682179] Call Trace:
>>>> [   63.682179]  [<c14cf7b1>] ? printk+0xf/0x11
>>>> [   63.682179]  [<c14cf6ee>] panic+0x39/0xed
>>>> [   63.682179]  [<c14d2085>] oops_end+0xa1/0xb0
>>>> [   63.682179]  [<c104b426>] no_context+0x137/0x141
>>>> [   63.682179]  [<c104b56f>] __bad_area_nosemaphore+0x13f/0x147
>>>> [   63.682179]  [<c104b584>] bad_area_nosemaphore+0xd/0x10
>>>> [   63.682179]  [<c14d3275>] do_page_fault+0x1c6/0x32b
>>>> [   63.682179]  [<c14d30af>] ? do_page_fault+0x0/0x32b
>>>> [   63.682179]  [<c14d16c6>] error_code+0x66/0x6c
>>>> [   63.682179]  [<c14d30af>] ? do_page_fault+0x0/0x32b
>>>> [   63.682179]  [<c1300381>] ? smart_poll_function+0xbb/0xf2
>>>> [   63.682179]  [<c13002c6>] ? smart_poll_function+0x0/0xf2
>>>> [   63.682179]  [<c1079ab8>] __run_hrtimer+0xa9/0xf6
>>>> [   63.682179]  [<c1079ce8>] hrtimer_interrupt+0xcd/0x1c8
>>>> [   63.682179]  [<c102cef0>] xen_timer_interrupt+0x2b/0x224
>>>> [   63.682179]  [<c102cb3f>] ? xen_force_evtchn_callback+0xf/0x14
>>>> [   63.682179]  [<c102d2ac>] ? check_events+0x8/0xc
>>>> [   63.682179]  [<c102d2a3>] ? xen_restore_fl_direct_end+0x0/0x1
>>>> [   63.682179]  [<c14d1471>] ? _spin_unlock_irqrestore+0x2f/0x31
>>>> [   63.682179]  [<c105eb9e>] ? try_to_wake_up+0x2fa/0x304
>>>> [   63.682179]  [<c102cb3f>] ? xen_force_evtchn_callback+0xf/0x14
>>>> [   63.682179]  [<c10a0cc4>] handle_IRQ_event+0x5f/0x122
>>>> [   63.682179]  [<c10a233f>] handle_level_irq+0x58/0xa9
>>>> [   63.682179]  [<c121983d>] __xen_evtchn_do_upcall+0xab/0x131
>>>> [   63.682179]  [<c1219c71>] xen_evtchn_do_upcall+0x20/0x30
>>>> [   63.682179]  [<c102ffe7>] xen_do_upcall+0x7/0xc
>>>> [   63.682179]  [<c102007b>] ? apic_reg_write+0xa5/0x52f
>>>> [   63.682179]  [<c1002227>] ? hypercall_page+0x227/0x1005
>>>> [   63.682179]  [<c102cb3f>] ? xen_force_evtchn_callback+0xf/0x14
>>>> [   63.682179]  [<c102d2ac>] check_events+0x8/0xc
>>>> [   63.682179]  [<c102d26b>] ? xen_irq_enable_direct_end+0x0/0x1
>>>> [   63.682179]  [<c105cd0d>] ? finish_task_switch+0x4f/0xa6
>>>> [   63.682179]  [<c14d0186>] schedule+0x7dd/0x861
>>>> [   63.682179]  [<c106dd9d>] ? __mod_timer+0x135/0x140
>>>> [   63.682179]  [<c102d2ac>] ? check_events+0x8/0xc
>>>> [   63.682179]  [<c102d2a3>] ? xen_restore_fl_direct_end+0x0/0x1
>>>> [   63.682179]  [<c14d1471>] ? _spin_unlock_irqrestore+0x2f/0x31
>>>> [   63.682179]  [<c107750b>] ? prepare_to_wait+0x43/0x48
>>>> [   63.682179]  [<c10742a2>] worker_thread+0x94/0x1d2
>>>> [   63.682179]  [<c10ccd2d>] ? vmstat_update+0x0/0x2f
>>>> [   63.682179]  [<c1077357>] ? autoremove_wake_function+0x0/0x33
>>>> [   63.682179]  [<c107420e>] ? worker_thread+0x0/0x1d2
>>>> [   63.682179]  [<c1077120>] kthread+0x61/0x66
>>>> [   63.682179]  [<c10770bf>] ? kthread+0x0/0x66
>>>> [   63.682179]  [<c102ff97>] kernel_thread_helper+0x7/0x10
>
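
A note on reading the oops quoted above. The bytes at the faulting
instruction, "80 78 10 00", decode as a one-byte compare against offset
0x10 from EAX (cmpb $0x0,0x10(%eax)). EAX is 00000000, which gives the
CR2 value of 0x10. That is consistent with a NULL ring pointer being
dereferenced for a one-byte field 16 bytes into the structure. The sketch
below only illustrates that arithmetic. The struct and its field names are
hypothetical and assume a header of four 32-bit indices followed by a
one-byte flag; they are not taken from the patch or from the Xen headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a shared-ring header: four 32-bit indices,
 * then a one-byte flag. Layout is an assumption for illustration. */
struct sring_model {
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;
    uint8_t  smartpoll_active;   /* assumed name for the byte at 0x10 */
    uint8_t  pad[47];
};

/* The address a load would touch: base pointer plus field offset.
 * With base == 0 (a NULL pointer) this is the value reported in CR2. */
static uintptr_t fault_address(uintptr_t base, size_t field_offset)
{
    return base + (uintptr_t)field_offset;
}

/* Offset of the flag within the modelled header. */
static size_t flag_offset(void)
{
    return offsetof(struct sring_model, smartpoll_active);
}
```

Under these assumptions, fault_address(0, flag_offset()) evaluates to 0x10,
the same address as in the "NULL pointer dereference at 00000010" line.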

Attachment: 0001-Netfront-Fix-save-restore-after-enabled-smart-poll-f.patch
Description: 0001-Netfront-Fix-save-restore-after-enabled-smart-poll-f.patch
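
The attachment itself is not reproduced in this archive. As a rough
illustration of the idea discussed in the thread (stop the timer on suspend
rather than rely only on a NULL check of rx.sring), here is a minimal
userspace model in C. It is a sketch under stated assumptions, not the
attached patch: every type and function name below is hypothetical, and the
boolean timer_armed merely stands in for an armed hrtimer, which in the
kernel would be stopped with a call such as hrtimer_cancel().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver state; not the real structures. */
struct ring_model {
    unsigned char smartpoll_active;
};

struct netfront_model {
    struct ring_model *sring;  /* NULL once disconnected from the backend */
    bool timer_armed;          /* models whether the poll timer may run */
    unsigned int counter;      /* models the smart-poll counter */
};

enum restart_model { MODEL_NORESTART, MODEL_RESTART };

/* Timer callback: touches the ring only if the timer is still meant to
 * run and a ring is attached. */
static enum restart_model model_poll(struct netfront_model *np)
{
    if (!np->timer_armed || np->sring == NULL)
        return MODEL_NORESTART;
    np->counter++;
    return np->sring->smartpoll_active ? MODEL_RESTART : MODEL_NORESTART;
}

/* Suspend hook: disarm the timer before the ring can go stale. */
static void model_suspend(struct netfront_model *np)
{
    np->timer_armed = false;
}

/* Disconnect on resume: drop the old ring pointer. */
static void model_disconnect(struct netfront_model *np)
{
    np->sring = NULL;
}
```

The point of disarming in the suspend hook is the window Ian describes:
between resume and the end of the disconnect step the ring pointer is still
non-NULL but stale, so a NULL check alone would not stop the callback from
using it, whereas a stopped timer never reaches that code.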
