
Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source



It happened again. Is it possible that Xen isn't disabling network
interrupts while it is "auditing all pages"?
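
A rough way to check this, as a sketch rather than a proper tool: several
words in the stack dump below look like ASCII text. Assuming they are
little-endian 32-bit words (true on x86), the following decodes them. The
helper name words_to_ascii is made up for illustration.

```python
import struct


def words_to_ascii(words):
    """Decode hex strings of little-endian 32-bit words into printable ASCII.

    Bytes outside the printable range are shown as '.'.
    """
    raw = b"".join(struct.pack("<I", int(w, 16)) for w in words)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Three consecutive words copied from the stack dump in this message.
print(words_to_ascii(["3d6e6670", "33303030", "63203237"]))
```

The three words decode to the start of a "pfn=000372 c" string, which
matches the last refcount error in the log below. That would be consistent
with the interrupt arriving while the audit output was still being printed,
though it does not prove it.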

                        -Kip


Killing domain 1
Releasing task 1
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
audit_all_pages
refcount error: pfn=000247 cf=00000001 refcount=0
audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00024d cf=00000001 refcount=0
audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00036f cf=40000002 refcount=1
audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
    pte_idx=3f9 *pte_idx=0036f063

refcount error: pfn=000371 cf=40000002 refcount=1
audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
    pte_idx=3fe *pte_idx=00371063

refcount error: pfn=000372 cf=40000002 refcount=1
audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
    pte_idx=3fd *pte_idx=00372063

CPU:    1
EIP:    0808:[<fc532ddf>]
EFLAGS: 00010206
eax: 06081012   ebx: 00000010   ecx: fc657560   edx: fc650da0
esi: fc650da0   edi: fc657540   ebp: 00000000   esp: fc64fd70
ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
Stack trace from ESP=fc64fd70:
ff803012 00000008 [fc51302a] fc76bb40 0000003c fc657540 fc657540
[fc5328c2]
       fc657540 fc657400 00000017 ffffffff 00000017 fc648040 00000040
0000003e
       fc657400 fc76bb40 00000040 fc76a740 04000001 fc657540 00005048
[fc5323a8]
       fc657540 fc5ebc80 fc5d1f1c 00000001 00000046 00000004 [fc509639]
fc76a740
       04000001 fc64fe60 00000010 [fc5b1f2d] 00000010 fc657400 fc64fe60
3d6e6670
       33303030 63203237 00000001 fc76a740 fc600bc0 00000010 fc64fe60
[fc5b20f3]
       00000010 fc64fe60 fc76a740 00002207 00372063 34429e8f 00000001
0007fff0
       0007fff0 00000002 00000004 [fc5af970] 0007fff0 fd800000 00000001
0007fff0
       00000002 00000004 00000040 00010810 00000810 00000810 fc500810
ffffff10
       [fc50d0e5] 00000808 00000202 0000004d fc64ff6c fc64ff6c
[fc509f22] 0000004d
       00000000 fc64ff6c [fc5162b6] 00000003 00000040 fc5ebc80
[fc517c4e] 0000004d
       fc64ff6c fc64ff6c [fc512529] 00000003 fc6501e0 02000001
[fc517ca4] fc5ebc80
       fc64ff6c 0092578a 00000000 fc651200 00000006 00000006 [fc5b1f2d]
00000004
       fc5ebc80 fc64ff6c [fc5b151c] 00000004 00000001 00000001 fc6501e0
fc6008c0
       00000004 fc64ff6c [fc5b20f3] 00000004 fc64ff6c fc6501e0
[fc511e2e] fc624494
       431ea128 00000001 00000040 fc648040 00000040 fc649780 [fc5af970]
00000040
       00000001 00000040 fc648040 00000040 fc649780 00000040 fc640810
fc640810
       00000810 fc640810 ffffff04 [fc5b5e04] 00000808 00000246
[fc5b5e40] fc648040
       004c4b40 ffffffff 655f6464 7972746e 5f636100 656d6974 61007372
69745f63
       5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974
62007172
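
Mapping the bracketed addresses by hand is tedious, so here is a minimal
sketch of the lookup. It assumes a symbol listing in the three-column
"address type name" layout that nm prints. The sample symbols and their 'T'
type letters are illustrative; the addresses come from the hand-built table
quoted below, so they belong to the earlier build, not to this crash. The
function names load_symbols, resolve and bracketed_addresses are
hypothetical.

```python
import bisect
import re


def load_symbols(nm_text):
    """Parse 'address type name' lines into a list of (addr, name), sorted by addr."""
    syms = []
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms


def resolve(syms, addr):
    """Return (name, offset) for the nearest symbol at or below addr, else None."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = syms[i]
    return name, addr - base


def bracketed_addresses(dump_text):
    """Extract addresses written as [fc5095ef] or [<fc532927>] from a crash dump."""
    return [int(m, 16) for m in re.findall(r"\[<?([0-9a-f]{8})>?\]", dump_text)]


# Illustrative input only: a few symbols from the table quoted below.
SAMPLE_NM = """\
fc5095be T putchar
fc5095f7 T printf
fc532048 T e100_rx_srv
"""
SAMPLE_DUMP = "ff865012 0000000a [fc5095ef] fc780140 0000003c fc657540 fc657540 [fc53240a]"

symbols = load_symbols(SAMPLE_NM)
for address in bracketed_addresses(SAMPLE_DUMP):
    hit = resolve(symbols, address)
    if hit is None:
        print(f"{address:08x}  <no symbol>")
    else:
        print(f"{address:08x}  {hit[0]}+0x{hit[1]:x}")
```

With a full symbol listing from the same xen binary that crashed, the sample
table would be replaced by the real one. With only a partial table, as here,
an address can resolve to the wrong (earlier) function, so the output should
be treated as a hint.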

On Mon, 23 Feb 2004, Kip Macy wrote:

>
> >
> > This is a Xen crash dump. ksymoops won't help -- you'll need to map
> > the crash dump to Xen code by hand. It doesn't take long. The
> > addresses in the stack trace that are enclosed in square brackets are
> > likely to be return addresses in the function-call trace.
>
> This is sufficiently tedious that if this happens again I'm going to
> either run screaming or write a ksymoops for xen.
>
> >
> > 'objdump -d xen >xen.s'. Then you can search in xen.s with a text
> > editor to find the call-trace addresses.
>
> I did this and got what you see below. It looks like two backtraces
> interleaved. All of the values in brackets are legitimate return
> addresses (they immediately follow a call instruction). "function addr"
> is the address of the function itself and "ret addr" is the address
> taken from the oops.
>
> function              function addr   ret addr
> ================================================
> putchar                       fc5095be        fc5095ef
> e100_rx_srv           fc532048        fc53240a
> printf                        fc5095f7        fc509664
> putchar_serial                fc50927c        fc509299
> e100intr              fc531d8f        fc531ef0
> handle_IRQ_event      fc5b1a25        fc5b1a7d
> do_IRQ                        fc5b1bbb        fc5b1c43
> call_do_IRQ           fc5af4bb        fc5af4c0
> serial_rx_int         fc51801d        fc518078
> serial_rx_int         fc51801d        fc518046
> handle_IRQ_event      fc5b1a25        fc5b1a7d
> reprogram_ac_timer    fc5af087        fc5af0aa
> do_IRQ                        fc5b1bbb        fc5b1c43
> ac_timer_softirq_action       fc50455c        fc50465b
> call_do_IRQ           fc5af4bb        fc5af4c0
> default_idle          fc5b582e        fc5b585c
> continue_cpu_idle_loop        fc5b585f        fc5b5898
>
>
> The fault instruction is this:
> fc532927:     66 83 38 00             cmpw   $0x0,(%eax)
> It is in e100_start_ru. Obviously eax is pointing at some piece of
> unmapped memory. I'm not sufficiently versed in assembler, particularly
> optimized assembler, to tell where we are going wrong:
>
>
>       list_for_each(entry_ptr, &(bdp->active_rx_list)) {
>               rx_struct =
>                       list_entry(entry_ptr, struct rx_list_elem, list_elem);
>               pci_dma_sync_single(bdp->pdev, rx_struct->dma_addr,
>                                   bdp->rfd_size, PCI_DMA_FROMDEVICE);
>               if (!((SKB_RFD_STATUS(rx_struct->skb, bdp) &
>                      __constant_cpu_to_le16(RFD_STATUS_COMPLETE)))) {
>                       buffer_found = 1;
>                       break;
>               }
>       }
>
> Could the list have been corrupted?
>
>
>                               -Kip
>
>
> >
> >  -- Keir
> >
> > > After a few more minutes the following popped out on the console:
> > >
> > > CPU:    1
> > > EIP:    0808:[<fc532927>]
> > > EFLAGS: 00010206
> > > eax: 0a725012   ebx: 00000010   ecx: fc657560   edx: fc76a460
> > > esi: fc76a460   edi: fc657540   ebp: 00000000   esp: fc64fda0
> > > ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
> > > Stack trace from ESP=fc64fda0:
> > > ff865012 0000000a [fc5095ef] fc780140 0000003c fc657540 fc657540 
> > > [fc53240a]
> > >        fc657540 fc657400 [fc509664] 0000000a [fc509299] fc648040 00000040 
> > > 0000003e
> > >        fc657400 fc780140 00000040 fc76a740 04000001 fc657540 00005048 
> > > [fc531ef0]
> > >        fc657540 00000046 [fc509664] 0000000a 30303030 74203130 00000046 
> > > fc76a740
> > >        04000001 fc64fe90 00000010 [fc5b1a7d] 00000010 fc657400 fc64fe90 
> > > 3d6e6670
> > >        33303030 63203462 00000001 fc76a740 fc600200 00000010 fc64fe90 
> > > [fc5b1c43]
> > >        00000010 fc64fe90 fc76a740 0007fff0 000003b4 25c4fe2d 00000001 
> > > 0007fff0
> > >        0007fff0 00000000 00000000 [fc5af4c0] 0007fff0 fd800000 00000001 
> > > 0007fff0
> > >        00000000 00000000 00000040 00010810 00000810 00000810 fc500810 
> > > ffffff10
> > >        [fc50cff5] 00000808 00000202 fc654d4d 0000004d fc64ff6c [fc518078] 
> > > 0000004d
> > >        00000000 fc64ff6c [fc518046] 0036bfec 00000000 00000292 fc654d00 
> > > 02000001
> > >        fc64ff6c 00000004 [fc5b1a7d] 00000004 00000000 fc64ff6c [fc5af0aa] 
> > > fc650200
> > >        00000086 00000001 fc654d00 fc5fff00 00000004 fc64ff6c [fc5b1c43] 
> > > 00000004
> > >        fc64ff6c fc654d00 [fc50465b] 35c9c161 50d04d38 00000001 00000040 
> > > fc648040
> > >        00000040 fc7b8080 [fc5af4c0] 00000040 00000028 00000040 fc648040 
> > > 00000040
> > >        fc7b8080 00000040 fc640810 fc640810 00000810 fc7b0810 ffffff04 
> > > [fc5b585c]
> > >        00000808 00000246 [fc5b5898] fc648040 004c4b40 ffffffff 61007372 
> > > 69745f63
> > >        5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974 
> > > 62007172
> > >        636f6c72 00632e6b 736e6f63 2e656c6f 65640063 2e677562 65640063 
> > > fc648040
> > >
> > > ****************************************
> > > CPU1 FATAL PAGE FAULT
> > > [error_code=00000000]
> > > Faulting linear address might be 0a725012
> > > Aieee! CPU1 is toast...
> > > ****************************************
> > >
> > > Is this oops from Xen or from XenoLinux? I downloaded the latest
> > > ksymoops and did the following:
> > > kmacy@xentap ./ksymoops -v ../xenolinux-2.4.25/vmlinux -m 
> > > ../xenolinux-2.4.25/System.map < 
> > > ../xeno-unstable.bk.home/tools/xc/lib/crash1.txt
> > > ksymoops 2.4.9 on i686 2.4.25-xeno.  Options used
> > >      -v ../xenolinux-2.4.25/vmlinux (specified)
> > >      -k /proc/ksyms (default)
> > >      -l /proc/modules (default)
> > >      -o /lib/modules/2.4.25-xeno/ (default)
> > >      -m ../xenolinux-2.4.25/System.map (specified)
> > >
> > > No modules in ksyms, skipping objects
> > > Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid
> > > lsmod file?
> > > Warning (compare_maps): mismatch on symbol state d, System.map says
> > > c0175ca8, vmlinux says 0.  Ignoring System.map entry
> > > Warning (compare_maps): mismatch on symbol state a, vmlinux says 0,
> > > System.map says c0175ca8.  Ignoring System.map entry
> > > CPU:    1
> > > EIP:    0808:[<fc532927>]
> > > Using defaults from ksymoopsSegmentation fault
> > >
> > >
> > >                           -Kip
> > >
> > > On Mon, 23 Feb 2004, Kip Macy wrote:
> > >
> > > > I had just tested my domain builder for the nth time on xeno-unstable
> > > > (very latest source), when I saw the messages below on the console.
> > > > DOM0 no longer responds to ping - I'm hoping that it will recover,
> > > > however, in all likelihood I will be hitting the rpb in a few minutes.
> > > >
> > > > audit_all_pages
> > > > zombie: pfn=00000000 cf=fffffffd tf=fffffffd dom=00000000
> > > > refcount error: pfn=000000 cf=fffffffd refcount=1
> > > > audit page: pfn=0 info: cf=fffffffd tf=fffffffd ts=0 dom=0
> > > >
> > > > refcount error: pfn=000247 cf=00000001 refcount=0
> > > > audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040
> > > >
> > > > refcount error: pfn=00024d cf=00000001 refcount=0
> > > > audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040
> > > >
> > > > refcount error: pfn=00036f cf=40000002 refcount=1
> > > > audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > > >     pte_idx=3f9 *pte_idx=0036f063
> > > >
> > > > refcount error: pfn=000371 cf=40000002 refcount=1
> > > > audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > > >     pte_idx=3fe *pte_idx=00371063
> > > >
> > > > refcount error: pfn=000372 cf=40000002 refcount=1
> > > > audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > > >     pte_idx=3fd *pte_idx=00372063
> > > >
> > > > refcount error: pfn=000390 cf=00000001 refcount=0
> > > > audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > > >
> > > > refcount error: pfn=000392 cf=00000001 refcount=0
> > > > audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > > >
> > > > refcount error: pfn=000393 cf=00000001 refcount=0
> > > > audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320
> > > >
> > > > refcount error: pfn=000395 cf=00000001 refcount=0
> > > > audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320
> > > >
> > > > refcount error: pfn=00039f cf=00000001 refcount=0
> > > > audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > > >
> > > > refcount error: pfn=0003a1 cf=00000001 refcount=0
> > > > audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > > >
> > > > refcount error: pfn=0003a2 cf=00000001 refcount=0
> > > > audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > > >
> > > > refcount error: pfn=0003a8 cf=00000001 refcount=0
> > > > audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > > >
> > > > refcount error: pfn=0003a9 cf=00000001 refcount=0
> > > > audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > > >
> > > > refcount error: pfn=0003ab cf=00000001 refcount=0
> > > > audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > > >
> > > > refcount error: pfn=0003ac cf=00000001 refcount=0
> > > > audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > > >
> > > > refcount error: pfn=0003ae cf=00000001 refcount=0
> > > > audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > > >
> > > > refcount error: pfn=0003af cf=00000001 refcount=0
> > > > audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340
> > > >
> > > > refcount error: pfn=0003b1 cf=00000001 refcount=0
> > > > audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340
> > > >
> > > > refcount error: pfn=0003b2 cf=00000001 refcount=0
> > > > audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> > > >
> > > > refcount error: pfn=0003b4 cf=00000001 refcount=0
> > > > audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > -------------------------------------------------------
> > > > SF.Net is sponsored by: Speed Start Your Linux Apps Now.
> > > > Build and deploy apps & Web services for Linux with
> > > > a free DVD software kit from IBM. Click Now!
> > > > http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
> > > > _______________________________________________
> > > > Xen-devel mailing list
> > > > Xen-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > https://lists.sourceforge.net/lists/listinfo/xen-devel
> > > >
> > >
> > >
> >
> >
> >
> >
>
>
>




 

