WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
Subject: Re: [Xen-devel] refcount errors then crash on XenoLinux with the latest source
From: Kip Macy <kmacy@xxxxxxxxxxx>
Date: Mon, 23 Feb 2004 17:11:13 -0800 (PST)
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxxx
Delivery-date: Tue, 24 Feb 2004 01:12:08 +0000
Envelope-to: steven.hand@xxxxxxxxxxxx
In-reply-to: <E1AvPc4-0001Z9-00@xxxxxxxxxxxxxxxxxxxx>
List-archive: <http://sourceforge.net/mailarchive/forum.php?forum=xen-devel>
List-help: <mailto:xen-devel-request@lists.sourceforge.net?subject=help>
List-id: List for Xen developers <xen-devel.lists.sourceforge.net>
List-post: <mailto:xen-devel@lists.sourceforge.net>
List-subscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=subscribe>
List-unsubscribe: <https://lists.sourceforge.net/lists/listinfo/xen-devel>, <mailto:xen-devel-request@lists.sourceforge.net?subject=unsubscribe>
References: <E1AvPc4-0001Z9-00@xxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-admin@xxxxxxxxxxxxxxxxxxxxx
>
> This is a Xen crash dump. ksymoops won't help -- you'll need to map
> the crash dump to Xen code by hand. It doesn't take long. The
> addresses in the stack trace that are enclosed in square brackets are
> likely to be return addresses in the function-call trace.

This is sufficiently tedious that if this happens again I'm going to
either run screaming or write a ksymoops for xen.

>
> 'objdump -d xen >xen.s'. Then you can search in xen.s with a text
> editor to find the call-trace addresses.

I did this and got what you see below. It looks like two backtraces
interleaved. All of the values in brackets are legitimate return
addresses (they immediately follow a call instruction). "function addr"
is the address of the function itself and "ret addr" is the address
taken from the oops.

function                function addr   ret addr
================================================
putchar                 fc5095be        fc5095ef
e100_rx_srv             fc532048        fc53240a
printf                  fc5095f7        fc509664
putchar_serial          fc50927c        fc509299
e100intr                fc531d8f        fc531ef0
handle_IRQ_event        fc5b1a25        fc5b1a7d
do_IRQ                  fc5b1bbb        fc5b1c43
call_do_IRQ             fc5af4bb        fc5af4c0
serial_rx_int           fc51801d        fc518078
serial_rx_int           fc51801d        fc518046
handle_IRQ_event        fc5b1a25        fc5b1a7d
reprogram_ac_timer      fc5af087        fc5af0aa
do_IRQ                  fc5b1bbb        fc5b1c43
ac_timer_softirq_action fc50455c        fc50465b
call_do_IRQ             fc5af4bb        fc5af4c0
default_idle            fc5b585c        fc5b582e
continue_cpu_idle_loop  fc5b585f        fc5b5898
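
The by-hand lookup behind the table above amounts to two steps: pull the
bracketed values out of the stack trace, then find the last function whose
start address is not greater than each value. Below is a minimal sketch of
that idea in C, not an existing tool. The names struct sym, xen_syms,
next_bracketed and sym_lookup are made up for illustration, and the symbol
table holds only the function addresses listed above. A real tool would have
to load every symbol from the xen image; with this partial table, an address
inside an unlisted function (for example the faulting EIP fc532927, which is
in e100_start_ru) would be attributed to the nearest listed function before
it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One symbol-table entry: start address and name of a function. */
struct sym {
    uint32_t addr;
    const char *name;
};

/* Function start addresses copied from the table above, sorted by address.
 * This is deliberately partial; it is not the full Xen symbol table. */
const struct sym xen_syms[] = {
    { 0xfc50455cu, "ac_timer_softirq_action" },
    { 0xfc50927cu, "putchar_serial" },
    { 0xfc5095beu, "putchar" },
    { 0xfc5095f7u, "printf" },
    { 0xfc51801du, "serial_rx_int" },
    { 0xfc531d8fu, "e100intr" },
    { 0xfc532048u, "e100_rx_srv" },
    { 0xfc5af087u, "reprogram_ac_timer" },
    { 0xfc5af4bbu, "call_do_IRQ" },
    { 0xfc5b1a25u, "handle_IRQ_event" },
    { 0xfc5b1bbbu, "do_IRQ" },
    { 0xfc5b585cu, "default_idle" },
    { 0xfc5b585fu, "continue_cpu_idle_loop" },
};
#define XEN_NSYMS (sizeof xen_syms / sizeof xen_syms[0])

/* Scan s for the next "[hex]" token. On success store the value in *out and
 * return a pointer just past the ']'; return NULL when none is left.
 * Tokens such as "[<fc532927>]" are skipped because '<' is not a hex digit. */
const char *next_bracketed(const char *s, uint32_t *out)
{
    while ((s = strchr(s, '[')) != NULL) {
        char *end;
        unsigned long v = strtoul(s + 1, &end, 16);
        if (end != s + 1 && *end == ']') {
            *out = (uint32_t)v;
            return end + 1;
        }
        s++;
    }
    return NULL;
}

/* Binary search over a table sorted by address: return the last entry whose
 * start address is <= addr, or NULL if addr lies below the first entry. */
const struct sym *sym_lookup(const struct sym *tab, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;

    assert(tab != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? &tab[lo - 1] : NULL;
}
```

Feeding each line of the stack trace to next_bracketed in a loop and printing
the name together with (addr - sym->addr) for each hit would give a
"function+offset" listing comparable to the table above.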


The fault instruction is this:
fc532927:       66 83 38 00             cmpw   $0x0,(%eax)
It is in e100_start_ru. Obviously eax is pointing at some piece of
unmapped memory. I'm not sufficiently versed in assembler, particularly
optimized assembler, to tell where we are going wrong:


        list_for_each(entry_ptr, &(bdp->active_rx_list)) {
                rx_struct =
                        list_entry(entry_ptr, struct rx_list_elem, list_elem);
                pci_dma_sync_single(bdp->pdev, rx_struct->dma_addr,
                                    bdp->rfd_size, PCI_DMA_FROMDEVICE);
                if (!((SKB_RFD_STATUS(rx_struct->skb, bdp) &
                       __constant_cpu_to_le16(RFD_STATUS_COMPLETE)))) {
                        buffer_found = 1;
                        break;
                }
        }

Could the list have been corrupted?
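
On the corruption question: list_for_each only follows next pointers and
list_entry only subtracts a fixed offset from the node address, so a damaged
link (or a damaged pointer stored in the element) turns straight into a wild
address that faults on the first dereference. The sketch below is a
simplified user-space stand-in, not the driver's code. struct rx_elem,
SKETCH_STATUS_COMPLETE, first_incomplete and list_check are hypothetical
names, and the list macros are reduced versions of the kernel ones. It shows
the same loop shape plus a walk that verifies each back-link, which catches
some (not all) kinds of corruption before an element is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_for_each(pos, head) \
    for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

/* Hypothetical element type; the real struct rx_list_elem differs. */
struct rx_elem {
    uint16_t status;             /* stand-in for the RFD status word */
    struct list_head list_elem;  /* embedded link, as in the driver */
};

#define SKETCH_STATUS_COMPLETE 0x8000u  /* placeholder bit, an assumption */

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Same shape as the driver loop: return the first element whose status
 * does not have the "complete" bit set, or NULL if there is none. */
struct rx_elem *first_incomplete(struct list_head *head)
{
    struct list_head *pos;

    list_for_each(pos, head) {
        struct rx_elem *e = list_entry(pos, struct rx_elem, list_elem);
        if (!(e->status & SKETCH_STATUS_COMPLETE))
            return e;
    }
    return NULL;
}

/* Walk the list and verify that every node's prev points at the node we
 * came from. Returns the element count, or -1 if a back-link is wrong or
 * more than max nodes are seen (which suggests a cycle). */
int list_check(const struct list_head *head, int max)
{
    const struct list_head *prev = head, *pos;
    int n = 0;

    assert(head != NULL);
    for (pos = head->next; pos != head; prev = pos, pos = pos->next) {
        if (pos->prev != prev || n >= max)
            return -1;
        n++;
    }
    return head->prev == prev ? n : -1;
}
```

A check of this kind still dereferences each next pointer, so a link that
already points at unmapped memory would fault inside the check itself; it is
only useful for catching links that are wrong but still mapped.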


                                -Kip


>
>  -- Keir
>
> > After a few more minutes the following popped out on the console:
> >
> > CPU:    1
> > EIP:    0808:[<fc532927>]
> > EFLAGS: 00010206
> > eax: 0a725012   ebx: 00000010   ecx: fc657560   edx: fc76a460
> > esi: fc76a460   edi: fc657540   ebp: 00000000   esp: fc64fda0
> > ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
> > Stack trace from ESP=fc64fda0:
> > ff865012 0000000a [fc5095ef] fc780140 0000003c fc657540 fc657540 [fc53240a]
> >        fc657540 fc657400 [fc509664] 0000000a [fc509299] fc648040 00000040 0000003e
> >        fc657400 fc780140 00000040 fc76a740 04000001 fc657540 00005048 [fc531ef0]
> >        fc657540 00000046 [fc509664] 0000000a 30303030 74203130 00000046 fc76a740
> >        04000001 fc64fe90 00000010 [fc5b1a7d] 00000010 fc657400 fc64fe90 3d6e6670
> >        33303030 63203462 00000001 fc76a740 fc600200 00000010 fc64fe90 [fc5b1c43]
> >        00000010 fc64fe90 fc76a740 0007fff0 000003b4 25c4fe2d 00000001 0007fff0
> >        0007fff0 00000000 00000000 [fc5af4c0] 0007fff0 fd800000 00000001 0007fff0
> >        00000000 00000000 00000040 00010810 00000810 00000810 fc500810 ffffff10
> >        [fc50cff5] 00000808 00000202 fc654d4d 0000004d fc64ff6c [fc518078] 0000004d
> >        00000000 fc64ff6c [fc518046] 0036bfec 00000000 00000292 fc654d00 02000001
> >        fc64ff6c 00000004 [fc5b1a7d] 00000004 00000000 fc64ff6c [fc5af0aa] fc650200
> >        00000086 00000001 fc654d00 fc5fff00 00000004 fc64ff6c [fc5b1c43] 00000004
> >        fc64ff6c fc654d00 [fc50465b] 35c9c161 50d04d38 00000001 00000040 fc648040
> >        00000040 fc7b8080 [fc5af4c0] 00000040 00000028 00000040 fc648040 00000040
> >        fc7b8080 00000040 fc640810 fc640810 00000810 fc7b0810 ffffff04 [fc5b585c]
> >        00000808 00000246 [fc5b5898] fc648040 004c4b40 ffffffff 61007372 69745f63
> >        5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974 62007172
> >        636f6c72 00632e6b 736e6f63 2e656c6f 65640063 2e677562 65640063 fc648040
> >
> > ****************************************
> > CPU1 FATAL PAGE FAULT
> > [error_code=00000000]
> > Faulting linear address might be 0a725012
> > Aieee! CPU1 is toast...
> > ****************************************
> >
> > Is this oops from Xen or from XenoLinux? I downloaded the latest
> > ksymoops and did the following:
> > kmacy@xentap ./ksymoops -v ../xenolinux-2.4.25/vmlinux -m ../xenolinux-2.4.25/System.map < ../xeno-unstable.bk.home/tools/xc/lib/crash1.txt
> > ksymoops 2.4.9 on i686 2.4.25-xeno.  Options used
> >      -v ../xenolinux-2.4.25/vmlinux (specified)
> >      -k /proc/ksyms (default)
> >      -l /proc/modules (default)
> >      -o /lib/modules/2.4.25-xeno/ (default)
> >      -m ../xenolinux-2.4.25/System.map (specified)
> >
> > No modules in ksyms, skipping objects
> > Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
> > Warning (compare_maps): mismatch on symbol state d, System.map says c0175ca8, vmlinux says 0.  Ignoring System.map entry
> > Warning (compare_maps): mismatch on symbol state a, vmlinux says 0, System.map says c0175ca8.  Ignoring System.map entry
> > CPU:    1
> > EIP:    0808:[<fc532927>]
> > Using defaults from ksymoopsSegmentation fault
> >
> >
> >                             -Kip
> >
> > On Mon, 23 Feb 2004, Kip Macy wrote:
> >
> > > I had just tested my domain builder for the nth time on xeno-unstable
> > > (very latest source), when I saw the messages below on the console.
> > > DOM0 no longer responds to ping - I'm hoping that it will recover,
> > > however, in all likelihood I will be hitting the rpb in a few minutes.
> > >
> > > audit_all_pages
> > > zombie: pfn=00000000 cf=fffffffd tf=fffffffd dom=00000000
> > > refcount error: pfn=000000 cf=fffffffd refcount=1
> > > audit page: pfn=0 info: cf=fffffffd tf=fffffffd ts=0 dom=0
> > >
> > > refcount error: pfn=000247 cf=00000001 refcount=0
> > > audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040
> > >
> > > refcount error: pfn=00024d cf=00000001 refcount=0
> > > audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040
> > >
> > > refcount error: pfn=00036f cf=40000002 refcount=1
> > > audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > >     pte_idx=3f9 *pte_idx=0036f063
> > >
> > > refcount error: pfn=000371 cf=40000002 refcount=1
> > > audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > >     pte_idx=3fe *pte_idx=00371063
> > >
> > > refcount error: pfn=000372 cf=40000002 refcount=1
> > > audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > >     pte_idx=3fd *pte_idx=00372063
> > >
> > > refcount error: pfn=000390 cf=00000001 refcount=0
> > > audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > >
> > > refcount error: pfn=000392 cf=00000001 refcount=0
> > > audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > >
> > > refcount error: pfn=000393 cf=00000001 refcount=0
> > > audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320
> > >
> > > refcount error: pfn=000395 cf=00000001 refcount=0
> > > audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320
> > >
> > > refcount error: pfn=00039f cf=00000001 refcount=0
> > > audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > >
> > > refcount error: pfn=0003a1 cf=00000001 refcount=0
> > > audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > >
> > > refcount error: pfn=0003a2 cf=00000001 refcount=0
> > > audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > >
> > > refcount error: pfn=0003a8 cf=00000001 refcount=0
> > > audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > >
> > > refcount error: pfn=0003a9 cf=00000001 refcount=0
> > > audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > >
> > > refcount error: pfn=0003ab cf=00000001 refcount=0
> > > audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > >
> > > refcount error: pfn=0003ac cf=00000001 refcount=0
> > > audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > >
> > > refcount error: pfn=0003ae cf=00000001 refcount=0
> > > audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > >
> > > refcount error: pfn=0003af cf=00000001 refcount=0
> > > audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340
> > >
> > > refcount error: pfn=0003b1 cf=00000001 refcount=0
> > > audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340
> > >
> > > refcount error: pfn=0003b2 cf=00000001 refcount=0
> > > audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> > >
> > > refcount error: pfn=0003b4 cf=00000001 refcount=0
> > > audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> > >
> > >
> > >
> > >
> > >
> > >
> > > -------------------------------------------------------
> > > SF.Net is sponsored by: Speed Start Your Linux Apps Now.
> > > Build and deploy apps & Web services for Linux with
> > > a free DVD software kit from IBM. Click Now!
> > > http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@xxxxxxxxxxxxxxxxxxxxx
> > > https://lists.sourceforge.net/lists/listinfo/xen-devel
> > >
> >
> >
>
>
>
>

