
Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT


  • To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • Date: Thu, 26 Aug 2010 10:11:21 +0100
  • Cc:
  • Delivery-date: Thu, 26 Aug 2010 02:12:21 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: ActE/R2lhZGzvwI1QfSEvcfUNrsLQQAAYh9j
  • Thread-topic: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT

On 26/08/2010 09:59, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:

> Thanks for the detailed explanation.
>
> I note the spin_lock in the code I referred to, which, as you mentioned, would
> introduce a deadlock with my change.
> In fact, during the 48-hour run one VM hung; in the output of the xm list
> command its CPU time was very high (in the tens of thousands), while the other
> VMs worked fine. I don't know whether that is related to the potential
> deadlock, since Xen itself kept working.
>
> So a quick question: if we replace the spin_lock with spin_lock_recursive,
> could we avoid this deadlock?

Yes. But we don't understand why this change to MMUEXT_PIN_xxx would fix
your observed bug, and without that understanding I wouldn't accept the
change into the tree.
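
For reference, a recursive spinlock records which CPU currently holds it and a
nesting depth, so a second acquisition by the same CPU nests instead of
spinning on itself. A minimal toy sketch of the idea follows (illustration
only, not Xen's actual implementation; Xen provides spin_lock_recursive() and
spin_unlock_recursive() for this purpose). Presumably every acquisition on the
nested path would need to go through the recursive variant for the nesting to
be recognised.

    /* Toy recursive spinlock, for illustration only. */
    struct rec_lock {
        spinlock_t   lock;       /* underlying non-recursive lock        */
        int          owner_cpu;  /* CPU currently holding it, -1 if none */
        unsigned int depth;      /* nesting count for the owning CPU     */
    };

    void rec_lock_acquire(struct rec_lock *l)
    {
        int cpu = smp_processor_id();

        if ( l->owner_cpu == cpu )   /* already held by this CPU: just nest */
        {
            l->depth++;
            return;
        }
        spin_lock(&l->lock);         /* otherwise contend as usual */
        l->owner_cpu = cpu;
        l->depth = 1;
    }

    void rec_lock_release(struct rec_lock *l)
    {
        if ( --l->depth == 0 )
        {
            l->owner_cpu = -1;
            spin_unlock(&l->lock);
        }
    }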

> The if statement was executed during the test: I happened to have put a log
> statement there and got output from it.

Tell us more: for example, the domain IDs of 'd' and 'pg_owner', and
whether they are PV or HVM domains.

> As a matter of fact, the HVM guests in my test (all Windows 2003) all have PV
> drivers installed. I think that's why the patch takes effect.

Nope. That hypercall is to do with PV pagetable management. An HVM guest
with PV drivers still has HVM pagetable management.

> Besides, I have been working on this issue for some time, and a build mistake
> is unlikely since I have been careful all along.
>
> Anyway, I plan to kick off two reproduction runs on two physical servers: one
> with this patch applied (using spin_lock_recursive instead of spin_lock) and
> the other with no change, on completely clean code. It would be helpful if you
> have some tracing to be added to the test. I will keep you informed.

Whether this fixes your problem is a good data point, but without full
understanding of the bug and why this is the correct and best fix, it will
not be accepted I'm afraid.

 -- Keir

> In addition, my kernel is
> 2.6.31.13-pvops-patch #1 SMP Tue Aug 24 11:23:51 CST 2010 x86_64 x86_64 x86_64
> GNU/Linux
> and Xen is 4.0.0.
>  
> Thanks.
>  
>  
> 
>  
>> Date: Thu, 26 Aug 2010 08:39:03 +0100
>> Subject: Re: [Xen-devel] Xen-unstable panic: FATAL PAGE FAULT
>> From: keir.fraser@xxxxxxxxxxxxx
>> To: tinnycloud@xxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx
>> 
>> On 26/08/2010 05:49, "MaoXiaoyun" <tinnycloud@xxxxxxxxxxx> wrote:
>> 
>>> Hi:
>>> 
>>> This issue can be easily reproduced by continuously and almost concurrently
>>> rebooting 12 Xen HVM VMs on a single physical server. The reproduction hits
>>> the backtrace about 6 to 14 hours after it starts. I have several similar
>>> Xen backtraces; please refer to the end of the mail. The first three
>>> backtraces are almost the same and happened in domain_kill, while the last
>>> backtrace happened in do_multicall.
>>> 
>>> Going through the Xen code in /xen-4.0.0/xen/arch/x86/mm.c, it appears the
>>> author was aware of the race between domain_relinquish_resources and the
>>> code quoted below. It occurred to me to simply move lines 2765 and 2766
>>> before line 2764, that is, to move put_page_and_type(page) inside the
>>> spin_lock to avoid the race.
>> 
>> Well, thanks for the detailed bug report: it is good to have a report that
>> includes an attempt at a fix!
>> 
>> In the below code, the put_page_and_type() is outside the locked region for
>> good reason. Put_page_and_type() -> put_page() -> free_domheap_pages() which
>> acquires d->page_alloc_lock. Because we do not use spin_lock_recursive() in
>> the below code, this recursive acquisition of the lock in
>> free_domheap_pages() would deadlock!
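
To make the chain concrete, here is a rough sketch of the path described above
(simplified; not the actual Xen source, and the real functions take more
arguments):

    spin_lock(&pg_owner->page_alloc_lock);   /* lock is now held by this CPU */
    if ( drop_ref )
        put_page_and_type(page);  /* calls put_page(), which may call
                                   * free_domheap_pages(), which in turn does
                                   * spin_lock(&d->page_alloc_lock) on the
                                   * same lock: the CPU spins on a lock it
                                   * already holds and never makes progress */
    spin_unlock(&pg_owner->page_alloc_lock); /* never reached */
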
>> 
>> Now, I do not think this fix really affected your testing anyway, because
>> the below code is part of the MMUEXT_PIN_... hypercalls, and further is only
>> triggered when a domain executes one of those hypercalls on *another*
>> domain's memory. The *only* time that should happen is when dom0 builds a
>> *PV* VM. So since all your testing is on HVM guests I wouldn't expect the
>> code in the if() statement below to be executed ever. Well, maybe unless you
>> are using qemu stub domains, or pvgrub.
>> 
>> But even if the below code is being executed, I don't think your change is a
>> fix, or anything that should greatly affect the system apart from
>> introducing a deadlock. Is it instead possible that you somehow were testing
>> a broken build of Xen before, and simply re-building Xen with your change is
>> what fixed things? I wonder if the bug stays gone away if you revert your
>> change and re-build?
>> 
>> If it still appears that your fix is good, I would add tracing to the below
>> code and find out a bit more about when/why it is being executed.
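
One possible form such tracing could take (a sketch only; the message and the
set of fields are just a suggestion, placed inside the if ( pg_owner != d )
block quoted below):

    /* Hypothetical trace point: record who pins a foreign page and the
     * state of the page owner at that moment. */
    gdprintk(XENLOG_INFO,
             "foreign pin: d%d -> d%d, mfn=%lx, dying=%d, hvm=%d\n",
             d->domain_id, pg_owner->domain_id, mfn,
             pg_owner->is_dying, is_hvm_domain(pg_owner));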
>> 
>> -- Keir
>> 
>>> 2753         /* A page is dirtied when its pin status is set. */
>>> 2754         paging_mark_dirty(pg_owner, mfn);
>>> 2755 
>>> 2756         /* We can race domain destruction (domain_relinquish_resources). */
>>> 2757         if ( unlikely(pg_owner != d) )
>>> 2758         {
>>> 2759             int drop_ref;
>>> 2760             spin_lock(&pg_owner->page_alloc_lock);
>>> 2761             drop_ref = (pg_owner->is_dying &&
>>> 2762                         test_and_clear_bit(_PGT_pinned,
>>> 2763                                            &page->u.inuse.type_info));
>>> 2764             spin_unlock(&pg_owner->page_alloc_lock);
>>> 2765             if ( drop_ref )
>>> 2766                 put_page_and_type(page);
>>> 2767         }
>>> 2768 
>>> 2769         break;
>>> 2770     }
>>> 
>>> From the result of reproducing on the patched code, it appears the patch
>>> works well, since the 48-hour reproduction run completed without hitting
>>> the panic. But I am not sure of the side effects it brings.
>>> I would appreciate it if someone could give more clues, thanks.
>>> 
>>> =============Trace 1: =============
>>> 
>>> (XEN) ----[ Xen-4.0.0 x86_64 debug=y Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e008:[<ffff82c48011617c>] free_heap_pages+0x55a/0x575
>>> (XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
>>> (XEN) rax: 0000001fffffffe0 rbx: ffff82f60b8bbfc0 rcx: ffff83063fe01a20
>>> (XEN) rdx: ffff8315ffffffe0 rsi: ffff8315ffffffe0 rdi: 00000000ffffffff
>>> (XEN) rbp: ffff82c48037fc98 rsp: ffff82c48037fc58 r8: 0000000000000000
>>> (XEN) r9: ffffffffffffffff r10: ffff82c48020e770 r11: 0000000000000282
>>> (XEN) r12: 00007d0a00000000 r13: 0000000000000000 r14: ffff82f60b8bbfe0
>>> (XEN) r15: 0000000000000001 cr0: 000000008005003b cr4: 00000000000026f0
>>> (XEN) cr3: 0000000232914000 cr2: ffff8315ffffffe4
>>> (XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
>>> (XEN) Xen stack trace from rsp=ffff82c48037fc58:
>>> (XEN) 0000000000000016 0000000000000000 00000000000001a2 ffff8304afc40000
>>> (XEN) 0000000000000000 ffff82f60b8bbfe0 00000000000330fe ffff82f60b8bc000
>>> (XEN) ffff82c48037fcd8 ffff82c48011647e 0000000100000000 ffff82f60b8bbfe0
>>> (XEN) ffff8304afc40020 0000000000000000 ffff8304afc40000 0000000000000000
>>> (XEN) ffff82c48037fcf8 ffff82c480160caf ffff8304afc40000 ffff82f60b8bbfe0
>>> (XEN) ffff82c48037fd68 ffff82c48014deaf 0000000000000ca3 ffff8304afc40fd8
>>> (XEN) ffff8304afc40fd8 ffff8304afc40fd8 4000000000000000 ffff82c48037ff28
>>> (XEN) 0000000000000000 ffff8304afc40000 ffff8304afc40000 000000000099e000
>>> (XEN) 00000000ffffffda 0000000000000001 ffff82c48037fd98 ffff82c4801504de
>>> (XEN) ffff8304afc40000 0000000000000000 000000000099e000 00000000ffffffda
>>> (XEN) ffff82c48037fdb8 ffff82c4801062ee 000000000099e000 fffffffffffffff3
>>> (XEN) ffff82c48037ff08 ffff82c480104cd7 ffff82c40000f800 0000000000000286
>>> (XEN) 0000000000000286 ffff8300bf76c000 000000ea864b1814 ffff8300bf76c030
>>> (XEN) ffff83023ff1ded8 ffff83023ff1ded0 ffff82c48037fe38 ffff82c48011c9f5
>>> (XEN) ffff82c48037ff08 ffff82c480272100 ffff8300bf76c000 ffff82c48037fe48
>>> (XEN) ffff82c48011f557 ffff82c480272100 0000000600000002 000000004700000a
>>> (XEN) 000000004700bf2c 0000000000000000 000000004700c158 0000000000000000
>>> (XEN) 00002b3b59e7d050 0000000000000000 0000007f00b14140 00002b3b5f257a80
>>> (XEN) 0000000000996380 00002aaaaaad0830 00002b3b5f257a80 00000000009bb690
>>> (XEN) 00002aaaaaad0830 000000398905abf3 000000000078de60 00002b3b5f257aa4
>>> (XEN) Xen call trace:
>>> (XEN) [<ffff82c48011617c>] free_heap_pages+0x55a/0x575
>>> (XEN) [<ffff82c48011647e>] free_domheap_pages+0x2e7/0x3ab
>>> (XEN) [<ffff82c480160caf>] put_page+0x69/0x70
>>> (XEN) [<ffff82c48014deaf>] relinquish_memory+0x36e/0x499
>>> (XEN) [<ffff82c4801504de>] domain_relinquish_resources+0x1ac/0x24c
>>> (XEN) [<ffff82c4801062ee>] domain_kill+0x93/0xe4
>>> (XEN) [<ffff82c480104cd7>] do_domctl+0xa1c/0x1205
>>> (XEN) [<ffff82c4801f71bf>] syscall_enter+0xef/0x149
>>> (XEN) 
>>> (XEN) Pagetable walk from ffff8315ffffffe4:
>>> (XEN) L4[0x106] = 00000000bf589027 5555555555555555
>>> (XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
>>> (XEN) 
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) FATAL PAGE FAULT
>>> (XEN) [error_code=0002]
>>> (XEN) Faulting linear address: ffff8315ffffffe4
>>> (XEN) ****************************************
>>> (XEN) 
>>> (XEN) Manual reset required ('noreboot' specified)
>>> 
>>> =============Trace 2: =============
>>> 
>>> (XEN) Xen call trace:
>>> (XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
>>> (XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
>>> (XEN) [<ffff82c48014aa89>] relinquish_memory+0x169/0x500
>>> (XEN) [<ffff82c48014b2cd>] domain_relinquish_resources+0x1ad/0x280
>>> (XEN) [<ffff82c480105fe0>] domain_kill+0x80/0xf0
>>> (XEN) [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
>>> (XEN) [<ffff82c48010739b>] evtchn_set_pending+0xab/0x1b0
>>> (XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
>>> (XEN) 
>>> (XEN) Pagetable walk from ffff8315ffffffe4:
>>> (XEN) L4[0x106] = 00000000bf569027 5555555555555555
>>> (XEN) L3[0x057] = 0000000000000000 ffffffffffffffff
>>> (XEN) stdvga.c:147:d60 entering stdvga and caching modes
>>> (XEN) 
>>> (XEN) ****************************************
>>> (XEN) HVM60: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp
>>> $
>>> (XEN) Panic on CPU 0:
>>> (XEN) FATAL PAGE FAULT
>>> (XEN) [error_code=0002]
>>> (XEN) Faulting linear address: ffff8315ffffffe4
>>> (XEN) ****************************************
>>> (XEN) 
>>> (XEN) Manual reset required ('noreboot' specified)
>>> 
>>> =============Trace 3: =============
>>> 
>>> 
>>> (XEN) Xen call trace:
>>> (XEN) [<ffff82c4801153c3>] free_heap_pages+0x283/0x4a0
>>> (XEN) [<ffff82c480115732>] free_domheap_pages+0x152/0x380
>>> (XEN) [<ffff82c48014aa89>] relinquish_memory+0x169/0x500
>>> (XEN) [<ffff82c48014b2cd>] domain_relinquish_resources+0x1ad/0x280
>>> (XEN) [<ffff82c480105fe0>] domain_kill+0x80/0xf0
>>> (XEN) [<ffff82c4801043ce>] do_domctl+0x1be/0x1000
>>> (XEN) [<ffff82c480117804>] csched_acct+0x384/0x430
>>> (XEN) [<ffff82c4801e3169>] syscall_enter+0xa9/0xae
>>> 
>> 
>> 
>        



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

