
Re: [Xen-devel] Re: 2.6.37-rc1 mainline domU - BUG: unable to handle kernel paging request


  • To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Bruce Edge <bruce.edge@xxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>
  • From: Boris Derzhavets <bderzhavets@xxxxxxxxx>
  • Date: Fri, 19 Nov 2010 06:32:26 -0800 (PST)
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 19 Nov 2010 06:33:32 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I've also noticed that if I change a file, say under /mnt/nfs/fedora, then:

1. `ls` runs fine
2. `ls -l` generates a page fault

This doesn't crash the DomU, although a stack trace is printed in that terminal session.

I can close the crashed terminal and open a second one, where
# cd /mnt/nfs/fedora
# ls -l
will succeed until I make some new changes to files, e.g. edit some file.
Then the second terminal session crashes, and a third one has to be opened to be able
to work with the files again (`ls -l`).
When the number of page faults reaches some critical value (more than 5, but in
general unpredictable), the DomU crashes, so I cannot open any new terminal
session. This is a stable and obvious regression in 2.6.37-rc2 vs 2.6.36 as a PV DomU
kernel.

Boris
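The sequence described above (change something under the NFS mount, then run `ls` and `ls -l` until an oops appears) can be scripted for repeated testing. The following is a minimal POSIX shell sketch, not something posted in this thread: the function name `repro_ls_loop`, the scratch file `.repro-marker`, the default of 5 rounds and the 10-second limit are all assumptions. Because the oops in this thread ends with "Fixing recursive fault but reboot is needed!", the faulting `ls` may never exit, so the sketch polls it in the background instead of waiting on it.

```shell
#!/bin/sh
# repro_ls_loop DIR [ROUNDS]: hypothetical helper, not from the thread.
# Assumes DIR is an NFS mount inside the PV DomU. ROUNDS (default 5) and the
# 10-second polling limit are arbitrary choices.
repro_ls_loop() {
    if [ ! -d "$1" ]; then
        echo "usage: repro_ls_loop DIR [ROUNDS]" >&2
        return 2
    fi
    dir=$1
    rounds=${2:-5}
    limit=10
    i=1
    while [ "$i" -le "$rounds" ]; do
        # "Edit some file": rewrite a scratch file under the mount.
        date > "$dir/.repro-marker" || return 1
        ls "$dir" > /dev/null

        # Start `ls -l` in the background and poll it, so a process that
        # never exits (e.g. one stuck after an oops) is reported instead of
        # blocking the loop.
        ls -l "$dir" > /dev/null 2>&1 &
        pid=$!
        t=0
        while kill -0 "$pid" 2>/dev/null && [ "$t" -lt "$limit" ]; do
            sleep 1
            t=$((t + 1))
        done

        if kill -0 "$pid" 2>/dev/null; then
            echo "round $i: ls -l (pid $pid) still running after ${limit}s"
        elif wait "$pid"; then
            echo "round $i: ok"
        else
            echo "round $i: ls -l exited with status $?"
        fi
        i=$((i + 1))
    done
    rm -f "$dir/.repro-marker"
}
```

For example, `repro_ls_loop /mnt/nfs/fedora 10` inside the DomU. An `ls -l` reported as still running is left behind rather than killed.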

--- On Thu, 11/18/10, Boris Derzhavets <bderzhavets@xxxxxxxxx> wrote:

From: Boris Derzhavets <bderzhavets@xxxxxxxxx>
Subject: Re: [Xen-devel] Re: 2.6.37-rc1 mainline domU - BUG: unable to handle kernel paging request
To: "Bruce Edge" <bruce.edge@xxxxxxxxx>
Cc: "Jeremy Fitzhardinge" <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx>
Date: Thursday, November 18, 2010, 12:05 PM

Bruce,
You should be able to apply the patches to mainline 2.6.37-rc2 cleanly. These patches are
taken from MY's kernel-2.6.37-rc2.git0.fc15.src.rpm.
I have already applied them on Ubuntu 10.10 to the unpacked mainline rc2 source.

I should also note that
# mount IP-Dom0:/home/user1 /mnt/nfs
# cd /mnt/nfs
# ls -l
crashes the DomU immediately in text mode.
In graphics mode it doesn't necessarily happen every time;
the DomU might survive this "hack" and crash an
hour later for another reason.

Boris.

--- On Thu, 11/18/10, Bruce Edge <bruce.edge@xxxxxxxxx> wrote:

From: Bruce Edge <bruce.edge@xxxxxxxxx>
Subject: Re: [Xen-devel] Re: 2.6.37-rc1 mainline domU - BUG: unable to handle kernel paging request
To: "Boris Derzhavets" <bderzhavets@xxxxxxxxx>
Cc: "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx>, "Jeremy Fitzhardinge" <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Date: Thursday, November 18, 2010, 11:40 AM



On Thu, Nov 18, 2010 at 2:34 AM, Boris Derzhavets <bderzhavets@xxxxxxxxx> wrote:
Could you apply the two attached patches on top of 2.6.37-rc2 and see
whether they give some improvement or not (with an active NFS client in the DomU)?

Boris 

Hi Boris,

Are you using the mainline kernel or a pvops branch with these patches?
Maybe I'm doing something wrong, but they don't apply cleanly with 2.6.37-rc2:

%> patch --dry-run <../patches.2.6.37/xen.next-2.6.37.patch

patching file pgtable.h
Hunk #1 FAILED at 399.
1 out of 1 hunk FAILED -- saving rejects to file pgtable.h.rej
patching file pgtable.c
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to file pgtable.c.rej
patching file ttm_bo_vm.c
Hunk #1 FAILED at 273.
Hunk #2 FAILED at 288.
2 out of 2 hunks FAILED -- saving rejects to file ttm_bo_vm.c.rej
......

 %> patch --dry-run <../patches.2.6.37/xen.pcifront.fixes.patch

patching file enlighten.c
Hunk #1 FAILED at 1090.
Hunk #2 FAILED at 1202.
2 out of 2 hunks FAILED -- saving rejects to file enlighten.c.rej
patching file setup.c
Hunk #1 FAILED at 337.
Hunk #2 FAILED at 356.
2 out of 2 hunks FAILED -- saving rejects to file setup.c.rej

Same result for linux-2.6.37-xen-next branch.

-Bruce
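Bruce's commands above pass no `-p` option, and the output lists bare file names (`pgtable.h`, `enlighten.c`) rather than paths inside the tree; GNU patch documents that omitting `-p` reduces each name to its base name. Kernel patches normally carry `a/` and `b/` prefixes and are applied from the top of the source tree with `-p1`. Whether a strip-level mismatch explains these failures is only a guess. The following is a minimal sketch for checking that, assuming GNU patch; the helper name `check_patches` is hypothetical.

```shell
#!/bin/sh
# check_patches SRCDIR PATCH...: hypothetical helper, not from the thread.
# Dry-runs each patch from the top of SRCDIR, first with -p1 (the usual
# strip level for a/ and b/ prefixed kernel patches), then with -p0, and
# reports the first level at which every hunk applies. Assumes GNU patch.
check_patches() {
    src=$1
    shift
    for pf in "$@"; do
        # patch runs inside SRCDIR, so turn a relative patch path into an
        # absolute one first.
        case $pf in
            /*) abs=$pf ;;
            *) abs=$PWD/$pf ;;
        esac
        verdict="does not apply at -p1 or -p0"
        for p in 1 0; do
            # --dry-run leaves the tree untouched; -f suppresses prompts.
            # Hunks that need an offset or fuzz still count as applied.
            if (cd "$src" && patch --dry-run -f -s -p"$p" -i "$abs") > /dev/null 2>&1; then
                verdict="applies with -p$p"
                break
            fi
        done
        echo "$(basename "$pf"): $verdict"
    done
}
```

For example, `check_patches linux-2.6.37-rc2 ../patches.2.6.37/*.patch`, where the tree name is illustrative.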


--- On Wed, 11/17/10, Bruce Edge <bruce.edge@xxxxxxxxx> wrote:

From: Bruce Edge <bruce.edge@xxxxxxxxx>

Subject: Re: [Xen-devel] Re: 2.6.37-rc1 mainline domU - BUG: unable to handle kernel paging request
To: "Boris Derzhavets" <bderzhavets@xxxxxxxxx>
Cc: "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx>, "Jeremy Fitzhardinge" <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx
Date: Wednesday, November 17, 2010, 4:28 PM


On Tue, Nov 16, 2010 at 1:49 PM, Boris Derzhavets <bderzhavets@xxxxxxxxx> wrote:
Yes, here we are

[  186.975228] ------------[ cut here ]------------
[  186.975245] kernel BUG at mm/mmap.c:2399!
[  186.975254] invalid opcode: 0000 [#1] SMP
[  186.975269] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[  186.975284] CPU 0
[  186.975290] Modules linked in: nfs fscache deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_generic twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 uinput xen_netfront microcode xen_blkfront [last unloaded: scsi_wait_scan]
[  186.975507]
[  186.975515] Pid: 1562, comm: ls Not tainted 2.6.37-0.1.rc1.git8.xendom0.fc14.x86_64 #1 /
[  186.975529] RIP: e030:[<ffffffff8110ada1>]  [<ffffffff8110ada1>] exit_mmap+0x10c/0x119
[  186.975550] RSP: e02b:ffff8800781bde18  EFLAGS: 00010202
[  186.975560] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  186.975573] RDX: 00000000914a9149 RSI: 0000000000000001 RDI: ffffea00000c0280
[  186.975585] RBP: ffff8800781bde48 R08: ffffea00000c0280 R09: 0000000000000001
[  186.975598] R10: ffffffff8100750f R11: ffffea0000967778 R12: ffff880076c68b00
[  186.975610] R13: ffff88007f83f1e0 R14: ffff880076c68b68 R15: 0000000000000001
[  186.975625] FS:  00007f8e471d97c0(0000) GS:ffff88007f831000(0000) knlGS:0000000000000000
[  186.975639] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  186.975650] CR2: 00007f8e464a9940 CR3: 0000000001a03000 CR4: 0000000000002660
[  186.975663] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  186.976012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  186.976012] Process ls (pid: 1562, threadinfo ffff8800781bc000, task ffff8800788223e0)
[  186.976012] Stack:
[  186.976012]  000000000000006b ffff88007f83f1e0 ffff8800781bde38 ffff880076c68b00
[  186.976012]  ffff880076c68c40 ffff8800788229d0 ffff8800781bde68 ffffffff810505fc
[  186.976012]  ffff8800788223e0 ffff880076c68b00 ffff8800781bdeb8 ffffffff81056747
[  186.976012] Call Trace:
[  186.976012]  [<ffffffff810505fc>] mmput+0x65/0xd8
[  186.976012]  [<ffffffff81056747>] exit_mm+0x13e/0x14b
[  186.976012]  [<ffffffff81056976>] do_exit+0x222/0x7c6
[  186.976012]  [<ffffffff8100750f>] ? xen_restore_fl_direct_end+0x0/0x1
[  186.976012]  [<ffffffff8107ea7c>] ? arch_local_irq_restore+0xb/0xd
[  186.976012]  [<ffffffff814b3949>] ? lockdep_sys_exit_thunk+0x35/0x67
[  186.976012]  [<ffffffff810571b0>] do_group_exit+0x88/0xb6
[  186.976012]  [<ffffffff810571f5>] sys_exit_group+0x17/0x1b
[  186.976012]  [<ffffffff8100acf2>] system_call_fastpath+0x16/0x1b
[  186.976012] Code: 8d 7d 18 e8 c3 8a 00 00 41 c7 45 08 00 00 00 00 48 89 df e8 0d e9 ff ff 48 85 c0 48 89 c3 75 f0 49 83 bc 24 98 01 00 00 00 74 02 <0f> 0b 48 83 c4 18 5b 41 5c 41 5d c9 c3 55 48 89 e5 41 54 53 48
[  186.976012] RIP  [<ffffffff8110ada1>] exit_mmap+0x10c/0x119
[  186.976012]  RSP <ffff8800781bde18>
[  186.976012] ---[ end trace c0f4eff4054a67e4 ]---
[  186.976012] Fixing recursive fault but reboot is needed!

Message from syslogd@fedora14 at Nov 17 00:47:40 ...
 kernel:[  186.975228] ------------[ cut here ]------------

Message from syslogd@fedora14 at Nov 17 00:47:40 ...
 kernel:[  186.975254] invalid opcode: 0000 [#1] SMP

Message from syslogd@fedora14 at Nov 17 00:47:40 ...
 kernel:[  186.975269] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map

Message from syslogd@fedora14 at Nov 17 00:47:40 ...
 kernel:[  186.976012] Stack:

Message from syslogd@fedora14 at Nov 17 00:47:40 ...
 kernel:[  186.976012] Call Trace:

Message from syslogd@fedora14 at Nov 17 00:47:40 ...
 kernel:[  186.976012] Code: 8d 7d 18 e8 c3 8a 00 00 41 c7 45 08 00 00 00 00 48 89 df e8 0d e9 ff ff 48 85 c0 48 89 c3 75 f0 49 83 bc 24 98 01 00 00 00 74 02 <0f> 0b 48 83 c4 18 5b 41 5c 41 5d c9 c3 55 48 89 e5 41 54 53 48

--- On Tue, 11/16/10, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:

From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] Re: 2.6.37-rc1 mainline domU - BUG: unable to handle kernel paging request
To: "Boris Derzhavets" <bderzhavets@xxxxxxxxx>
Cc: "Jeremy Fitzhardinge" <jeremy@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Bruce Edge" <bruce.edge@xxxxxxxxx>
Date: Tuesday, November 16, 2010, 4:15 PM

On Tue, Nov 16, 2010 at 12:43:28PM -0800, Boris Derzhavets wrote:
> > Huh. I .. what? I am confused. I thought we established that the issue
> > was not related to Xen PCI front? You also seem to uncomment the
> > upstream.core.patches and the xen.pvhvm.patch - why?
>
> I cannot uncomment upstream.core.patches and xen.pvhvm.patch;
> they give failed hunks.

Uhh.. I am even more confused.
>
> > Ok, they are.. v2.6.37-rc2 which came out today has the fixes
>
> I am pretty sure rc2 doesn't contain everything from xen.next-2.6.37.patch,
> gntdev's stuff for sure. I've built 2.6.37-rc2 kernel rpms and loaded
> kernel-2.6.37-rc2.git0.xendom0.x86_64 under Xen 4.0.1.
> Device /dev/xen/gntdev has not been created. I understand that it's
> unrelated to DomU (it's related to Dom0), but once again, with rc2 in the DomU I cannot
> get 3.2 GB copied over to the DomU from the NFS share on Dom0.

So what I think you are saying is that you keep on getting the bug in DomU?
Is the stack-trace the same as in rc1?
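Konrad's question, whether the rc2 trace matches the rc1 one, can be checked by comparing the reliable frames of two saved oops logs. The following is a minimal awk-based sketch, assuming each oops was saved to a text file; `trace_funcs` and the file names in the usage note are hypothetical. Frames marked `?` (such as `xen_restore_fl_direct_end` in the trace above) are dropped, because the kernel prints `?` for stack entries it could not confirm as part of the call chain.

```shell
#!/bin/sh
# trace_funcs LOGFILE: hypothetical helper, not from the thread.
# Prints one function name per reliable frame of the "Call Trace:" section
# in a saved oops, skipping frames marked "?" (stack entries the kernel could
# not confirm as part of the call chain).
trace_funcs() {
    awk '
        /Call Trace:/ { in_trace = 1; next }
        # The trace ends at the first line without a "[<address>]" frame.
        in_trace && !/\[</ { in_trace = 0 }
        in_trace {
            for (i = 1; i < NF; i++) {
                # A frame looks like "[<ffffffff810505fc>] mmput+0x65/0xd8".
                if ($i ~ /^\[<[0-9a-f]+>\]$/ && $(i + 1) != "?") {
                    split($(i + 1), parts, "+")
                    print parts[1]
                }
            }
        }
    ' "$1"
}
```

For example, `trace_funcs oops-rc1.txt > rc1.funcs`, `trace_funcs oops-rc2.txt > rc2.funcs`, then `diff rc1.funcs rc2.funcs`. For the trace in this thread the function prints mmput, exit_mm, do_exit, do_group_exit, sys_exit_group and system_call_fastpath.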



I haven't had much time to look into the broken/working version issues here, but I did confirm a couple of points:
1) 2.6.37-rc2 still has the same problem.
2) The problem goes away if one is not using NFS.

Not staggeringly helpful, I know, but it's one small data point.

-Bruce





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
