Re: [Xen-devel] [xen-unstable test] 97737: regressions - FAIL
On Mon, Jul 25, 2016 at 12:34:53PM +0100, Julien Grall wrote:
>
>
> On 25/07/16 12:11, Wei Liu wrote:
> >Thanks for investigating.
> >
> >There are only two arm related changes in the range being tested:
> >
> >* a43cc8f - (origin/smoke) arm/traps: fix bug in dump_guest_s1_walk handling
> >of level 2 page tables (5 days ago) <Jonathan Daugherty>
> >* 60e06f2 - arm/traps: fix bug in dump_guest_s1_walk L1 page table offset
> >computation (5 days ago) <Jonathan Daugherty>
> >
> >They don't look very suspicious.
>
> The modified function is not called in the hypervisor at all. It's only here
> for manual debugging.
>
> Although, this may change the offset of some function (assuming we have an
> hidden bug).
>
> >If you need help navigating osstest test report, please let me know.
>
> I have noticed that there is 2 kernel BUG in the logs (with one host reboot
> in the middle). Can you detail what the exact the test?

What I normally do is look at the summary page of the failed test to
identify the failed step and when it ran. In this case:

http://logs.test-lab.xenproject.org/osstest/logs/97737/test-armhf-armhf-xl/info.html

The time stamp says the failed step started at 2016-07-21 19:30:10 Z. I
then look at the log of the failed step to find the time stamp at which
the test failed, and look for output between those two time stamps in
the various logs.

I now realise the log I pasted in was not from the failed test. I meant
to paste in the second kernel oops, which should be the reason the test
failed. The two oopses are the same, though.

To identify which test step was running when the first oops happened,
the same technique applies. According to the time stamps, the oops
happened during ts-debian-install.

Wei.

>
> It looks to me that you are trying to power cycle multiple time a guest.
>
> Cheers,
>
> >Wei.
> >
> >
> >On Mon, Jul 25, 2016 at 12:05:08PM +0100, Julien Grall wrote:
> >>Hi Wei,
> >>
> >>On 25/07/16 09:53, Wei Liu wrote:
> >>>On Fri, Jul 22, 2016 at 03:27:30AM +0000, osstest service owner wrote:
> >>>>flight 97737 xen-unstable real [real]
> >>>>http://logs.test-lab.xenproject.org/osstest/logs/97737/
> >>>>
> >>>>Regressions :-(
> >>>>
> >>>>Tests which did not succeed and are blocking,
> >>>>including tests which could not be run:
> >>>>test-armhf-armhf-xl 15 guest-start/debian.repeat fail REGR. vs. 97664
> >>>
> >>>From
> >>>
> >>>\
> >>>
> >>>Jul 21 17:08:59.405183 [ 4479.814529] ------------[ cut here ]------------
> >>>Jul 21 17:09:16.961529 [ 4479.814600] kernel BUG at drivers/xen/grant-table.c:923!
> >>>Jul 21 17:09:16.966838 [ 4479.814628] Internal error: Oops - BUG: 0 [#1] SMP ARM
> >>>Jul 21 17:09:16.972090 [ 4479.814656] Modules linked in: xen_gntalloc bridge stp ipv6 llc brcmfmac brcmutil cfg80211
> >>>Jul 21 17:09:16.980340 [ 4479.814759] CPU: 1 PID: 24761 Comm: vif5.0-q0-guest Not tainted 3.16.7-ckt12+ #1
> >>>Jul 21 17:09:16.987841 [ 4479.814795] task: d8ef7600 ti: d85bc000 task.ti: d85bc000
> >>>Jul 21 17:09:16.993339 [ 4479.814833] PC is at gnttab_batch_copy+0xd0/0xe4
> >>>Jul 21 17:09:16.997963 [ 4479.814860] LR is at gnttab_batch_copy+0x1c/0xe4
> >>>Jul 21 17:09:17.002718 [ 4479.814888] pc : [<c04bb190>] lr : [<c04bb0dc>] psr: a0070013
> >>>Jul 21 17:09:17.008962 [ 4479.814888] sp : d85bdea0 ip : deadbeef fp : c0c8e140
> >>>Jul 21 17:09:17.014341 [ 4479.814935] r10: 00000000 r9 : e1bec000 r8 : 00000000
> >>>Jul 21 17:09:17.019595 [ 4479.814960] r7 : 00000002 r6 : 00000002 r5 : d85bdf20 r4 : e1bf4d30
> >>>Jul 21 17:09:17.026095 [ 4479.814990] r3 : 00000001 r2 : deadbeef r1 : deadbeef r0 : fffffff2
> >>>Jul 21 17:09:17.032717 [ 4479.815021] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
> >>>Jul 21 17:09:17.040091 [ 4479.815055] Control: 10c5387d Table: 78d8406a DAC: 00000015
> >>>Jul 21 17:09:17.045964 [ 4479.815084] Process vif5.0-q0-guest (pid: 24761, stack limit = 0xd85bc248)
> >>>Jul 21 17:09:17.052840 [ 4479.815114] Stack: (0xd85bdea0 to 0xd85be000)
> >>>Jul 21 17:09:17.057218 [ 4479.815145] dea0: 00000001 d8b11388 d85bdf20 d85bdf04 00000002 c05eb054 00000388 00000000
> >>>Jul 21 17:09:17.065469 [ 4479.815183] dec0: d85bdf04 00000000 00000000 c0b7ea80 db0995c0 c05e86e4 e1bf4000 0000003c
> >>>Jul 21 17:09:17.073753 [ 4479.815221] dee0: 00000000 00000000 00000000 c0b8849c e1bf4cfc c0c8e140 e1bf4d30 e1bf4cc4
> >>>Jul 21 17:09:17.082001 [ 4479.815260] df00: db0c3e80 00000000 d85bdf08 d85bdf08 d8c5cb40 d8c5cb40 00000001 00000000
> >>>Jul 21 17:09:17.090217 [ 4479.815298] df20: 00000002 00000000 00000001 00000000 e1bf4d30 e1c1f530 000004c6 0000023c
> >>>Jul 21 17:09:17.098466 [ 4479.815337] df40: 00000000 00000000 d84aab80 e1bec000 c05ea990 00000000 00000000 00000000
> >>>Jul 21 17:09:17.106720 [ 4479.815375] df60: 00000000 c0266238 00000000 00000000 000000f8 e1bec000 00000000 00000000
> >>>Jul 21 17:09:17.114844 [ 4479.815414] df80: d85bdf80 d85bdf80 00000000 00000000 d85bdf90 d85bdf90 d85bdfac d84aab80
> >>>Jul 21 17:09:17.123093 [ 4479.815451] dfa0: c0266168 00000000 00000000 c020f138 00000000 00000000 00000000 00000000
> >>>Jul 21 17:09:17.131345 [ 4479.815489] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> >>>Jul 21 17:09:17.139596 [ 4479.815527] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> >>>Jul 21 17:09:17.147841 [ 4479.815583] [<c04bb190>] (gnttab_batch_copy) from [<c05eb054>] (xenvif_kthread_guest_rx+0x6c4/0xb58)
> >>
> >>From my understanding the hypercall can only return a non-zero value if
> >>copy_*_guest helpers fails.
> >>
> >>Those helpers will only fail when it is not possible to retrieve the page
> >>associated to a virtual address. The value is in r0 (-EFAULT), seem to
> >>confirm that. So this looks very suspicious.
> >>
> >>Looking at the other parameters and the assembly code (see [1]):
> >>    count = 2 (saved in r6)
> >>    batch = 0xe1bf4d30 (saved in r4)
> >>
> >>They looks valid to me. Also, there was no major change around that code
> >>recently.
> >>
> >>I don't have much ideas what is going on. And unfortunately Xen ARM does not
> >>print much information when the translation fail.
> >>
> >>I have CCed few more people to see if they have a clue.
> >>
> >>>
> >>>Jul 21 17:09:17.156969 [ 4479.815636] [<c05eb054>] (xenvif_kthread_guest_rx) from [<c0266238>] (kthread+0xd0/0xe8)
> >>>Jul 21 17:09:17.165217 [ 4479.815681] [<c0266238>] (kthread) from [<c020f138>] (ret_from_fork+0x14/0x3c)
> >>>Jul 21 17:09:17.172467 [ 4479.815721] Code: e1c432b4 eaffffe0 e7f001f2 e8bd80f8 (e7f001f2)
> >>>Jul 21 17:09:17.178595 [ 4479.815766] ---[ end trace 6ba7d172d52e24e2 ]---
> >>>
> >>
> >>Regards,
> >>
> >>[1]
> >>http://logs.test-lab.xenproject.org/osstest/logs/97737/build-armhf-pvops/info.html
> >>
> >>c04bb0c0 <gnttab_batch_copy>:
> >>c04bb0c0:  e92d40f8  push  {r3, r4, r5, r6, r7, lr}
> >>c04bb0c4:  e1a02001  mov   r2, r1
> >>c04bb0c8:  e1a04000  mov   r4, r0
> >>c04bb0cc:  e1a06001  mov   r6, r1
> >>c04bb0d0:  e1a01000  mov   r1, r0
> >>c04bb0d4:  e3a00005  mov   r0, #5
> >>c04bb0d8:  ebf54e39  bl    c020e9c4 <HYPERVISOR_grant_table_op>
> >>c04bb0dc:  e3500000  cmp   r0, #0
> >>c04bb0e0:  1a00002a  bne   c04bb190 <gnttab_batch_copy+0xd0>
> >>
> >>--
> >>Julien Grall
> >
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
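
The time-stamp technique described in the reply (find the start of the failed step, then look only at log output inside that window) can be sketched in a few lines of Python. This is only an illustration and not part of osstest: the helper names (parse_stamp, lines_in_window), the assumed year, and the end of the window are hypothetical. Only the step start time (2016-07-21 19:30:10 Z) and the two sample log lines come from the thread. The sketch also assumes that every log line starts with a stamp of the form "Jul 21 17:09:16.961529" and that the log stamps use the same time zone as the summary page.

```python
from datetime import datetime

# Assumption: each log line starts with a stamp like "Jul 21 17:09:16.961529".
STAMP_LEN = len("Jul 21 17:09:16.961529")


def parse_stamp(line, year=2016):
    """Return the leading time stamp of a log line, or None if it has none.

    The log stamps carry no year, so the caller supplies one (an assumption).
    """
    try:
        return datetime.strptime(f"{year} {line[:STAMP_LEN]}",
                                 "%Y %b %d %H:%M:%S.%f")
    except ValueError:
        return None


def lines_in_window(lines, start, end, year=2016):
    """Return the log lines whose stamp falls within [start, end]."""
    selected = []
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is not None and start <= stamp <= end:
            selected.append(line)
    return selected


# Two lines from the oops quoted in this thread.
SAMPLE_LOG = [
    "Jul 21 17:08:59.405183 [ 4479.814529] ------------[ cut here ]------------",
    "Jul 21 17:09:16.961529 [ 4479.814600] kernel BUG at drivers/xen/grant-table.c:923!",
]

# Start of the failed step, taken from the osstest summary page.
STEP_START = datetime(2016, 7, 21, 19, 30, 10)
# Hypothetical end of the window; the real value comes from the failed step's log.
STEP_END = datetime(2016, 7, 21, 20, 0, 0)

if __name__ == "__main__":
    hits = lines_in_window(SAMPLE_LOG, STEP_START, STEP_END)
    print(f"{len(hits)} of {len(SAMPLE_LOG)} sample lines fall in the window")
```

Both sample lines are stamped shortly after 17:08, more than two hours before the failed step started, so neither falls inside the window. That matches the observation in the reply that the pasted oops did not come from the failed test.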