
Re: [Xen-devel] BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation



On Wed, 11 Sep 2013, Sander Eikelenboom wrote:
> Wednesday, September 11, 2013, 6:25:35 PM, you wrote:
> 
> 
> > Wednesday, September 11, 2013, 2:39:01 PM, you wrote:
> 
> >> On 11/09/13 13:08, Stefano Stabellini wrote:
> >>> On Wed, 11 Sep 2013, David Vrabel wrote:
> >>>> On 10/09/13 19:13, Sander Eikelenboom wrote:
> >>>>> Hi Wei,
> >>>>> 
> >>>>> Just back from holiday i tried running a new xen-unstable and
> >>>>> linux kernel (current Linus his tree + Konrad's last pull request
> >>>>> merged on top). I saw a thread and patch about a bug_on in
> >>>>> increase_reservation ... i'm seeing the splat below in dom0 when
> >>>>> guests get started.
> >>>> 
> >>>> Yes, the use of __per_cpu() in decrease_reservation is not safe.
> >>>> 
> >>>> Stefano, what testing did you give "xen/balloon: set a mapping for 
> >>>> ballooned out pages" (cd9151e2).  The number of critical problems
> >>>> it's had suggests not a lot?
> >>>> 
> >>>> I'm also becoming less happy with the inconsistency between the
> >>>> p2m updates between the different (non-)auto_translated_physmap
> >>>> guest types.
> >>>> 
> >>>> I think it (and all the attempts to fix it) should be reverted at
> >>>> this stage and we should try again for 3.13 with some more thorough
> >>>> testing and a more careful look at what updates to the p2m are
> >>>> needed.
> >>> 
> >>> Issues like this one are due to different kernel configurations /
> >>> usage patterns. To reproduce this issue one needs a preemptible kernel
> >>> and blkback. I use a non-preemptible kernel and QEMU as block
> >>> backend.
> >>> 
> >>> Granted, in this case I should have tested blkback and both
> >>> preemptible and non-preemptible kernel configurations.  But in
> >>> general it is nearly impossible to test all the possible
> >>> configurations beforehand, it is a classic case of combinatorial
> >>> explosion.
> >>> 
> >>> These are exactly the kind of things that an exposure to a wider
> >>> range of users/developers help identify and fix.
> >>> 
> >>> So I think that we did the right thing here, by sending a pull
> >>> request early in the release cycle, so that now we have many other
> >>> RCs to fix all the issues that come up.
> 
> >> That sounds fair.
> 
> >> Sander, does the following patch fix this issue?
> 
> > Hi David,
> 
> > This patch indeed seems to fix it.
> 
> Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> 
> [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> [  910.995427] device vif13.0 left promiscuous mode
> [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> [  911.364617] ------------[ cut here ]------------
> [  911.371022] kernel BUG at drivers/xen/balloon.c:365!

That's a different issue; it seems to be the same one found by Wei and
addressed by "xen/balloon: check whether a page is pointed to scratch
page MFN".

However, that patch was never committed because its last update was missing.


Does the patch below solve the problem?


diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3101cf6..b52df76 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -349,8 +349,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
                BUG_ON(page == NULL);
 
                pfn = page_to_pfn(page);
-               BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
-                      phys_to_machine_mapping_valid(pfn));
 
                set_phys_to_machine(pfn, frame_list[i]);
 
