[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/3] xen: Have schedulers revise initial placement



On 11/08/16 15:59, Dario Faggioli wrote:
> On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
>> I'd really like to have those backported, but I have to ask one
>> of you to identify which prereq-s are needed on 4.6 and 4.5
>> (I'll revert them from 4.5 right away, but I'll wait for an osstest
>> flight to confirm the same issue exists on 4.6). 
>>
> Hey, I could only start working on this this morning (sorry for the
> delay), and I'll continue tomorrow but, at least here, staging-4.6
> (plus the patches!) crashes like this:
>
> (XEN) [    0.000000] ----[ Xen-4.6.4-pre  x86_64  debug=y  Not tainted ]----
> (XEN) [    0.000000] CPU:    0
> (XEN) [    0.000000] RIP:    e008:[<ffff82d0801238c4>] 
> _csched_cpu_pick+0x156/0x612
> (XEN) [    0.000000] RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) [    0.000000] rax: 0000000000000040   rbx: 0000000000000040   rcx: 
> 0000000000000000
> (XEN) [    0.000000] rdx: 000000000000003f   rsi: 0000000000000040   rdi: 
> 0000000000000040
> (XEN) [    0.000000] rbp: ffff82d0802f7d68   rsp: ffff82d0802f7c78   r8:  
> 0000000000000000
> (XEN) [    0.000000] r9:  0000000000000001   r10: 0000000000000001   r11: 
> 00000000000000b4
> (XEN) [    0.000000] r12: 0000000000000040   r13: ffff83032152bf40   r14: 
> ffff82d08033ae40
> (XEN) [    0.000000] r15: ffff8300dbdf4000   cr0: 0000000080050033   cr4: 
> 00000000000006e0
> (XEN) [    0.000000] cr3: 00000000dba9f000   cr2: 0000000000000000
> (XEN) [    0.000000] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   
> cs: e008
> (XEN) [    0.000000] Xen stack trace from rsp=ffff82d0802f7c78:
> (XEN) [    0.000000]    0000000000000046 0000000200000092 ffff82d08033ae48 
> ffff82d08028b020
> (XEN) [    0.000000]    0000000000000000 ffff82d0802ff040 0000000100000001 
> ffff82d08033ae40
> (XEN) [    0.000000]    0000000000000097 ffff82d0802f7cd8 0000000000000006 
> ffff82d0802f7ce8
> (XEN) [    0.000000]    ffff82d08012d027 ffff830321532000 ffff82d0802f7d28 
> ffff82d08013c779
> (XEN) [    0.000000]    0000000000000000 0000000000000010 0000000000000048 
> 0000000000000048
> (XEN) [    0.000000]    0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> (XEN) [    0.000000]    ffff82d0802f7d78 ffff8300dbdf4000 ffff83032152a000 
> ffff83032152bf40
> (XEN) [    0.000000]    0000000000000000 ffff82d08028e020 ffff82d0802f7d78 
> ffff82d080123d8e
> (XEN) [    0.000000]    ffff82d0802f7db8 ffff82d080123db0 ffff83032152a000 
> ffff8300dbdf4000
> (XEN) [    0.000000]    ffff83032152a000 0000000000000000 0000000000000000 
> ffff82d08028e020
> (XEN) [    0.000000]    ffff82d0802f7de8 ffff82d080129e8d ffff82d0802f7de8 
> ffff82d08013cda0
> (XEN) [    0.000000]    ffff8300dbdf4000 ffff83032152a000 ffff82d0802f7e18 
> ffff82d080105f59
> (XEN) [    0.000000]    ffff82d0802a1720 ffff82d0802a1718 ffff82d0802a1720 
> ffff82d0802daa60
> (XEN) [    0.000000]    ffff82d0802f7e48 ffff82d0802a8145 0000000000000002 
> ffff83032152b610
> (XEN) [    0.000000]    0000000000000002 0000000000000001 ffff82d0802f7f08 
> ffff82d0802c5c6f
> (XEN) [    0.000000]    0000000000000000 0000000000100000 00000000014ae000 
> 0000000000324000
> (XEN) [    0.000000]    0080016300000000 0000000300000015 0000000000000000 
> 0000000000000010
> (XEN) [    0.000000]    ffff83000008dd50 ffff83000008df20 0000000000000002 
> ffff83000008dfb0
> (XEN) [    0.000000]    0000000800000000 000000010000006e 0000000000000003 
> 00000000000002f8
> (XEN) [    0.000000]    0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
> (XEN) [    0.000000] Xen call trace:
> (XEN) [    0.000000]    [<ffff82d0801238c4>] _csched_cpu_pick+0x156/0x612
> (XEN) [    0.000000]    [<ffff82d080123d8e>] csched_cpu_pick+0xe/0x10
> (XEN) [    0.000000]    [<ffff82d080123db0>] csched_vcpu_insert+0x20/0x145
> (XEN) [    0.000000]    [<ffff82d080129e8d>] sched_init_vcpu+0x1d1/0x218
> (XEN) [    0.000000]    [<ffff82d080105f59>] alloc_vcpu+0x1ba/0x2a4
> (XEN) [    0.000000]    [<ffff82d0802a8145>] scheduler_init+0x1b0/0x271
> (XEN) [    0.000000]    [<ffff82d0802c5c6f>] __start_xen+0x1f85/0x2550
> (XEN) [    0.000000]    [<ffff82d080100073>] __high_start+0x53/0x55
> (XEN) [    0.000000] 
> (XEN) [    0.000000] 
> (XEN) [    0.000000] ****************************************
> (XEN) [    0.000000] Panic on CPU 0:
> (XEN) [    0.000000] Assertion 'cpu < nr_cpu_ids' failed at 
> ...e/SOURCES/xen/xen.git/xen/include/xen/cpumask.h:97
>
> Which, I think needs at least this hunk (from 6b53bb4ab3c9  "sched:
> better handle (not) inserting idle vCPUs in runqueues"):
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 2beebe8..fddcd52 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -240,20 +240,22 @@ int sched_init_vcpu(struct vcpu *v, unsigned int 
> processor)
>      init_timer(&v->poll_timer, poll_timer_fn,
>                 v, v->processor);
>  
> -    /* Idle VCPUs are scheduled immediately. */
> +    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
> +    if ( v->sched_priv == NULL )
> +        return 1;
> +
> +    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> +
> +    /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. 
> */
>      if ( is_idle_domain(d) )
>      {
>          per_cpu(schedule_data, v->processor).curr = v;
>          v->is_running = 1;
>      }
> -
> -    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> -
> -    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
> -    if ( v->sched_priv == NULL )
> -        return 1;
> -
> -    SCHED_OP(DOM2OP(d), insert_vcpu, v);
> +    else
> +    {
> +        SCHED_OP(DOM2OP(d), insert_vcpu, v);
> +    }
>  
>      return 0;
>  }
>
> So, yeah, it's proving a little more complicated than how I thought it
> would have, just by looking at the patches. :-/
>
> Will let know.

FWIW, this looks very similar to the regression I just raised against
Xen 4.7 "[Xen-devel] Scheduler regression in 4.7".  The stack traces are
suspiciously similar.  I expect they have the same root cause.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.