
Re: [Xen-devel] Introduce rtds real-time scheduler for Xen



Hi Dario,

I think I have fixed the bug; could you please test it again? :-) I
will comment inline below to explain the error in detail.

2014-09-17 10:15 GMT-04:00 Dario Faggioli <dario.faggioli@xxxxxxxxxx>:
> On dom, 2014-09-14 at 17:37 -0400, Meng Xu wrote:
>> This series of patches adds the rtds real-time scheduler to Xen.
>>
> I gave this series some testing, and the behavior of the scheduler is as
> expected, so again, Meng and Sisu, good work.
>
> While doing it, I've also put the series in this git repo/branch:
>
>   git://xenbits.xen.org/people/dariof/xen.git  sched/rt/rtds-v3
>   
> http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/sched/rt/rtds-v3
>
>
> There are a couple of issues, though, one minor and one serious, which
> I'd like you to fix, if possible, before the freeze date. More info below.
>
>> //list VCPUs' parameters of each domain in cpu pools using rtds scheduler
>> #xl sched-rtds
>> Cpupool Pool-0: sched=EDF
>> Name                                ID    Period    Budget
>> Domain-0                             0     10000      4000
>> vm1                                  1     10000      4000
>>
> So, when I boot Xen with sched=rtds, issuing this command (`xl
> sched-rtds') produces a lot of printk output on the serial console,
> basically dumping the scheduler information.
>
> I guess there is one call to rt_sched_dump() (or whatever that was) left
> somewhere. Could you please check?
>
> This is not a serious issue, but since you're resending anyway...
>
>> //create a cpupool test
>> #xl cpupool-cpu-remove Pool-0 3
>> #xl cpupool-cpu-remove Pool-0 2
>> #xl cpupool-create name=\"test\" sched=\"rtds\"
>> #xl cpupool-cpu-add test 3
>> #xl cpupool-cpu-add test 2
>> #xl cpupool-list
>> Name               CPUs   Sched     Active   Domain count
>> Pool-0               2     rtds       y          2
>> test                 2     rtds       y          0
>>
> This works for me too.
>
> Booting with sched=credit, creating an rtds cpupool and migrating
> domains there also works here.
>
> However, booting with sched=rtds, and issuing the following commands:
> # xl cpupool-cpu-remove Pool-0 20
> # xl cpupool-cpu-remove Pool-0 21
> # xl cpupool-create /etc/xen/be-cpupool
>
> Where /etc/xen/be-cpupool looks like this:
>  name = "be"
>  sched = "credit"
>  cpus = ["20", "21"]
>
> Makes Xen *CRASH*, with CPUs 20 and 21 panicking at the same time and
> their output interleaved on the serial console. Deinterleaved, the
> trace from CPU 20 is:
>
> (XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    20
> (XEN) RIP:    e008:[<ffff82d08012bb1e>] check_lock+0x1e/0x3b
> (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
> (XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: 0000000000000001
> (XEN) rdx: 0000000000000001   rsi: ffff830917fc4c80   rdi: 0000000000000004
> (XEN) rbp: ffff830917fb7e08   rsp: ffff830917fb7e08   r8:  0000001d52034e80
> (XEN) r9:  ffff830828a47978   r10: 00000000deadbeef   r11: 0000000000000246
> (XEN) r12: 0000001d51efbd89   r13: ffff82d080320de0   r14: 0000000000000000
> (XEN) r15: 0000000000000014   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000cf08f000   cr2: 0000000000000004
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff830917fb7e08:
> (XEN)    ffff830917fb7e20 ffff82d08012bbc4 ffff8300cf12d000 ffff830917fb7eb0
> (XEN)    ffff82d080128175 ffff830917fb7e40 ffff82d0801879ef 0000001400fb7e60
> (XEN)    ffff830917fc4060 ffff830917fb7e60 ffff830917fc4200 ffff830917fb7eb0
> (XEN)    ffff82d08012e983 ffff830917fb7ef0 ffff82d0801aa941 ffff830917fb7e90
> (XEN)    ffff82d0802f8980 ffff82d0802f7f80 ffffffffffffffff ffff830917fb0000
> (XEN)    00000000000f4240 ffff830917fb7ee0 ffff82d08012b539 ffff830917fb0000
> (XEN)    0000001d51dfd294 ffff8300cf12d000 ffff83092d6a0990 ffff830917fb7ef0
> (XEN)    ffff82d08012b591 ffff830917fb7f10 ffff82d080160425 ffff82d08012b591
> (XEN)    ffff8300cf12d000 ffff830917fb7e10 0000000000000000 ffff88003a0cbfd8
> (XEN)    ffff88003a0cbfd8 0000000000000007 ffff88003a0cbec0 0000000000000000
> (XEN)    0000000000000246 0000001c6462ff88 0000000000000000 0000000000000000
> (XEN)    0000000000000000 ffffffff810013aa ffffffff81c31160 00000000deadbeef
> (XEN)    00000000deadbeef 0000010000000000 ffffffff810013aa 000000000000e033
> (XEN)    0000000000000246 ffff88003a0cbea8 000000000000e02b c2c2c2c2c2c2c2c2
> (XEN)    c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c200000014
> (XEN)    ffff8300cf12d000 0000003897ca3280 c2c2c2c2c2c2c2c2
> (XEN)
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08012bb1e>] check_lock+0x1e/0x3b
> (XEN)    [<ffff82d08012bbc4>] _spin_lock_irq+0x1b/0x6c
> (XEN)    [<ffff82d080128175>] schedule+0xc0/0x5da
> (XEN)    [<ffff82d08012b539>] __do_softirq+0x81/0x8c
> (XEN)    [<ffff82d08012b591>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080160425>] idle_loop+0x5e/0x6e
> (XEN)
> (XEN) Pagetable walk from 0000000000000004:
> (XEN)  L4[0x000] = 000000092d6a4063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000092d6a3063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000092d6a2063 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 20:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000004
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> CPU 21 panics at the same time, with the same call trace and:
>
> (XEN) ****************************************
> (XEN) Panic on CPU 21:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000004
> (XEN) ****************************************
>
> Can you please check, try to reproduce, and fix ASAP?

This is a locking/NULL-pointer issue. When we remove a cpu from a
cpupool, rt_vcpu_remove() @ xen/common/sched_rt.c is called. In the
version 3 patch, that function simply set the schedule_lock in
schedule_data to NULL, which causes a NULL-pointer dereference when
schedule() @ xen/common/schedule.c later tries to take the lock via
"lock = pcpu_schedule_lock_irq(cpu);".
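To illustrate (this is only a minimal sketch with made-up, simplified
structure layouts, NOT the actual Xen code), the failure mode is roughly:

    /* Sketch only: hypothetical, simplified layouts. */
    #include <stddef.h>

    #define NR_CPUS 24

    typedef struct spinlock {
        unsigned int raw;
        unsigned int debug;        /* check_lock() reads a field like this */
    } spinlock_t;

    struct schedule_data {
        spinlock_t *schedule_lock; /* lock currently protecting this pcpu */
        spinlock_t  _lock;         /* the pcpu's own default lock */
    };

    static struct schedule_data sd[NR_CPUS];

    /* What the v3 rt_vcpu_remove() effectively did on cpu removal: */
    static void remove_pcpu_buggy(unsigned int cpu)
    {
        sd[cpu].schedule_lock = NULL;   /* BUG: leaves a NULL lock pointer */
    }

    /* What schedule() then effectively does on the next softirq: */
    static spinlock_t *sketch_pcpu_schedule_lock_irq(unsigned int cpu)
    {
        spinlock_t *lock = sd[cpu].schedule_lock;
        lock->debug++;  /* NULL deref at offsetof(spinlock_t, debug) == 4 */
        return lock;
    }

With lock == NULL, the first field access faults at a small offset, which
matches the "Faulting linear address: 0000000000000004" in your trace.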

I fixed this by changing rt_vcpu_remove() @ sched_rt.c. The changed
rt_vcpu_remove() function is at LINE 436 of
https://github.com/PennPanda/xenproject/blob/rtxen-v1.90-patch-v4/xen/common/sched_rt.c
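Reusing the structures from the sketch above (again with simplified,
hypothetical names, not the exact code from the branch), the shape of the
fix is to leave the pcpu pointing at a valid lock instead of NULL:

    /* Sketch of the fix: restore a valid lock pointer on removal. */
    static void remove_pcpu_fixed(unsigned int cpu)
    {
        /* Point the pcpu back at its own default lock, so schedule()
         * always finds a valid spinlock after the cpu leaves the pool. */
        sd[cpu].schedule_lock = &sd[cpu]._lock;
    }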

I also tested the scenario you described above on my 12-core machine, and the crash no longer occurs.

After booting the system with sched=rtds, I did:
# xl cpupool-cpu-remove Pool-0 11
# xl cpupool-cpu-remove Pool-0 10
# xl cpupool-create /etc/xen/be-cpupool

Where /etc/xen/be-cpupool looks like this:
    name = "be"
    sched = "credit"
    cpus = ["10", "11"]


Could you please pull the code and test it on your 24-core machine to
confirm that the fix works there as well?
The latest code is at https://github.com/PennPanda/xenproject ,
branch: rtxen-v1.90-patch-v4

(Of course, I could send the next version of the patch set with this
fix right away, but I'd like to confirm that it works on your machine
first, so that people don't get too many patch revisions. Then I "hope"
the next patch set can get committed. :-))

Once I have confirmation that the bug no longer occurs on your machine,
I will send the next version with all the other comments addressed.


>
> This, IMO, does not alter the prognosis wrt 4.5 inclusion, at least not
> per se. In fact, it's OK for some bugs to be around at feature-freeze
> time for the features we said we want. What we need to know is that
> we're likely to be able to fix them without making the release slip.
>
> So you should really either fix this, or provide enough insight here
> to convince people that you're on the way to doing so. :-)

I think I figured out what happened, so the bug should now be fixed. :-)

Thanks,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

