
Re: NULL scheduler DoS

On 09/08/2021 18:35, Julien Grall wrote:

On 09/08/2021 17:19, Ahmed, Daniele wrote:
Hi all,

Hi Daniele,

Thank you for the report!

The NULL scheduler is affected by an issue that triggers an assertion and reboots the hypervisor.

This issue arises when:

  * a guest is being created with a configuration specifying a file that
    does not exist
  * the hypervisor boots with the null scheduler

Both 4.16 and 4.15 are affected.

This is the stack trace from 4.16:

(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82d04024f577>] common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
(XEN) rax: ffff83005ce1c850 rbx: 0000000000000001 rcx: 0000000000000001
(XEN) rdx: ffff83007fde6fc0 rsi: ffff83005ce1c790 rdi: ffff83007ffb7850
(XEN) rbp: ffff83007ffdfda0 rsp: ffff83007ffdfd48 r8: 0000000000000000
(XEN) r9: 0000000000048fee r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff82d0405c9298 r13: ffff83007f7fd508 r14: ffff83005ce1c850
(XEN) r15: ffff82d0405e2680 cr0: 000000008005003b cr4: 00000000003526e0
(XEN) cr3: 000000007f6b3000 cr2: ffff888072e79dc0
(XEN) fsb: 0000000000000000 gsb: ffff888071ac0000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen code around <ffff82d04024f577> (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
(XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f b7 00 66
(XEN) Xen stack trace from rsp=ffff83007ffdfd48:
(XEN) ffff83007ffdfd88 ffff82d04023961c 0000000400000000 ffff83005ce1cc50
(XEN) 0000000000000002 ffff83007ffdfd90 ffff83005ce1c790 ffff82d0405c9298
(XEN) ffff83007f7fd508 ffff83005ce1c850 ffff82d0405e2680 ffff83007ffdfde0
(XEN) ffff82d04024f889 ffff83007ffb7850 ffff83005dd63000 ffff83005ce1c790
(XEN) ffff83005845ab28 ffff83005845a000 0000000000000000 ffff83007ffdfe00
(XEN) ffff82d040253326 ffff83005dd63000 0000000000000000 ffff83007ffdfe38
(XEN) ffff82d04020506b ffff83007a881080 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff82d0405d6f80 ffff83007ffdfe70 ffff82d04022d9e5
(XEN) 0000001100000003 ffff82d0405cf100 ffff82d0405cf100 ffffffffffffffff
(XEN) ffff82d0405cef80 ffff83007ffdfea8 ffff82d04022e14b 0000000000000003
(XEN) ffff82d0405cf100 0000000000007fff 0000000000000003 0000000000000003
(XEN) ffff83007ffdfeb8 ffff82d04022e1e6 ffff83007ffdfef0 ffff82d0403172b4
(XEN) ffff82d04031721d ffff83007fec1000 ffff83007ffb6000 0000000000000003
(XEN) ffff83007ffcc000 ffff83007ffdfe18 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000003 0000000000000003
(XEN) 0000000000000246 0000000000000003 0000000000000000 000000001bf9dde5
(XEN) 0000000000000000 ffffffff810023aa 0000000000000003 deadbeefdeadf00d
(XEN) deadbeefdeadf00d 0000010000000000 ffffffff810023aa 000000000000e033
(XEN) 0000000000000246 ffffc900400a3ea8 000000000000e02b 7ffdff707fffd140
(XEN) 000000017fe37a6c 000000007ffe8010 0000000000000000 0000e01000000003
(XEN) Xen call trace:
(XEN) [<ffff82d04024f577>] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) [<ffff82d04024f889>] F common/sched/null.c#null_unit_remove+0xfc/0x136
(XEN) [<ffff82d040253326>] F sched_destroy_vcpu+0xca/0x199
(XEN) [<ffff82d04020506b>] F common/domain.c#complete_domain_destroy+0x68/0x13f
(XEN) [<ffff82d04022d9e5>] F common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
(XEN) [<ffff82d04022e14b>] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN) [<ffff82d04022e1e6>] F do_softirq+0x13/0x15
(XEN) [<ffff82d0403172b4>] F arch/x86/domain.c#idle_loop+0x97/0xee
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) ****************************************
(XEN) Reboot in five seconds...

This is the line of the assertion that triggers the reboot: https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377

To reproduce the vulnerability, I took the following steps:

Just to make it clear for the others in the thread: per SUPPORT.md, the NULL scheduler is not security supported, which is why this was sent to xen-devel directly.

Also, for completeness, debug builds are not security supported either. On a production build, the ASSERT() would be compiled to a no-op, which could potentially result in more interesting issues. Anyway, that's not a problem here. :)
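To illustrate why a production build could be worse than the panic, here is a toy model in plain C. It is a sketch under a simplifying assumption: that unit_deassign() clears the pCPU's slot right after the 'npc->unit == unit' check. All names (toy_unit, toy_pcpu, toy_deassign_debug(), toy_deassign_release(), TOY_ASSERT_RELEASE) are hypothetical and this is not Xen's code. Standard C's assert() stands in for a debug-build ASSERT(); TOY_ASSERT_RELEASE mimics a check that has been compiled away.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only: NOT Xen's real structures. */
struct toy_unit { int domid; int vcpu; };
struct toy_pcpu { struct toy_unit *unit; };   /* NULL: pCPU is free */

/* Release-style check: the condition is type-checked but never
 * evaluated, roughly what "turned into a no-op" means. */
#define TOY_ASSERT_RELEASE(p) do { if ( 0 && (p) ) {} } while ( 0 )

/* Debug-style deassign: assert() aborts on a mismatch, much as a
 * debug-build ASSERT() panics. (With -DNDEBUG, assert() vanishes too.) */
static void toy_deassign_debug(struct toy_pcpu *npc, struct toy_unit *u)
{
    assert(npc->unit == u);
    npc->unit = NULL;
}

/* Release-style deassign: with the check gone, the slot is cleared even
 * when it belongs to a different unit (e.g. a dom0 vCPU), so in this
 * model that other unit silently loses its pCPU. */
static void toy_deassign_release(struct toy_pcpu *npc, struct toy_unit *u)
{
    TOY_ASSERT_RELEASE(npc->unit == u);
    npc->unit = NULL;
}
```

In the model, calling toy_deassign_release() with a dom1 unit on a pCPU held by a dom0 unit returns normally and leaves the pCPU marked free, instead of stopping at the mismatch.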

  * Install Xen; only 4.15+ seems to be vulnerable
  * Use the null scheduler (how depends on your setup): edit
    /etc/default/grub, add GRUB_CMDLINE_XEN="sched=null" at the end
    of the file, and update grub
  * Reboot into Xen
  * Create a file guest.cfg with the following contents:


serial = [ 'file:/tmp/log', 'pty' ]

disk = [ '/home/user/boot.iso,,hdc,cdrom' ]

on_reboot = "destroy"


Make sure that the file /home/user/boot.iso does not exist.

  * Create a guest with this configuration: xl create -c guest.cfg

CC’ing Dario, Stefano and Julien to whom I’ve shown this.

I am not quite sure where the problem lies yet, but here is some more information from the debugging we did together.

The ASSERT() is triggered because the pCPU was already assigned to one of the dom0 vCPUs. This problem happens regardless of whether there is a free pCPU.

I have added some debugging in sched_set_res():

diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index a870320146ef..2355f531dc13 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -150,6 +150,10 @@ static inline void sched_set_res(struct sched_unit *unit,
      unsigned int cpu = cpumask_first(res->cpus);
      struct vcpu *v;

+    printk("%s: res->master_cpu %u unit %p %pd %pv\n", __func__,
+           res->master_cpu, unit, unit->domain, unit->vcpu_list);
+    WARN();
      for_each_sched_unit_vcpu ( unit, v )
          ASSERT(cpu < nr_cpu_ids);

This traced the problem to null_unit_migrate():

(XEN) sched_set_res: res->master_cpu 0 unit ffff830200887f00 d1 d1v0
(XEN) Xen WARN at private.h:155
(XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Tainted:   C   ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d04023fd9f>] core.c#sched_set_res+0x5b/0xc6
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d0v1)
(XEN) rax: ffff83027bf55038   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: ffff83027bf4ffff   rsi: 000000000000000a   rdi: ffff82d0404944b8
(XEN) rbp: ffff83027bf4fc70   rsp: ffff83027bf4fc40   r8:  0000000000000004
(XEN) r9:  0000000000000030   r10: ffff83027bf4fcf8   r11: 00000000fffffffd
(XEN) r12: ffff830275e83000   r13: ffff830275e8d000   r14: ffff830200887f00
(XEN) r15: ffff83027bf850a0   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 00000001f1e3d000   cr2: 0000563f71516088
(XEN) fsb: 00007f6561cda780   gsb: ffff88817fe80000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d04023fd9f> (core.c#sched_set_res+0x5b/0xc6):
(XEN)  14 18 00 e8 e7 7f 00 00 <0f> 0b 4d 8b 66 08 4d 85 e4 75 28 4d 89 7e 20 48
(XEN) Xen stack trace from rsp=ffff83027bf4fc40:
(XEN)    ffff83027bf85118 ffff830200887f00 ffff830275e83000 ffff830275e8d000
(XEN)    0000000000000000 ffff83027bf552a0 ffff83027bf4fce0 ffff82d040241614
(XEN)    ffff82d040226393 0000000000000286 ffff83027bf822e8 0000000175e8d000
(XEN)    ffff830200887f00 ffff83027bf552a0 ffff830275e83000 ffff83027bf4fcf8
(XEN)    ffff830275e8d000 ffff830275e8d000 0000000000000001 0000000000000000
(XEN)    ffff83027bf4fd40 ffff82d04020527d ffff82d04020527d 0000000000000000
(XEN)    0000000000000000 ffff83027bf4fd30 0000000000000000 0000000000000000
(XEN)    ffff830275e8d000 00007f65620f6010 0000000000000001 ffff82d040238319
(XEN)    ffff83027bf4fe58 ffff82d040238dd9 00000000001f1eae 0000000000000004
(XEN)    ffff83027bee4001 8000000000000000 ffff83027bf4fdc0 ffff82d04032e6df
(XEN)    000000044032e6df 0000000000000000 ffff82e003e3e120 000000140000000f
(XEN)    00007f6561d90001 0000559a00000001 0000000000000014 0000559ad9c303e0
(XEN)    0000000000000008 0000559ad9c303e0 0000559ad9c31170 0000559ad9c303c0
(XEN)    0000000000000000 00007ffd4ed54b60 0000559ad9c309a0 00007ffd4ed54c50
(XEN)    0000000000000000 0000559ad9c38240 0000559ad9c32570 00007ffd4ed54f00
(XEN)    0000559ad9c31170 ffff83027bf4fef8 0000000000000000 0000000000000001
(XEN)    deadbeefdeadf00d ffff83027bec0000 ffff82d040238319 ffff83027bf4fee8
(XEN)    ffff82d04030d8bc 00007f65620f6010 deadbeefdeadf00d deadbeefdeadf00d
(XEN)    deadbeefdeadf00d deadbeefdeadf00d ffff82d04038821c ffff82d040388228
(XEN)    ffff82d04038821c ffff82d040388228 ffff82d04038821c ffff82d040388228
(XEN) Xen call trace:
(XEN)    [<ffff82d04023fd9f>] R core.c#sched_set_res+0x5b/0xc6
(XEN)    [<ffff82d040241614>] F sched_init_vcpu+0x3dc/0x5d7
(XEN)    [<ffff82d04020527d>] F vcpu_create+0xfb/0x37a
(XEN)    [<ffff82d040238dd9>] F do_domctl+0xac0/0x184a
(XEN)    [<ffff82d04030d8bc>] F pv_hypercall+0x10d/0x2b8
(XEN)    [<ffff82d04038829d>] F lstar_enter+0x12d/0x140

The end of null_unit_migrate() contains an interesting comment:

    /*
     * Whatever all the above, we always at least override v->processor.
     * This is especially important for shutdown or suspend/resume paths,
     * when it is important to let our caller (cpu_disable_scheduler())
     * know that the migration did happen, to the best of our possibilities,
     * at least. In case of suspend, any temporary inconsistency caused
     * by this, will be fixed-up during resume.
     */

This implies that a pCPU may temporarily be assigned to two vCPUs and that we expect this to be fixed up afterwards. However, a domain may be destroyed before this happens.

So it looks like unit_deassign() is not able to cope with this case. From a brief look, I think we may want to check whether the unit is in the wait list. If it is, then we should bail out.
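The sequence above can be modelled in a few lines of plain C. This is a toy sketch, not Xen code: toy_assign(), toy_override_res() and toy_deassign_ok() are hypothetical, and the model keeps only a per-pCPU slot plus each unit's resource. It mirrors the debug log, where d1v0 gets its resource set to CPU 0 while, per the analysis above, that pCPU may already be held by a dom0 vCPU. If the domain is destroyed before anything fixes this up, the deassign check fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model with hypothetical names: NOT Xen's real data structures. */
#define TOY_NR_CPUS 4

struct toy_unit { int domid; int vcpu; unsigned int res; };
struct toy_pcpu { struct toy_unit *unit; };   /* NULL: pCPU is free */

static struct toy_pcpu toy_pcpus[TOY_NR_CPUS];

/* Proper assignment: the unit takes the pCPU slot and points at it. */
static void toy_assign(struct toy_unit *u, unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS && toy_pcpus[cpu].unit == NULL);
    toy_pcpus[cpu].unit = u;
    u->res = cpu;
}

/* Like the tail of null_unit_migrate() described in the comment: the
 * unit's resource is always overridden, even if it did not get the slot,
 * so two units can end up pointing at the same pCPU. */
static void toy_override_res(struct toy_unit *u, unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    u->res = cpu;
}

/* The condition unit_deassign() asserts: does u's pCPU slot hold u? */
static bool toy_deassign_ok(const struct toy_unit *u)
{
    return toy_pcpus[u->res].unit == u;
}
```

After toy_assign(&dom0v1, 0) and toy_override_res(&dom1v0, 0), toy_deassign_ok(&dom1v0) returns false, which corresponds to 'npc->unit == unit' failing when d1v0 is removed.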

Actually, I was wrong. It looks like null_unit_remove() is already checking this condition. Also, the vCPU should be offline (and the unit as well) because they haven't come online yet:

    /*
     * If a unit is (going) offline, we want it to be neither assigned
     * to a pCPU, nor in the waitqueue.
     * If it was on a cpu, we've removed it from there above. If it is
     * in the waitqueue, we remove it from there now. And then we bail.
     */
    if ( unlikely(!is_unit_online(unit)) )
        goto out;


    sched_set_res(unit, get_sched_res(new_cpu));

The jump to out looks suspicious, because the comment above says "we want it to be neither assigned to a pCPU, nor in the waitqueue", yet the code at out still calls sched_set_res() on the unit. So maybe we should replace the goto out with a return?
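The control-flow difference behind that question can be sketched in plain C. This is a toy model with hypothetical names (toy_migrate_goto(), toy_migrate_return(), TOY_NO_RES), not null_unit_migrate() itself; it only shows what each shape does to an offline unit's resource, not whether a return is the right fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model with hypothetical names: NOT Xen's null_unit_migrate(). */
#define TOY_NO_RES (-1)

struct toy_unit { bool online; int res; };

/* Current shape: an offline unit jumps to 'out', and the code at 'out'
 * still sets the unit's resource. */
static void toy_migrate_goto(struct toy_unit *u, int new_cpu)
{
    assert(new_cpu >= 0);
    if ( !u->online )
        goto out;

    /* ... pCPU selection and assignment would happen here (omitted) ... */

 out:
    u->res = new_cpu;
}

/* Suggested shape: an offline unit bails out and its resource is left
 * untouched. */
static void toy_migrate_return(struct toy_unit *u, int new_cpu)
{
    assert(new_cpu >= 0);
    if ( !u->online )
        return;

    /* ... pCPU selection and assignment would happen here (omitted) ... */

    u->res = new_cpu;
}
```

The catch is the comment quoted earlier: callers such as cpu_disable_scheduler() rely on v->processor always being overridden, so a plain return would need to be checked against the shutdown and suspend/resume paths.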


Julien Grall


