
Re: [Xen-devel] Question about Xen reboot on panic

I think machine_restart() may have a bug. :-(

2015-11-12 11:13 GMT-05:00 Meng Xu <xumengpanda@xxxxxxxxx>:
> Hi Andrew,
> I think I may have found where the system gets stuck.
> As you suggested, I added several printks inside machine_restart().
> When the machine does restart after the Xen kernel crashes, I can see
> the following output:
>         umount: /run/lock: not mounted
>         umount: /run/shm: not mounted
>          * Will now restart
>         [  122.261583] Restarting system.
>         (XEN) Domain 0 shutdown: rebooting machine.
>         (XEN) machine_restart start running
> (This is what I added at the first line of the machine_restart())
>         (XEN) machine_restart start running
>         (XEN) reboot_type=97
>         (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
> So when the machine reboots correctly after a Xen kernel crash,
> machine_restart() is called twice.
> After looking into the code, I found the following code in
> machine_restart(), which looks quite suspicious:
>     if ( system_state >= SYS_STATE_smp_boot )
>     {
>         local_irq_enable();
>         /* Ensure we are the boot CPU. */
>         if ( get_apic_id() != boot_cpu_physical_apicid )

If we are on the boot CPU and this condition still evaluates to true,

>         {
>             /* Send IPI to the boot CPU (logical cpu 0). */
>             on_selected_cpus(cpumask_of(0), __machine_restart,
>                              &delay_millisecs, 0);

we will send an IPI from CPU 0 to CPU 0 to run __machine_restart().

>             for ( ; ; )
>                 halt();

and CPU 0 will halt immediately.

If the IPI arrives on CPU 0 after that point, CPU 0 won't be able to
handle it, since it has already halted.

*** I have one solution in mind ***
Maybe we should check whether the current CPU is CPU 0 by using
smp_processor_id() instead. My only concern is that I'm not sure whether
machine_restart() could be rescheduled by the Xen scheduler onto another
CPU after we call smp_processor_id().
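
To make the idea concrete, here is a minimal standalone sketch (plain C,
not Xen code) that compares the two checks. The struct, the helper names,
and the mismatching APIC ID values are hypothetical and only model the
decision "forward the restart to CPU 0 and then halt"; they assume a case
where the APIC ID read on CPU 0 differs from boot_cpu_physical_apicid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state the restart path looks at. */
struct restart_ctx {
    unsigned int cpu;       /* logical CPU id, as smp_processor_id() would report */
    uint32_t apic_id;       /* APIC ID read on this CPU, as get_apic_id() would report */
    uint32_t boot_apic_id;  /* recorded boot_cpu_physical_apicid */
};

/* Models the quoted check: forward the restart when the APIC IDs differ. */
static bool forwards_by_apic_id(const struct restart_ctx *c)
{
    return c->apic_id != c->boot_apic_id;
}

/* Models the proposed check: forward only when not on logical CPU 0. */
static bool forwards_by_cpu_id(const struct restart_ctx *c)
{
    return c->cpu != 0;
}

/*
 * The suspected hang: logical CPU 0 decides to forward the restart,
 * so it sends the IPI to itself and then enters the halt loop.
 */
static bool self_ipi_then_halt(const struct restart_ctx *c,
                               bool (*forwards)(const struct restart_ctx *))
{
    return c->cpu == 0 && forwards(c);
}
```

With cpu = 0, apic_id = 1 and boot_apic_id = 0 (made-up values), the
APIC-ID check reports the self-IPI case, while the smp_processor_id()
style check does not. Whether the real fix should look like this is for
the maintainers to judge.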

*** The result below confirms my guess ***
I printed out which CPU sends the IPI, and the following output
confirms my speculation:

(XEN) Reboot in five seconds...
(XEN) now we should see: before kexec_crash
(XEN) before kexec_crash
(XEN) after kexec_crash
(XEN) machine_restart start running, delay_millisecs=5000
(XEN) machine_restart: finished console_start_sync, system_state is 3
(XEN) On P0
As the last line shows, P0 sends the IPI to itself and then halts immediately...



Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
