
Re: [Xen-devel] Question about Xen reboot on panic



Hi Andrew,

I think I may have found where the system gets stuck.

As you suggested, I added several printk() calls inside machine_restart().
When the machine does restart after the Xen kernel crashes, I see the
following output:

        umount: /run/lock: not mounted
        umount: /run/shm: not mounted
         * Will now restart
        [  122.261583] Restarting system.
        (XEN) Domain 0 shutdown: rebooting machine.
        (XEN) machine_restart start running

(This is the line I added at the top of machine_restart().)

        (XEN) machine_restart start running
        (XEN) reboot_type=97
        (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

So when the machine reboots correctly after a Xen kernel crash,
machine_restart() is called twice.

After looking into the code, I found the following block in
machine_restart(), which looks quite suspicious:

    if ( system_state >= SYS_STATE_smp_boot )
    {
        local_irq_enable();

        /* Ensure we are the boot CPU. */
        if ( get_apic_id() != boot_cpu_physical_apicid )
        {
            /* Send IPI to the boot CPU (logical cpu 0). */
            on_selected_cpus(cpumask_of(0), __machine_restart,
                             &delay_millisecs, 0);
            for ( ; ; )
                halt();
        }

        smp_send_stop();
    }

If the current CPU is not the boot CPU, this block sends an IPI asking
the boot CPU to run machine_restart() (via __machine_restart), and then
the current CPU sits in halt().

If the boot CPU misses the IPI, machine_restart() will never be called
on it and the system hangs. Am I correct?

If I'm correct, how should I fix this? Should I just have the current
CPU keep sending the IPI until the boot CPU runs machine_restart()?
That seems too hacky to me, but I'm not sure why the restart has to
happen on the boot CPU. If any CPU could reset the CPU status and
reboot, we could avoid this.
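
To make the two options concrete, here is a toy model in plain C. It is
not Xen code and calls no Xen functions: request_restart(), the
restart_outcome enum and the deliveries[] array are all hypothetical
names I made up for illustration. It only models the decision "did any
IPI reach the boot CPU, and if not, what happens next?".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a restart request in this toy model. */
enum restart_outcome {
    RESTART_ON_BOOT_CPU,   /* boot CPU saw the request and restarted */
    RESTART_ON_LOCAL_CPU,  /* requester gave up and restarted by itself */
    RESTART_HUNG           /* nobody restarted: the suspected hang */
};

/*
 * Toy model only (assumption: a lost IPI is never delivered later).
 * deliveries[i] says whether the i-th IPI reaches the boot CPU; the
 * array must hold at least max_attempts entries.
 */
enum restart_outcome request_restart(const bool *deliveries,
                                     int max_attempts,
                                     bool allow_local_fallback)
{
    assert(max_attempts > 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (deliveries[attempt])
            return RESTART_ON_BOOT_CPU;
    }

    /* Every attempt was lost: either fall back or stay halted forever. */
    return allow_local_fallback ? RESTART_ON_LOCAL_CPU : RESTART_HUNG;
}
```

With max_attempts = 1 and no fallback, the model behaves like my reading
of the current code: a single lost IPI ends in RESTART_HUNG. Retrying
only helps if a later IPI gets through; the local fallback is the "let
any CPU reboot" idea, and whether that is safe on real hardware is
exactly what I am asking.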

Or is it because system_state is not set correctly? If we could avoid
entering the if statement, we would also avoid this problem.

Do you have any suggestions?

Thank you very much for your help!

Best,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

