
Re: Need help on a issue (Unable to schedule guest for Xen on Arm)





On 30/03/2023 16:57, Ayan Kumar Halder wrote:
(Apologies, fixed the formatting issue)

Hi,


On 30/03/2023 16:50, Ayan Kumar Halder wrote:
Hi Xen experts,

I need some pointers on an issue I am facing.

I am running my downstream port of Xen on Cortex-R52 hardware. The hardware consists of two R52 cores (the second core runs in lockstep mode), so it currently does not support SMP.

The issue is that Xen is unable to schedule a guest.

Are you sure about this? Because...

So, leave_hypervisor_to_guest() ---> check_for_pcpu_work(), and this does not return.

... leave_hypervisor_to_guest() indicates that a guest vCPU was scheduled. Before we return to the guest, we always check whether there is a softirq that needs to be handled. So...


Debugging this, I see check_for_pcpu_work() --> do_softirq() --> __do_softirq() --> timer_softirq_action().

... the fact that check_for_pcpu_work() doesn't return seems to indicate that there is a softirq that is always pending. Can you check which one it is?


In timer_softirq_action(), the problem I see is that for all the timers, "((t = heap[1])->expires < now)" is true.

    while ( (heap_metadata(heap)->size != 0) &&
            ((t = heap[1])->expires < now) )
    {
        remove_from_heap(heap, t); /* <-- So, this gets invoked for all the timers. */
        execute_timer(ts, t);
    }

So, further below, reprogram_timer() gets invoked with timeout = 0, which disables the timer. As a result, timer_interrupt() is never invoked.

That is expected: no timer is armed, so there is no need to configure the physical timer.


Can someone give any pointers on what the underlying issue could be and how to debug further?

See above. You could also look into why no timer is pending and/or where Xen is stuck (e.g. the PC).


I do not observe this behavior when running on the R52 FVP. The difference there is that for most of the timers "((t = heap[1])->expires < now)" is false, so reprogram_timer() gets invoked with a non-zero timeout and, subsequently, timer_interrupt() gets invoked.
This reads as one of the following:
  1) There is a missing barrier.
  2) You didn't properly configure some registers.
  3) There is an HW erratum (I know that some Cortex-A cores had an issue when reading the timer counter, but that seems unlikely here).


Also, looking at the following from xen/arch/arm/time.c.

/* Set the timer to wake us up at a particular time.
 * Timeout is a Xen system time (nanoseconds since boot); 0 disables the timer.
 * Returns 1 on success; 0 if the timeout is too soon or is in the past. */
int reprogram_timer(s_time_t timeout)
{
     uint64_t deadline;

     if ( timeout == 0 )
     {
         WRITE_SYSREG(0, CNTHP_CTL_EL2);
         return 1; /* <-- Shouldn't this be 0, as the comment suggests? */

I think 1 is correct: we want to disable the timer, so this is a success. 0 should only be returned if the timeout could not be programmed (i.e. it is too soon or already in the past).

FWIW, x86 also seems to return 1 when the timeout is 0.

     }

     deadline = ns_to_ticks(timeout) + boot_count;
     WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
     WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
     isb();

     /* No need to check for timers in the past; the Generic Timer fires
      * on a signed 63-bit comparison. */
     return 1;
}

Kind regards,

Ayan





--
Julien Grall



 

