
Re: Need help on a issue (Unable to schedule guest for Xen on Arm)


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx, Julien Grall <julien@xxxxxxx>
  • From: Ayan Kumar Halder <ayankuma@xxxxxxx>
  • Date: Fri, 31 Mar 2023 15:24:15 +0100
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, "Stabellini, Stefano" <stefano.stabellini@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, "volodymyr_Babchuk@xxxxxxxx" <Volodymyr_Babchuk@xxxxxxxx>, Stewart Hildebrand <stewart.hildebrand@xxxxxxx>, "Garhwal, Vikram" <vikram.garhwal@xxxxxxx>
  • Delivery-date: Fri, 31 Mar 2023 14:24:30 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 30/03/2023 17:19, Julien Grall wrote:


On 30/03/2023 16:57, Ayan Kumar Halder wrote:
(Apologies, fixed the formatting issue)

Hi,

Hi Julien,

Appreciate your inputs.



On 30/03/2023 16:50, Ayan Kumar Halder wrote:
Hi Xen experts,

I need some pointers on an issue I am facing.

I am running my downstream port of Xen on Cortex-R52 hardware. The
hardware consists of two R52 cores (the second core is in lockstep
mode). So, currently the hardware does not support SMP.

The issue is that Xen is unable to schedule a guest.

Are you sure about this? Because...

So
leave_hypervisor_to_guest() ---> check_for_pcpu_work() and this does
not return.

... leave_hypervisor_to_guest() indicates that a guest vCPU was
scheduled. Before we return to the guest we always check whether there
is any softirq that needs to be handled. So...


Debugging this, I see check_for_pcpu_work() --> do_softirq() -->
__do_softirq() --> timer_softirq_action().

... the fact that check_for_pcpu_work() doesn't return seems to indicate
that there is a softirq that is always pending. Can you check which one
it is?

Yes, it is SCHEDULE_SOFTIRQ that is always pending. This is what happens in timer_softirq_action():

    /* Execute ready list timers. */
    while ( ((t = ts->list) != NULL) && (t->expires < now) )
    {
        ts->list = t->list_next;
        execute_timer(ts, t);  <<<---- This invokes s_timer_fn() --->raise_softirq(SCHEDULE_SOFTIRQ)
    }

In the next and all subsequent iterations of timer_softirq_action(), the following gets executed:

    while ( (heap_metadata(heap)->size != 0) &&
            ((t = heap[1])->expires < now) )
    {
        remove_from_heap(heap, t);
        execute_timer(ts, t);  /* again raises SCHEDULE_SOFTIRQ */
    }

Thus, SCHEDULE_SOFTIRQ is always active.
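
The feedback loop described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not Xen code: the struct, the function name rounds_until_idle() and the idea that each pass "costs" a fixed amount of time are all hypothetical. It shows that if the deadline handed to the timer is already behind the clock by the time the timer softirq runs, the scheduling softirq is raised again on every pass and never clears.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not Xen code): a single timer whose handler raises a
 * "schedule" softirq, and a scheduler pass that re-arms it at now + next_time. */
struct model {
    uint64_t now;        /* modelled system time, in ns */
    uint64_t expires;    /* deadline of the single timer, in ns */
    bool sched_pending;  /* stands in for SCHEDULE_SOFTIRQ */
};

/* Returns the number of passes needed until no softirq is pending,
 * or max_rounds if the softirq never clears within that budget. */
static int rounds_until_idle(uint64_t next_time, uint64_t cost, int max_rounds)
{
    struct model m = { .now = 0, .expires = 0, .sched_pending = true };
    int rounds = 0;

    while (m.sched_pending && rounds < max_rounds) {
        rounds++;
        m.sched_pending = false;

        /* Scheduler pass: re-arm the timer relative to the current time. */
        m.expires = m.now + next_time;

        /* Assumption: the pass itself consumes 'cost' ns of modelled time. */
        m.now += cost;

        /* Timer softirq pass: an already-expired timer fires at once and
         * its handler raises the schedule softirq again. */
        if (m.expires < m.now)
            m.sched_pending = true;
    }
    return rounds;
}
```

In this model, rounds_until_idle(1000, 5000, 100) exhausts the budget because the deadline is always in the past, while rounds_until_idle(10000, 5000, 100) settles after one pass. Which of the two situations the hardware is in, and why, is exactly what remains to be debugged.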

So I tried to debug how "t->expires" is set.

It gets set from set_timer() which is invoked from do_schedule().

"set_timer(&sr->s_timer, now + prev->next_time);"

And "prev->next_time" is set from csched2_schedule()

"currunit->next_time = csched2_runtime(ops, sched_cpu, snext, now);"

So, looking at the timer logs .......

timer_softirq_action; 517: t->expires = 0x1415dc73b8  now = 0x14f3b757e8

timer_softirq_action; 509: expires = 0x169ea47ae0 now = 0x17cfb0ac70

timer_softirq_action; 509: expires = 0x1967f86738 now = 0x1a990325b0

timer_softirq_action; 509: expires = 0x1c3149e290 now = 0x1d6254c818

I tried a hack as follows in credit2.c :-

@@ -3740,7 +3744,9 @@ static void cf_check csched2_schedule(
     /*
      * Return task to run next...
      */
-    currunit->next_time = csched2_runtime(ops, sched_cpu, snext, now);
+    currunit->next_time = (0x200000000 + csched2_runtime(ops, sched_cpu, snext, now));

With this, the softirqs get cleared. Also, reprogram_timer() gets invoked with a non-zero deadline and I get timer interrupts.
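
As a rough cross-check of the log lines and the hack above, the gap between "now" and "expires" can be computed directly. The sketch below assumes both values are Xen system times in nanoseconds (as the reprogram_timer() comment quoted in this thread states for timeouts); the helper names lateness_ns() and offset_covers_lateness() are made up for illustration.

```c
#include <stdint.h>

/* How far a timer's deadline lies behind "now", in ns.
 * A positive result means the timer had already expired. */
static int64_t lateness_ns(uint64_t expires, uint64_t now)
{
    return (int64_t)(now - expires);
}

/* Would adding offset_ns to the deadline have moved it past "now"?
 * Returns 1 if so, 0 otherwise. */
static int offset_covers_lateness(uint64_t expires, uint64_t now,
                                  uint64_t offset_ns)
{
    return lateness_ns(expires, now) < (int64_t)offset_ns;
}
```

For the first log line, lateness_ns(0x1415dc73b8, 0x14f3b757e8) is 3722110000 ns, about 3.7 s; the other three lines come out at roughly 5.1 s each. The offset used in the hack, 0x200000000 ns, is about 8.6 s, which is larger than any of these gaps. That is consistent with the softirq clearing once the offset is added, although it does not explain where the gap comes from.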

So, it seems I need to debug csched2_runtime() and see why "currunit->next_time" is not set correctly.

- Ayan



In timer_softirq_action(), the problem I see is that for all the
timers, "((t = heap[1])->expires < now)" is true.

    while ( (heap_metadata(heap)->size != 0) &&
            ((t = heap[1])->expires < now) )
    {
        remove_from_heap(heap, t); <<<<------ So, this gets invoked for all the timers.
        execute_timer(ts, t);
    }

So, further below reprogram_timer() gets invoked with timeout = 0 and
it disables the timer. So, timer_interrupt() is never invoked.

Which is expected because there is no timer armed, so there is no need
to configure the physical timer.


Can someone give any pointers on what the underlying issue could be
and how to debug further ?

See above. You could also look at why there is no soft timer pending
and/or where Xen is stuck (e.g. the PC).


I do not observe this behavior while running on R52 FVP. The
difference is that for most of the timers "((t = heap[1])->expires <
now)" is false, so reprogram_timer() gets invoked with a non-zero
timeout and, subsequently, timer_interrupt() gets invoked.

This reads as one of the following:
  1) There is a missing barrier
  2) You didn't properly configure some registers
  3) There is a HW erratum (I know that some of the Cortex-A cores had
an issue when reading the timer counter, but that seems unlikely here)


Also, looking at the following from xen/arch/arm/time.c.

/* Set the timer to wake us up at a particular time.
  * Timeout is a Xen system time (nanoseconds since boot); 0 disables
  * the timer.
  * Returns 1 on success; 0 if the timeout is too soon or is in the
  * past. */
int reprogram_timer(s_time_t timeout)
{
     uint64_t deadline;

     if ( timeout == 0 )
     {
         WRITE_SYSREG(0, CNTHP_CTL_EL2);
         return 1; <<<<<<<<<<<<<<<<<<<<<<-------------- Shouldn't this be 0, as the comment suggests?

I think 1 is correct because we want to disable the timer, so this is a
success. 0 should be returned if this was effectively a timeout.

FWIW, x86 also seems to return 1 when the timeout is 0.

     }

     deadline = ns_to_ticks(timeout) + boot_count;
     WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
     WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
     isb();

     /* No need to check for timers in the past; the Generic Timer fires
      * on a signed 63-bit comparison. */
     return 1;
}
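
To make the deadline computation in reprogram_timer() concrete, here is a minimal stand-alone sketch of a nanoseconds-to-ticks conversion. It is an illustration under assumptions, not the Xen implementation: the names ns_to_ticks_sketch() and deadline_ticks() and the explicit freq_hz parameter are hypothetical. The point is that the deadline written to the compare register depends on the counter frequency the software believes in; if that differed from the real counter frequency, deadlines and "now" would drift apart. Whether that is happening here is not established by this thread.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Convert nanoseconds to counter ticks for an assumed counter frequency.
 * The split into whole seconds and remainder avoids overflowing the
 * 64-bit product for frequencies up to roughly 18 GHz. */
static uint64_t ns_to_ticks_sketch(uint64_t ns, uint64_t freq_hz)
{
    uint64_t sec = ns / NS_PER_SEC;
    uint64_t rem = ns % NS_PER_SEC;

    return sec * freq_hz + (rem * freq_hz) / NS_PER_SEC;
}

/* Mirrors the shape of "deadline = ns_to_ticks(timeout) + boot_count":
 * boot_count is the counter value sampled when system time was zero. */
static uint64_t deadline_ticks(uint64_t timeout_ns, uint64_t boot_count,
                               uint64_t freq_hz)
{
    return ns_to_ticks_sketch(timeout_ns, freq_hz) + boot_count;
}
```

For example, with an assumed 100 MHz counter, a timeout of 1.5 s after boot maps to 150000000 ticks past boot_count.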

Kind regards,

Ayan





--
Julien Grall




 

