Re: Need help on a issue (Unable to schedule guest for Xen on Arm)
On 30/03/2023 17:19, Julien Grall wrote:
> On 30/03/2023 16:57, Ayan Kumar Halder wrote:
>> (Apologies, fixed the formatting issue)
>
> Hi,

Hi Julien,

Appreciate your inputs.

>> On 30/03/2023 16:50, Ayan Kumar Halder wrote:
>>> Hi Xen experts,
>>>
>>> I need some pointers on an issue I am facing.
>>>
>>> I am running my downstream port of Xen on Cortex-R52 hardware. The
>>> hardware consists of two R52 cores (the second core is in lockstep
>>> mode). So, currently the hardware does not support SMP.
>>>
>>> The issue is that Xen is unable to schedule a guest.
>
> Are you sure about this? Because...
>
>>> So leave_hypervisor_to_guest() ---> check_for_pcpu_work() and this
>>> does not return.
>
> ... leave_hypervisor_to_guest() indicates that a guest vCPU was
> scheduled. Before we return to the guest we always check if there is
> some softirq that needs to be handled. So...
>
>>> Debugging this, I see check_for_pcpu_work() --> do_softirq() -->
>>> __do_softirq() --> timer_softirq_action()
>
> ... the fact that check_for_pcpu_work() doesn't return seems to
> indicate that there is a softirq that is always pending. Can you look
> which one it is?

Yes, it is the SCHEDULE_SOFTIRQ which is always pending.

What happens is in timer_softirq_action()

    /* Execute ready list timers. */
    while ( ((t = ts->list) != NULL) && (t->expires < now) )
    {
        ts->list = t->list_next;
        execute_timer(ts, t);   <<<---- This invokes s_timer_fn() ---> raise_softirq(SCHEDULE_SOFTIRQ)
    }

In the next and all subsequent iterations of timer_softirq_action(), the
following gets executed

    while ( (heap_metadata(heap)->size != 0) &&
            ((t = heap[1])->expires < now) )
    {
        remove_from_heap(heap, t);
        execute_timer(ts, t);   /* again raises SCHEDULE_SOFTIRQ */
    }

Thus, SCHEDULE_SOFTIRQ is always active.

So I tried to debug how "t->expires" is set. It gets set from set_timer(),
which is invoked from do_schedule().

    "set_timer(&sr->s_timer, now + prev->next_time);"

And "prev->next_time" is set from csched2_schedule()

    "currunit->next_time = csched2_runtime(ops, sched_cpu, snext, now);"

So, looking at the timer logs

    .......
    timer_softirq_action; 517: t->expires = 0x1415dc73b8 now = 0x14f3b757e8
    timer_softirq_action; 509: expires = 0x169ea47ae0 now = 0x17cfb0ac70
    timer_softirq_action; 509: expires = 0x1967f86738 now = 0x1a990325b0
    timer_softirq_action; 509: expires = 0x1c3149e290 now = 0x1d6254c818

I tried a hack as follows in credit2.c :-

    @@ -3740,7 +3744,9 @@ static void cf_check csched2_schedule(
         /*
          * Return task to run next...
          */
    -    currunit->next_time = csched2_runtime(ops, sched_cpu, snext, now);
    +    currunit->next_time = (0x200000000 + csched2_runtime(ops, sched_cpu, snext, now));

With this, the softirqs get cleared. Also, reprogram_timer() gets invoked
with a non-zero deadline and I get timer interrupts.

So, it seems I need to debug csched2_runtime() and see why
"currunit->next_time" is not set correctly.

- Ayan

>>> In timer_softirq_action(), the problem I see is that for all the
>>> timers, "((t = heap[1])->expires < now)" is true.
>>>
>>>     while ( (heap_metadata(heap)->size != 0) &&
>>>             ((t = heap[1])->expires < now) )
>>>     {
>>>         remove_from_heap(heap, t);   <<<<------ So, this gets invoked for all the timers.
>>>         execute_timer(ts, t);
>>>     }
>>>
>>> So, further below reprogram_timer() gets invoked with timeout = 0
>>> and it disables the timer. So, timer_interrupt() is never invoked.
>
> Which is expected because there is no timer armed, so there is no need
> to configure the physical timer.
>
>>> Can someone give any pointers on what the underlying issue could be
>>> and how to debug further?
>
> See above. You could also look why there is no softtimer pending
> and/or where Xen is stuck (e.g. the PC).
>
>>> I do not observe this behavior while running on R52 FVP.
>>> The difference is that for most of the timers
>>> "((t = heap[1])->expires < now)" is false, so reprogram_timer() gets
>>> invoked with a non-zero timeout and, subsequently, timer_interrupt()
>>> gets invoked.
>
> This reads as one of the following:
>   1) There is a missing barrier
>   2) You didn't properly configure some registers
>   3) There is a HW erratum (I know that some of the Cortex-A had an
>      issue when reading the Timer counter, but that seems unlikely here)
>
>>> Also, looking at the following from xen/arch/arm/time.c.
>>>
>>>     /* Set the timer to wake us up at a particular time.
>>>      * Timeout is a Xen system time (nanoseconds since boot); 0 disables the timer.
>>>      * Returns 1 on success; 0 if the timeout is too soon or is in the past. */
>>>     int reprogram_timer(s_time_t timeout)
>>>     {
>>>         uint64_t deadline;
>>>
>>>         if ( timeout == 0 )
>>>         {
>>>             WRITE_SYSREG(0, CNTHP_CTL_EL2);
>>>             return 1;   <<<<------ Shouldn't this be 0 as the comment suggests?
>
> I think 1 is correct because we want to disable the timer so this is a
> success. 0 should be returned if this was effectively a timeout. FWIW,
> x86 also seems to return 1 when the timeout is 0.
>
>>>         }
>>>
>>>         deadline = ns_to_ticks(timeout) + boot_count;
>>>         WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
>>>         WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
>>>         isb();
>>>
>>>         /* No need to check for timers in the past; the Generic Timer fires
>>>          * on a signed 63-bit comparison. */
>>>         return 1;
>>>     }
>>>
>>> Kind regards,
>>> Ayan
>
> --
> Julien Grall
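
For anyone following the thread, the expiry check discussed above can be reproduced outside Xen with a small model. This is only a sketch under stated assumptions: overdue_ns() and run_expired() are hypothetical helper names (not Xen functions), the timer heap is replaced by a flat array, and s_time_t is assumed to be a signed 64-bit nanosecond count. It illustrates why reprogram_timer() ends up being called with 0 when every deadline is already in the past; it does not explain why csched2_runtime() returns such a short runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* nanoseconds; assumed to mirror Xen's s_time_t */

/* How far past its deadline a timer is when the softirq looks at it. */
static s_time_t overdue_ns(s_time_t expires, s_time_t now)
{
    return now - expires;
}

/*
 * Hypothetical model of one timer_softirq_action() pass over a flat array
 * (Xen itself uses a heap plus an overflow list). Returns how many timers
 * have expired and stores in *next_deadline the earliest deadline that is
 * still pending, or 0 when none is left. 0 is the value for which
 * reprogram_timer() disables the hardware timer.
 */
static size_t run_expired(const s_time_t *expires, size_t n, s_time_t now,
                          s_time_t *next_deadline)
{
    size_t fired = 0;

    assert(next_deadline != NULL);
    *next_deadline = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( expires[i] < now )
            fired++;                        /* execute_timer() would run here */
        else if ( *next_deadline == 0 || expires[i] < *next_deadline )
            *next_deadline = expires[i];    /* candidate for reprogram_timer() */
    }

    return fired;
}
```

Fed with the first log sample (expires = 0x1415dc73b8, now = 0x14f3b757e8), overdue_ns() gives 3722110000 ns, so the timer is about 3.7 s late by the time the softirq inspects it. run_expired() then reports one expired timer and leaves *next_deadline at 0. Adding the 0x200000000 offset from the credit2.c hack to that same logged deadline (purely for illustration) moves it past "now", so nothing expires and a non-zero deadline is left to program.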