
Re: [PATCH -next v4 06/19] arm64: entry: Move arm64_preempt_schedule_irq() into exit_to_kernel_mode()



On Fri, Oct 25, 2024 at 06:06:47PM +0800, Jinjie Ruan wrote:
> Move arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so that
> not only __el1_irq() but every kernel-mode IRQ return has a chance to
> reschedule.

We use exit_to_kernel_mode() for every non-NMI exception return to the
kernel, not just IRQ returns.

> As Mark pointed out, this change will have the following key impact:
> 
>     "We'll preempt even without taking a "real" interrupt. That
>     shouldn't result in preemption that wasn't possible before,
>     but it does change the probability of preempting at certain points,
>     and might have a performance impact, so probably warrants a
>     benchmark."

For anyone following along at home, I said that at:

  https://lore.kernel.org/linux-arm-kernel/ZxejvAmccYMTa4P1@J2N7QTR9R3/

... and there I specifically said:

> I'd suggest you first write a patch to align arm64's entry code with the
> generic code, by removing the call to arm64_preempt_schedule_irq() from
> __el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
> __exit_to_kernel_mode(), e.g.
> 
> | static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
> | {
> |     ...
> |     if (interrupts_enabled(regs)) {
> |             ...
> |             if (regs->exit_rcu) {
> |                     ...
> |             }
> |             ...
> |             arm64_preempt_schedule_irq();
> |             ...
> |     } else {
> |             ...
> |     }
> | }

[...]

> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +#define need_irq_preemption() \
> +     (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +#else
> +#define need_irq_preemption()        (IS_ENABLED(CONFIG_PREEMPTION))
> +#endif
> +
> +static void __sched arm64_preempt_schedule_irq(void)
> +{
> +     if (!need_irq_preemption())
> +             return;
> +
> +     /*
> +      * Note: thread_info::preempt_count includes both thread_info::count
> +      * and thread_info::need_resched, and is not equivalent to
> +      * preempt_count().
> +      */
> +     if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> +             return;
> +
> +     /*
> +      * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> +      * priority masking is used the GIC irqchip driver will clear DAIF.IF
> +      * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> +      * DAIF we must have handled an NMI, so skip preemption.
> +      */
> +     if (system_uses_irq_prio_masking() && read_sysreg(daif))
> +             return;
> +
> +     /*
> +      * Preempting a task from an IRQ means we leave copies of PSTATE
> +      * on the stack. cpufeature's enable calls may modify PSTATE, but
> +      * resuming one of these preempted tasks would undo those changes.
> +      *
> +      * Only allow a task to be preempted once cpufeatures have been
> +      * enabled.
> +      */
> +     if (system_capabilities_finalized())
> +             preempt_schedule_irq();
> +}
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -72,6 +114,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
>                                       irqentry_state_t state)
>  {
> +     arm64_preempt_schedule_irq();

This is broken; exit_to_kernel_mode() is called for any non-NMI exception
return to the kernel, and this doesn't check that interrupts were enabled
in the context the exception was taken from.

This will preempt in cases where we should not, e.g. if we WARN() in a
section with IRQs disabled.

Mark.
