[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 1/4] xen/arm: Implement PSCI SYSTEM_SUSPEND call for guests



On Thu, Aug 7, 2025 at 10:38 AM Mykola Kvach <xakep.amatop@xxxxxxxxx> wrote:
>
> Hi Jan,
>
> On Tue, Jul 29, 2025 at 12:11 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >
> > On 29.07.2025 10:52, Mykola Kvach wrote:
> > > --- a/xen/common/domain.c
> > > +++ b/xen/common/domain.c
> > > @@ -1334,16 +1334,15 @@ int domain_shutdown(struct domain *d, u8 reason)
> > >      return 0;
> > >  }
> > >
> > > -void domain_resume(struct domain *d)
> > > +#ifdef CONFIG_ARM
> > > +void domain_resume_nopause(struct domain *d)
> > > +#else
> > > +static void domain_resume_nopause(struct domain *d)
> > > +#endif
> >
> > #ifndef CONFIG_ARM
> > static
> > #endif
> > void domain_resume_nopause(struct domain *d)
> >
> > to have as little redundancy as possible.
>
> Okay, I’ll change it.
>
> >
> > >  {
> > >      struct vcpu *v;
> > >
> > > -    /*
> > > -     * Some code paths assume that shutdown status does not get reset 
> > > under
> > > -     * their feet (e.g., some assertions make this assumption).
> > > -     */
> > > -    domain_pause(d);
> > > -
> > > +    domain_lock(d);
> >
> > This addition of locking affects domain_resume() as well. Neither need nor
> > correctness are discussed in the description. If the locking was really
> > needed for domain_resume() as well, I suppose adding that would better be
> > a separate change anyway.
>
> Thanks for the review.
>
> The locking was added to avoid potential races involving _VPF_suspended and
> the suspend/resume logic.
>
> Consider the case where domain_lock() is not used inside domain_resume():
>
> Domain 1 initiates suspend via PSCI SYSTEM_SUSPEND. At the same time,
> Domain 0 invokes resume for Domain 1.
>
> Domain 0 calls xl resume, which leads to domain_resume(). Domain 1 acquires
> domain_lock() as part of the suspend path. Then it acquires the shutdown
> lock in domain_shutdown(). Domain 0 is blocked waiting for the shutdown
> lock. When Domain 1 releases the shutdown lock, it sets _VPF_suspended and
> modifies the VCPU context. Then Domain 0 clears _VPF_suspended.
>
> At this point, ctxt_switch_from() might be called with _VPF_suspended
> already cleared, and the VCPU context partially updated. For example, the
> guest PC is set to the resume entry point, but some registers like TTBR or
> SCTLR_EL1 are saved from the current hardware context by
> ctxt_switch_from.
>
> However, after reviewing the flow again, I think this kind of race could
> still happen even with the lock in place. Imagine Domain 1 sets the flag
> via SYSTEM_SUSPEND, and then Domain 0 clears it by resuming the domain
> before the first context switch. This could still result in a partially
> updated context with inconsistent state.

There are no synchronization issues here -- domain_pause inside domain_resume
prevents them by design.

The only situation where issues might arise is during a SYSTEM_SUSPEND request
for a guest that has multiple vCPUs online (buggy OS), while another
vCPU performs
a CPU_ON request.

Therefore, it seems we only need to protect the loop that checks if other vCPUs
are offline during the SYSTEM_SUSPEND vPSCI call using the domain lock.

>
> So it might be better to update the VCPU context at the point of resume
> instead of doing it during suspend. I'll look into that further and also
> check for other possible races if the update is moved.
>
> >
> > The addition of this locking is particularly interesting considering that
> > ...
> >
> > >      spin_lock(&d->shutdown_lock);
> >
> > ... is what follows right after.
> >
> > > --- a/xen/include/xen/sched.h
> > > +++ b/xen/include/xen/sched.h
> > > @@ -814,6 +814,9 @@ void domain_destroy(struct domain *d);
> > >  int domain_kill(struct domain *d);
> > >  int domain_shutdown(struct domain *d, u8 reason);
> > >  void domain_resume(struct domain *d);
> > > +#ifdef CONFIG_ARM
> > > +void domain_resume_nopause(struct domain *d);
> > > +#endif
> > >
> > >  int domain_soft_reset(struct domain *d, bool resuming);
> > >
> > > @@ -1010,6 +1013,10 @@ static inline struct domain 
> > > *next_domain_in_cpupool(
> > >  /* VCPU is parked. */
> > >  #define _VPF_parked          8
> > >  #define VPF_parked           (1UL<<_VPF_parked)
> > > +/* VCPU is suspended. */
> > > +#define _VPF_suspended       9
> > > +#define VPF_suspended        (1UL<<_VPF_suspended)
> >
> > Irrespective of the style violations in pre-existing code, can you please
> > not add further violations, by inserting the missing blanks?
>
> Okay
>
> >
> > > +
> > >
> >
> > Please also don't introduce double blank lines.
>
> I'll remove it.
>
> >
> > Jan
>
> Best regards,
> Mykola



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.