[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v7 1/4] xen/arm: Implement PSCI SYSTEM_SUSPEND call for guests
On Thu, Aug 7, 2025 at 10:38 AM Mykola Kvach <xakep.amatop@xxxxxxxxx> wrote: > > Hi Jan, > > On Tue, Jul 29, 2025 at 12:11 PM Jan Beulich <jbeulich@xxxxxxxx> wrote: > > > > On 29.07.2025 10:52, Mykola Kvach wrote: > > > --- a/xen/common/domain.c > > > +++ b/xen/common/domain.c > > > @@ -1334,16 +1334,15 @@ int domain_shutdown(struct domain *d, u8 reason) > > > return 0; > > > } > > > > > > -void domain_resume(struct domain *d) > > > +#ifdef CONFIG_ARM > > > +void domain_resume_nopause(struct domain *d) > > > +#else > > > +static void domain_resume_nopause(struct domain *d) > > > +#endif > > > > #ifndef CONFIG_ARM > > static > > #endif > > void domain_resume_nopause(struct domain *d) > > > > to have as little redundancy as possible. > > Okay, I’ll change it. > > > > > > { > > > struct vcpu *v; > > > > > > - /* > > > - * Some code paths assume that shutdown status does not get reset > > > under > > > - * their feet (e.g., some assertions make this assumption). > > > - */ > > > - domain_pause(d); > > > - > > > + domain_lock(d); > > > > This addition of locking affects domain_resume() as well. Neither need nor > > correctness are discussed in the description. If the locking was really > > needed for domain_resume() as well, I suppose adding that would better be > > a separate change anyway. > > Thanks for the review. > > The locking was added to avoid potential races involving _VPF_suspended and > the suspend/resume logic. > > Consider the case where domain_lock() is not used inside domain_resume(): > > Domain 1 initiates suspend via PSCI SYSTEM_SUSPEND. At the same time, > Domain 0 invokes resume for Domain 1. > > Domain 0 calls xl resume, which leads to domain_resume(). Domain 1 acquires > domain_lock() as part of the suspend path. Then it acquires the shutdown > lock in domain_shutdown(). Domain 0 is blocked waiting for the shutdown > lock. When Domain 1 releases the shutdown lock, it sets _VPF_suspended and > modifies the VCPU context. Then Domain 0 clears _VPF_suspended. > > At this point, ctxt_switch_from() might be called with _VPF_suspended > already cleared, and the VCPU context partially updated. For example, the > guest PC is set to the resume entry point, but some registers like TTBR or > SCTLR_EL1 are saved from the current hardware context by > ctxt_switch_from. > > However, after reviewing the flow again, I think this kind of race could > still happen even with the lock in place. Imagine Domain 1 sets the flag > via SYSTEM_SUSPEND, and then Domain 0 clears it by resuming the domain > before the first context switch. This could still result in a partially > updated context with inconsistent state. There are no synchronization issues here -- domain_pause inside domain_resume prevents them by design. The only situation where issues might arise is during a SYSTEM_SUSPEND request for a guest that has multiple vCPUs online (buggy OS), while another vCPU performs a CPU_ON request. Therefore, it seems we only need to protect the loop that checks if other vCPUs are offline during the SYSTEM_SUSPEND vPSCI call using the domain lock. > > So it might be better to update the VCPU context at the point of resume > instead of doing it during suspend. I'll look into that further and also > check for other possible races if the update is moved. > > > > > The addition of this locking is particularly interesting considering that > > ... > > > > > spin_lock(&d->shutdown_lock); > > > > ... is what follows right after. > > > > > --- a/xen/include/xen/sched.h > > > +++ b/xen/include/xen/sched.h > > > @@ -814,6 +814,9 @@ void domain_destroy(struct domain *d); > > > int domain_kill(struct domain *d); > > > int domain_shutdown(struct domain *d, u8 reason); > > > void domain_resume(struct domain *d); > > > +#ifdef CONFIG_ARM > > > +void domain_resume_nopause(struct domain *d); > > > +#endif > > > > > > int domain_soft_reset(struct domain *d, bool resuming); > > > > > > @@ -1010,6 +1013,10 @@ static inline struct domain > > > *next_domain_in_cpupool( > > > /* VCPU is parked. */ > > > #define _VPF_parked 8 > > > #define VPF_parked (1UL<<_VPF_parked) > > > +/* VCPU is suspended. */ > > > +#define _VPF_suspended 9 > > > +#define VPF_suspended (1UL<<_VPF_suspended) > > > > Irrespective of the style violations in pre-existing code, can you please > > not add further violations, by inserting the missing blanks? > > Okay > > > > > > + > > > > > > > Please also don't introduce double blank lines. > > I'll remove it. > > > > > Jan > > Best regards, > Mykola
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |