
Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy



Hi Andrew,

On Tue, Dec 5, 2023 at 11:53 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 05/12/2023 8:14 am, Bertrand Marquis wrote:
> > Hi Julien,
> >
> > Thanks a lot for your review and comment, this is very helpful.
> >
> >> On 4 Dec 2023, at 20:24, Julien Grall <julien@xxxxxxx> wrote:
> >>
> >> Hi Jens,
> >>
> >> On 04/12/2023 07:55, Jens Wiklander wrote:
> >>>        if ( ctx->rx )
> >>>          rxtx_unmap(ctx);
> >>>  +
> >>> +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
> >>> +    {
> >>> +        register_t handle_hi;
> >>> +        register_t handle_lo;
> >>> +
> >>> +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
> >>> +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
> >> Is this call expensive? If so, we may need to handle continuation here.
> > This call should not be expensive in the normal case: the memory is
> > reclaimable, so there is no processing required in the SP and everything
> > is done in the SPMC, which should basically just return yes or no
> > depending on the state of the handle.
> >
> > So I think this is the best trade-off.
> >
> > @Jens: One thing to consider is that a Destroy might get a retry or busy
> > answer and we will have to issue it again; this is not considered in the
> > current implementation.
> >
> > After discussing the subject internally, we could in fact consider that
> > if an SP cannot release some memory shared with the destroyed VM, it
> > should say so by returning "retry" to the message.
> > That could simplify things with a strategy where:
> > - we retry on the VM_DESTROY message if required
> > - if some memory is not reclaimable, we check whether we can park it and
> >   make the VM a zombie.
> > What do you think ?
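
For my own understanding, the strategy you describe would be roughly
something like this (hypothetical sketch only; I'm assuming a helper
along the lines of the ffa_direct_req_send_vm() that ffa.c already uses
for the VM_CREATED/VM_DESTROYED framework messages, FFA_RET_RETRY and
FFA_RET_BUSY are the FF-A error codes, and the retry bound is made up):

/*
 * Bounded resend of the VM_DESTROYED framework message while an SP
 * answers RETRY or BUSY.  Anything that still cannot be reclaimed
 * afterwards would be parked and the VM kept as a zombie.
 */
static int32_t send_vm_destroyed(uint16_t sp_id, uint16_t vm_id)
{
    unsigned int tries = 10;    /* arbitrary bound, for illustration */
    int32_t res;

    do {
        res = ffa_direct_req_send_vm(sp_id, vm_id,
                                     FFA_MSG_SEND_VM_DESTROYED);
    } while ( (res == FFA_RET_RETRY || res == FFA_RET_BUSY) && --tries );

    return res;
}

A synchronous retry loop like this of course still blocks the teardown
path, which ties into the point below.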
>
> This is the cleanup issue discussed at XenSummit, isn't it?
>
> You cannot feasibly implement this cleanup by having
> ffa_domain_teardown() return -ERESTART.
>
> Yes, it will probably function - but now you're bouncing in/out of
> Xen as fast as the CPU will allow, rechecking a condition which may
> take an unbounded amount of time to clear.  Meanwhile, you've tied up a
> userspace thread (the invocation of `xl destroy`) to do so, and one of
> dom0's vCPUs for however long the scheduler is willing to schedule the
> destroy invocation, which will be 100% of the time as it's always busy
> in the hypervisor.
>
> The teardown/kill infrastructure is intended and expected to always make
> forward progress.
>

OK
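
Just to make sure I follow, the pattern you're warning against would be
roughly this (hypothetical sketch reusing names from the RFC, i.e. what
I will *not* do):

/*
 * Keep reporting -ERESTART from the teardown hook until the SPMC lets
 * go of the memory.  The domain_kill hypercall then gets restarted
 * over and over, burning a dom0 vCPU for an unbounded amount of time.
 */
static int ffa_domain_teardown(struct domain *d)
{
    struct ffa_ctx *ctx = d->arch.tee;
    struct ffa_shm_mem *shm, *tmp;

    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
    {
        register_t handle_hi, handle_lo;

        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
        if ( ffa_mem_reclaim(handle_lo, handle_hi, 0) )
            return -ERESTART;   /* rechecked as fast as the CPU allows */

        list_del(&shm->list);
        free_ffa_shm_mem(ctx, shm);
    }

    return 0;
}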

>
> The closest thing to this patch which will work sanely is this:
>
> Hold a single domain reference for any non-zero amount of magic memory
> held.  See domain_adjust_tot_pages() and how it interacts with
> {get,put}_domain(), and copy it.  Importantly, this prevents the domain
> being freed until the final piece of magic memory has been released.
>
> Have some way (can be early on the teardown/kill path, or a separate
> hypercall - assuming the VM can't ever be scheduled again) to kick Xen
> into being responsible for trying to reclaim the memory.  (Start a
> timer, or reclaim in the idle loop, whatever.)
>
> This way, you can `xl destroy` a VM in an arbitrary state, *and* the
> invocation will terminate when Xen has nothing deterministic left to do,
> *and* in the case that the secure world or Xen has an issue, the VM will
> stay around as a zombie holding minimal resources.

Thanks for the pointers, very helpful, and now I at least know where
to start looking.
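
If I understand the suggestion correctly, it would be along these lines
(untested sketch; get_knownalive_domain()/put_domain() and the timer API
are the common Xen ones, the ffa_* names are from the RFC, the
reclaim_timer field would be new, and locking/error handling are
omitted):

/*
 * Hold one domain reference while any FF-A shared memory is still
 * outstanding so the struct domain cannot be freed, and let a timer
 * retry the reclaim, dropping the reference once the list is empty.
 */
static void ffa_reclaim_timer_fn(void *data)
{
    struct domain *d = data;
    struct ffa_ctx *ctx = d->arch.tee;
    struct ffa_shm_mem *shm, *tmp;

    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
    {
        register_t handle_hi, handle_lo;

        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
        if ( ffa_mem_reclaim(handle_lo, handle_hi, 0) )
            continue;           /* still in use, try again later */

        list_del(&shm->list);
        free_ffa_shm_mem(ctx, shm);
    }

    if ( list_empty(&ctx->shm_list) )
        put_domain(d);          /* last piece gone, domain can be freed */
    else
        set_timer(&ctx->reclaim_timer, NOW() + SECONDS(1));
}

static int ffa_domain_teardown(struct domain *d)
{
    struct ffa_ctx *ctx = d->arch.tee;

    if ( !ctx || list_empty(&ctx->shm_list) )
        return 0;

    /* Keep the domain around as a zombie until everything is reclaimed. */
    get_knownalive_domain(d);

    init_timer(&ctx->reclaim_timer, ffa_reclaim_timer_fn, d,
               smp_processor_id());
    set_timer(&ctx->reclaim_timer, NOW() + SECONDS(1));

    return 0;
}

Does that roughly match what you had in mind, or would you rather see
the reclaim attempts done from the idle loop?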

Cheers,
Jens

>
> ~Andrew



 

