[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 1/5] x86/HVM: split restore state checking from state loading



On Tue, Dec 19, 2023 at 04:24:02PM +0100, Jan Beulich wrote:
> On 19.12.2023 15:36, Roger Pau Monné wrote:
> > On Mon, Dec 18, 2023 at 03:39:55PM +0100, Jan Beulich wrote:
> >> ..., at least as reasonably feasible without making a check hook
> >> mandatory (in particular strict vs relaxed/zero-extend length checking
> >> can't be done early this way).
> >>
> >> Note that only one of the two uses of "real" hvm_load() is accompanied
> >> with a "checking" one. The other directly consumes hvm_save() output,
> >> which ought to be well-formed. This means that while input data related
> >> checks don't need repeating in the "load" function when already done by
> >> the "check" one (albeit assertions to this effect may be desirable),
> >> domain state related checks (e.g. has_xyz(d)) will be required in both
> >> places.
> >>
> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> ---
> >> Now that this re-arranges hvm_load() anyway, wouldn't it be better to
> >> down the vCPU-s ahead of calling arch_hvm_load() (which is now easy to
> >> arrange for)?
> > 
> > Seems OK to me.
> 
> As is, or with the suggested adjustment, or either way?

I'm fine either way if you don't want to do it as part of this
patch.

> >> --- a/xen/arch/x86/domctl.c
> >> +++ b/xen/arch/x86/domctl.c
> >> @@ -379,8 +379,12 @@ long arch_do_domctl(
> >>          if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) 
> >> != 0 )
> >>              goto sethvmcontext_out;
> >>  
> >> +        ret = hvm_load(d, false, &c);
> >> +        if ( ret )
> >> +            goto sethvmcontext_out;
> >> +
> >>          domain_pause(d);
> >> -        ret = hvm_load(d, &c);
> >> +        ret = hvm_load(d, true, &c);
> > 
> > Now that the check has been done ahead, do we want to somehow assert
> > that this cannot fail?  AIUI that's the expectation.
> 
> We certainly can't until all checking was moved out of the load handlers.
> And even then I think there are still cases where load might produce an
> error. (In fact I would have refused a little more strongly to folding
> the prior hvm_check() into hvm_load() if indeed a separate hvm_load()
> could have ended up returning void in the long run.)

I see, _load could fail even if all the data provided was correct, for
example because the hypervisor is OoM?

> >> @@ -275,12 +281,10 @@ int hvm_save(struct domain *d, hvm_domai
> >>      return 0;
> >>  }
> >>  
> >> -int hvm_load(struct domain *d, hvm_domain_context_t *h)
> >> +int hvm_load(struct domain *d, bool real, hvm_domain_context_t *h)
> > 
> > Maybe the 'real' parameter should instead be an enum:
> > 
> > enum hvm_load_action {
> >     CHECK,
> >     LOAD,
> > };
> > int hvm_load(struct domain *d, enum hvm_load_action action,
> >              hvm_domain_context_t *h);
> 
> Hmm, yes, it could. I'm not a fan of enums for boolean-like things,
> though.
> 
> > Otherwise a comment might be warranted about how 'real' affects the
> > logic in the function.
> 
> I can certainly add a comment immediately ahead of the function:
> 
> /*
>  * @real = false requests checking of the incoming state, while @real = true
>  * requests actual loading, which will then assume that checking was already
>  * done or is unnecessary.
>  */

Seems good to me.  I do think the usage of an action enum is clearer,
but I'm fine with the comment and the usage of a boolean.

> >> @@ -291,50 +295,91 @@ int hvm_load(struct domain *d, hvm_domai
> >>      if ( !hdr )
> >>          return -ENODATA;
> >>  
> >> -    rc = arch_hvm_load(d, hdr);
> >> -    if ( rc )
> >> -        return rc;
> >> +    rc = arch_hvm_check(d, hdr);
> > 
> > Shouldn't this _check function only be called when real == false?
> 
> Possibly. In v4 I directly transformed what I had in v3:
> 
>     ASSERT(!arch_hvm_check(d, hdr));
> 
> I.e. it is now the call above plus ...
> 
> >> +    if ( real )
> >> +    {
> >> +        struct vcpu *v;
> >> +
> >> +        ASSERT(!rc);
> 
> ... this assertion. Really the little brother of the call site assertion
> you're asking for (see above).
> 
> >> +        arch_hvm_load(d, hdr);
> >>  
> >> -    /* Down all the vcpus: we only re-enable the ones that had state 
> >> saved. */
> >> -    for_each_vcpu(d, v)
> >> -        if ( !test_and_set_bit(_VPF_down, &v->pause_flags) )
> >> -            vcpu_sleep_nosync(v);
> >> +        /*
> >> +         * Down all the vcpus: we only re-enable the ones that had state
> >> +         * saved.
> >> +         */
> >> +        for_each_vcpu(d, v)
> >> +            if ( !test_and_set_bit(_VPF_down, &v->pause_flags) )
> >> +                vcpu_sleep_nosync(v);
> >> +    }
> >> +    else if ( rc )
> >> +        return rc;

The issue I see with this is that when built with debug=n the call to
arch_hvm_check() with real == true is useless, as the result is never
evaluated - IOW: would be clearer to just avoid the call altogether.

> >>      for ( ; ; )
> >>      {
> >> +        const char *name;
> >> +        hvm_load_handler load;
> >> +
> >>          if ( h->size - h->cur < sizeof(struct hvm_save_descriptor) )
> >>          {
> >>              /* Run out of data */
> >>              printk(XENLOG_G_ERR
> >>                     "HVM%d restore: save did not end with a null entry\n",
> >>                     d->domain_id);
> >> +            ASSERT(!real);
> >>              return -ENODATA;
> >>          }
> >>  
> >>          /* Read the typecode of the next entry  and check for the 
> >> end-marker */
> >>          desc = (struct hvm_save_descriptor *)(&h->data[h->cur]);
> >> -        if ( desc->typecode == 0 )
> >> +        if ( desc->typecode == HVM_SAVE_CODE(END) )
> >> +        {
> >> +            /* Reset cursor for hvm_load(, true, ). */
> >> +            if ( !real )
> >> +                h->cur = 0;
> >>              return 0;
> >> +        }
> >>  
> >>          /* Find the handler for this entry */
> >> -        if ( (desc->typecode > HVM_SAVE_CODE_MAX) ||
> >> -             ((handler = hvm_sr_handlers[desc->typecode].load) == NULL) )
> >> +        if ( desc->typecode >= ARRAY_SIZE(hvm_sr_handlers) ||
> >> +             !(name = hvm_sr_handlers[desc->typecode].name) ||
> >> +             !(load = hvm_sr_handlers[desc->typecode].load) )
> >>          {
> >>              printk(XENLOG_G_ERR "HVM%d restore: unknown entry typecode 
> >> %u\n",
> >>                     d->domain_id, desc->typecode);
> > 
> > The message is not very accurate here, it does fail when the typecode
> > is unknown, but also fails when such typecode has no name or load
> > function setup.
> 
> Yes and no, and it's not changing in this patch. Are you suggesting I should
> change it despite being unrelated? If so, there not being a name (which is
> the new check I'm adding) still suggests the code is unknown. There not being
> a load handler really indicates a bug in Xen (yet no reason to e.g. BUG() in
> that case, the failed loading will hopefully be noticeable enough).

Right, so not a lot of room for improvement anyway.  Let's leave as-is
then.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.