[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v18 2/2] tools/libxl: VM forking toolstack side



On Thu, May 14, 2020 at 8:52 AM Ian Jackson <ian.jackson@xxxxxxxxxx> wrote:
>
> Tamas K Lengyel writes ("[PATCH v18 2/2] tools/libxl: VM forking toolstack 
> side"):
> > Add necessary bits to implement "xl fork-vm" commands. The command allows 
> > the
> > user to specify how to launch the device model allowing for a late-launch 
> > model
> > in which the user can execute the fork without the device model and decide 
> > to
> > only later launch it.
>
> Hi.
>
> Sorry to be so late in reviewing this.  I will divert my main
> attention to the API elements...
>
> > +=item B<fork-vm> [I<OPTIONS>] I<domain-id>
> > +
> > +Create a fork of a running VM.  The domain will be paused after the 
> > operation
> > +and remains paused while forks of it exist.
>
> Do you mean "must remain paused" ?  And "The original domain" rather
> than "The domain" ?

Yes, I mean the original domain.

>
> > +B<OPTIONS>
> > +
> > +=over 4
> > +
> > +=item B<-p>
> > +
> > +Leave the fork paused after creating it.
>
> By default the fork runs right away, then, I take it.

Same route is taken as when you run "xl restore" so yes. This applies
if you are launching the device model. Without the device model launch
we are not going down the same path as "xl restore" so the fork is
paused in that case.

>
> > +=item B<--launch-dm>
> > +
> > +Specify whether the device model (QEMU) should be launched for the fork. 
> > Late
> > +launch allows to start the device model for an already running fork.
>
> It's not clear to me whether this launches the DM for an existing
> fork, or specify when forking that the DM should be run ?

It's possible to do both. You can create a fork and launch the device
model for it right away, or you can create a fork, unpause it, and
only launch the device model when its actually necessary.

>
> Do you really mean that you can run a fork for a while with no DM ?
> How does that work ?
>
> Also you seem to have not documented the launch-dm operation ?

It's possible I missed it.

>
> > +=item B<-C>
> > +
> > +The config file to use when launching the device model.  Currently 
> > required when
> > +launching the device model.  Most config settings MUST match the parent 
> > domain
> > +exactly, only change VM name, disk path and network configurations.
>
> This is a libxl config file, right ?

Yes.

>
> > +=item B<-Q>
> > +
> > +The path to the qemu save file to use when launching the device model.  
> > Currently
> > +required when launching the device model.
>
> Where would the user get one of these ?

Generate it by connecting to the qmp socket of the parent domain and
issuing the command saves it. See the cover letter to the series. I
stopped sending the cover letter since there is only this one
outstanding patch now.

>
> I think this question has no good answer and this reveals a problem
> with the API...

I don't know what "problem" you are referring to. We deliberately
chose not to include saving the qemu save file every time a fork is
made because for our usecase you only need to generate the qemu save
file once. Doing it for every fork is huge waste of time since we are
spinning off forks from the same state hundreds of thousands of time.
No need to regenerate the same save file for each.

>
> > +=item B<--fork-reset>
> > +
> > +Perform a reset operation of an already running fork.  Note that resetting 
> > may
> > +be less performant then creating a new fork depending on how much memory 
> > the
> > +fork has deduplicated during its runtime.
>
> What is the semantic effect of a reset ?

I don't understand the question.

>
> > +=item B<--max-vcpus>
> > +
> > +Specify the max-vcpus matching the parent domain when not launching the dm.
>
> What ?  This makes little sense to me.  You specify vm-fork
> --max-vcpus and it changes the parent's max-vcpus ??

No. You need the max-vcpus value when you create a fork. The domain
create hypercall needs it to set the domain up. I originally wanted to
extend the domain create hypercall so this could be copied by the
hypervisor but the hypervisor maintainers were against changing that
hypercall. So we are left with having to pass it manually.

>
> > +=item B<--allow-iommu>
> > +
> > +Specify to allow forking a domain that has IOMMU enabled. Only compatible 
> > with
> > +forks using --launch-dm no.
>
> Are there no some complex implications here ?  Maybe this doc needs a
> caveat.

Only caveat is that this option is only available for forks that have
no device models launched for them. We use it for fuzzing the device
driver of the IOMMU device in the forks.

>
> > +int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config 
> > *d_config,
> > +                                uint32_t domid,
> > +                                const libxl_asyncprogress_how 
> > *aop_console_how)
> > +                                LIBXL_EXTERNAL_CALLERS_ONLY;
> > +
> > +int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid)
> > +                            LIBXL_EXTERNAL_CALLERS_ONLY;
> >  #endif
>
> I'm afraid I found the code very hard to review.  In particular:
>
> > -    if (!soft_reset) {
> > -        struct xen_domctl_createdomain create = {
> > -            .ssidref = info->ssidref,
> > -            .max_vcpus = b_info->max_vcpus,
> > -            .max_evtchn_port = b_info->event_channels,
> > -            .max_grant_frames = b_info->max_grant_frames,
> > -            .max_maptrack_frames = b_info->max_maptrack_frames,
>
> I think this containsw a lot of code motion.  There is probably some
> other refactoring.
>
>
> Can you please split this up into several patches ?  Code motion
> should cocur in patches that do nothing else.  Refactoring should be
> broken down into pieces as small as possible, and separated from the
> addition of new functionality.  So most of the patches should be
> annotated "no functional change".

I understand that this patch mixes code-movement and new functionality
which makes it harder for review. The code movement involves no
functional changes to existing code, only splitting some existing
functions to have parts of them reusable for the fork case and skip
the parts that are not relevant. Unfortunately I won't be able to
split this up into multiple patches as I found this codebase
impossibly hard to work with to begin with. It's callback hell with
like 15 different structures being created and freed left and right.
Add to it that it constantly changes even just in the last couple
months, so just rebasing it has been a massive pain. Forget about
backporting this patch too, everything else I was able to easily apply
to Xen 4.13, libxl is just an absolute no go since it has so much code
churn. So since this feature is not required for any of our use-cases
- this is only to make it possible for other potential users to use
forks where they need them to be more fully featured - I won't be able
to assign the time it would require to split this up further. It
probably would take me weeks which I can't justify at the moment.

Tamas



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.