
Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory



On 14/04/15 10:22, Hongyang Yang wrote:
>
>
> On 04/14/2015 04:53 PM, Andrew Cooper wrote:
>> On 14/04/15 00:51, Don Slutz wrote:
>>> On 04/13/15 12:25, Andrew Cooper wrote:
>>>> On 13/04/15 17:09, Don Slutz wrote:
>>>>> If QEMU has called on xc_domain_setmaxmem to add more memory for
>>>>> option ROMs, domain restore needs to also increase the memory.
>>>>>
>>>>> Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
>>>> hvmloader has no interaction with xc_domain_restore().
>>> I did not mean to imply that it did.  Somehow I lost the fact that this
>>> is a bug in master:
>>>
>>> [root@dcs-xen-52 don]# xl cre -p /home/don/e1000x8.xfg
>>> Parsing config from /home/don/e1000x8.xfg
>>> got a tsc mode string: "native_paravirt"
>>> [root@dcs-xen-52 don]# xl save e1000x8 e1000x8.save
>>> Saving to e1000x8.save new xl format (info 0x1/0x0/3506)
>>> xc: Saving memory: iter 0 (last sent 0 skipped 0): 1044481/1044481 100%
>>> [root@dcs-xen-52 don]# xl restore e1000x8.save
>>> Loading new save file e1000x8.save (new xl fmt info 0x1/0x0/3506)
>>>   Savefile contains xl domain config in JSON format
>>> Parsing config from <saved>
>>> xc: error: Failed to allocate memory for batch.!: Internal error
>>> libxl: error: libxl_create.c:1057:libxl__xc_domain_restore_done:
>>> restoring domain: Cannot allocate memory
>>> libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot
>>> (re-)build domain: -3
>>> libxl: error: libxl.c:1576:libxl__destroy_domid: non-existant domain 2
>>> libxl: error: libxl.c:1540:domain_destroy_callback: unable to destroy
>>> guest with domid 2
>>> libxl: error: libxl_create.c:1490:domcreate_destruction_cb: unable to
>>> destroy domain 2 following failed creation
>>> [root@dcs-xen-52 don]#
>>>
>>> The hvmloader part turns out to be a "warning message" that is ok to
>>> ignore.  However, "xl save" followed by "xl restore" is currently
>>> broken without some fix.
>>>
>>>> It is xl's job to propagate the current memory as part of
>>>> migration.  If
>>>> qemu has had to bump it up, this should be reflected in the domain
>>>> config file.
>>> I am not at all sure how QEMU (either in dom0 or a driver domain) is
>>> expected to change a file (the domain config file).  My guess is that
>>> you are saying that "xl save" (before xc_domain_save) is the correct
>>> place to record the extra info.  However, it looks to me as though
>>> all the data needed is already in the save file, but I could be wrong.
>>>
>>> The page data is correctly saved into the save file (migration
>>> stream).  However, when the new domain is created, its size is
>>> "wrong".  You cannot adjust any of the config info to fix the size,
>>> because the option ROM space to reserve is not an existing config
>>> option.  So if I am following you correctly, libxl needs to add and
>>> process more info to handle this case.
>>>
>>>> The migration path should not be hacked up to fix a higher level
>>>> toolstack bug.
>>> I do not see how QEMU is part of the "higher level toolstack".  If
>>> you mean xl vs xc, then I can see "xl save" somehow doing the work.
>>> This patch works for me, but I am happy to explore other ways to fix
>>> this bug.
>>
>> If I understand correctly, the steps are this:
>>
>> * 'xl create' makes a VM of size $FOO
>> * qemu bumps the size to $FOO+$N
>> * 'xl save' writes $FOO+$N of page data, but the xl config file at the
>> start of the image still says $FOO
>> * 'xl restore' creates a vm of size $FOO, then instructs
>> xc_domain_restore() to put $FOO+$N pages into it.
>>
>> I would argue, first, that qemu should not be playing in this area to
>> start with.
>>
>> However, the real bug here is that the domain configuration written by
>> xl save is inaccurate.
>
> There's a case like COLO:
> 1. Both the Primary and Secondary VMs are created first with the same
>    config file, which makes a VM of size $FOO
> 2. qemu bumps the size to $FOO+$N
> 3. 'save' writes $FOO+$N of page data
> 4. 'restore' puts $FOO+$N pages into a $FOO-sized domain, which causes
>    an error
>
> Even if you fix the configuration, the bug still exists, because the
> config is only transferred from Primary to Secondary once, at the very
> beginning, when you execute the 'xl remus' command.  The problem is how
> to let the secondary (restore) side know about the size change: through
> a migration command (which is easier in v2 migration), or some other
> solution?  From this point of view, I think Don's solution is one way
> to solve the problem.

Funny you should ask that.  Migration v2 for libxl moves the JSON config
blob into the libxl stream, rather than it being a single-shot action at
the very start.  From that point, it becomes plausible to send a new
JSON blob whenever an update occurs on the source side.
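
For concreteness, the "update on the source side" here is just qemu
raising the domain's memory limit past what the xl config recorded.  A
minimal sketch of that step at the libxc level (the helper name and
OPTION_ROM_KB are hypothetical; qemu's actual code path differs):

    #include <xenctrl.h>

    /* Hypothetical sketch of step 2 in the sequence quoted above: qemu
     * grows the domain's limit from $FOO to $FOO+$N to make room for
     * option ROMs.  The xl config still records $FOO, which is the size
     * the restore side later builds. */
    #define OPTION_ROM_KB (2 * 1024)    /* hypothetical stand-in for $N */

    static int bump_maxmem_for_option_roms(xc_interface *xch,
                                           uint32_t domid,
                                           uint64_t configured_max_kb)
    {
        return xc_domain_setmaxmem(xch, domid,
                                   configured_max_kb + OPTION_ROM_KB);
    }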

However, I am still firmly of the opinion that this is an xl/libxl bug,
to be fixed at that level rather than hacked around in the restore code.
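
To sketch what "fixed at that level" might look like: the toolstack
learns the real size (e.g. from an updated JSON blob in the stream) and
applies it before restore begins, rather than xc_domain_restore()
growing the domain behind libxl's back.  A rough sketch only; the helper
is hypothetical and error handling is elided, but xc_domain_setmaxmem()
is the real libxc call:

    #include <xenctrl.h>

    /* Hypothetical toolstack-side fix: before page data is streamed
     * into the new domain, raise its limit to the size actually present
     * in the save image ($FOO+$N), taken from the corrected config. */
    static int apply_restored_maxmem(xc_interface *xch, uint32_t domid,
                                     uint64_t stream_max_kb)
    {
        if ( xc_domain_setmaxmem(xch, domid, stream_max_kb) )
            return -1;    /* allocation at restore would fail at $FOO */

        return 0;         /* xc_domain_restore() can now populate pages */
    }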

~Andrew
