[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] question about migration

On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
>> Hi Andrew Cooper:
>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>> a problem in the test, and I can reproduce this problem via the migration.
>> How to reproduce:
>> 1. xl cr -p hvm_nopv
>> 2. xl migrate hvm_nopv
> You are the very first person to try a usecase like this.
> It works as much as it does because of your changes to the uncooperative HVM 
> domain logic.  I have said repeatedly during review, this is not necessarily 
> a safe change to make without an in-depth analysis of the knock-on effects; 
> it looks as if you have found the first knock-on effect.
>> The migration succeeds, but the vm doesn't run on the target machine.
>> You can get the reason from 'xl dmesg':
>> (XEN) HVM2 restore: VMCE_VCPU 1
>> (XEN) HVM2 restore: TSC_ADJUST 0
>> (XEN) HVM2 restore: TSC_ADJUST 1
>> (d2) HVM Loader
>> (d2) Detected Xen v4.7-unstable
>> (d2) Get guest memory maps[128] failed. (-38)
>> (d2) *** HVMLoader bug at e820.c:39
>> (d2) *** HVMLoader crashed.
>> The reason is that we don't call xc_domain_set_memory_map() on the target machine.
>> When we create a hvm domain:
>> libxl__domain_build()
>>      libxl__build_hvm()
>>          libxl__arch_domain_construct_memmap()
>>              xc_domain_set_memory_map()
>> Should we migrate the guest memory map from the source machine to the target machine?
> This bug specifically is because HVMLoader is expected to have run and turned 
> the hypercall information into an E820 table in the guest before a migration 
> occurs.
> Unfortunately, the current codebase is riddled with such assumptions and 
> expectations (e.g. the HVM save code assumes that FPU context is valid when 
> it is saving register state), which is a direct side effect of how it was 
> developed.
> Having said all of the above, I agree that your example is a usecase which 
> should work.  It is the ultimate test of whether the migration stream 
> contains enough information to faithfully reproduce the domain on the far 
> side.  Clearly at the moment, this is not the case.
> I have an upcoming project to work on the domain memory layout logic, because 
> it is unsuitable for a number of XenServer usecases. Part of that will 
> require moving it in the migration stream.

I found another migration problem in the test:
If the migration fails, we resume the guest on the source side,
but the hvm guest doesn't respond any more.

In my test environment, the migration always succeeds, so I
used a hack to reproduce the failure:
1. modify the target xen tools:

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void 
         goto err;
+    rc = ERROR_FAIL;
     check_all_finished(egc, stream, rc);
2. xl cr hvm_nopv, and wait some time(You can login to the guest)
3. xl migrate hvm_nopv

The reason is that:
We create a default ioreq server when we get the hvm param HVM_PARAM_IOREQ_PFN.
This means the problem occurs only when the migration fails after we have read
the hvm param HVM_PARAM_IOREQ_PFN.

In the function hvm_select_ioreq_server():
If the I/O will be handled by a non-default ioreq server, we return that
non-default server; in this case the I/O is handled by qemu.
If the I/O will not be handled by a non-default ioreq server, we return
the default ioreq server. Before migration this is NULL, but after a failed
migration it is not NULL.
See the caller, hvmemul_do_io():
        struct hvm_ioreq_server *s =
            hvm_select_ioreq_server(curr->domain, &p);

        /* If there is no suitable backing DM, just ignore accesses */
        if ( !s )
        {
            rc = hvm_process_io_intercept(&null_handler, &p);
            vio->io_req.state = STATE_IOREQ_NONE;
        }
        else
        {
            rc = hvm_send_ioreq(s, &p, 0);
            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
                vio->io_req.state = STATE_IOREQ_NONE;
            else if ( data_is_addr )
                rc = X86EMUL_OKAY;
        }

We send the I/O request to the default ioreq server, but no backing
DM handles it, so we wait for the I/O forever.

Wen Congyang

> ~Andrew

Xen-devel mailing list