
Re: [Xen-devel] question about migration



On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
>> Hi Andrew Cooper:
>>
>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>> a problem in the test, and I can reproduce this problem via the migration.
>>
>> How to reproduce:
>> 1. xl cr -p hvm_nopv
>> 2. xl migrate hvm_nopv 192.168.3.1
> 
> You are the very first person to try a usecase like this.
> 
> It works as much as it does because of your changes to the uncooperative HVM 
> domain logic.  I have said repeatedly during review, this is not necessarily 
> a safe change to make without an in-depth analysis of the knock-on effects; 
> it looks as if you have found the first knock-on effect.
> 
>>
>> The migration successes, but the vm doesn't run in the target machine.
>> You can get the reason from 'xl dmesg':
>> (XEN) HVM2 restore: VMCE_VCPU 1
>> (XEN) HVM2 restore: TSC_ADJUST 0
>> (XEN) HVM2 restore: TSC_ADJUST 1
>> (d2) HVM Loader
>> (d2) Detected Xen v4.7-unstable
>> (d2) Get guest memory maps[128] failed. (-38)
>> (d2) *** HVMLoader bug at e820.c:39
>> (d2) *** HVMLoader crashed.
>>
>> The reason is that:
>> We don't call xc_domain_set_memory_map() in the target machine.
>> When we create a hvm domain:
>> libxl__domain_build()
>>      libxl__build_hvm()
>>          libxl__arch_domain_construct_memmap()
>>              xc_domain_set_memory_map()
>>
>> Should we migrate the guest memory from source machine to target machine?
> 
> This bug specifically is because HVMLoader is expected to have run and turned 
> the hypercall information into an E820 table in the guest before a migration 
> occurs.
> 
> Unfortunately, the current codebase is riddled with such assumptions and 
> expectations (e.g. the HVM save code assumes that the FPU context is valid 
> when it saves register state), which is a direct side effect of how it was 
> developed.
> 
> 
> Having said all of the above, I agree that your example is a usecase which 
> should work.  It is the ultimate test of whether the migration stream 
> contains enough information to faithfully reproduce the domain on the far 
> side.  Clearly at the moment, this is not the case.
> 
> I have an upcoming project to work on the domain memory layout logic, because 
> it is unsuitable for a number of XenServer usecases. Part of that will 
> require moving it into the migration stream.
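
(Just to make the missing call concrete: the restore side never issues
xc_domain_set_memory_map(). The fragment below is only an untested
illustration with made-up, placeholder e820 entries and a hypothetical
helper name; as you say, a real fix would have to carry the source's map
in the migration stream rather than invent one.)

/*
 * Illustration only: push a placeholder E820 map into a restored domain.
 * struct e820entry and the E820_* constants come from xenctrl.h here.
 */
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

static int set_placeholder_memmap(xc_interface *xch, uint32_t domid)
{
    /* Placeholder layout: RAM below 3GiB plus a small reserved hole. */
    struct e820entry map[] = {
        { .addr = 0x00000000ULL, .size = 0xc0000000ULL, .type = E820_RAM      },
        { .addr = 0xfc000000ULL, .size = 0x04000000ULL, .type = E820_RESERVED },
    };

    return xc_domain_set_memory_map(xch, domid, map,
                                    sizeof(map) / sizeof(map[0]));
}

int main(int argc, char **argv)
{
    xc_interface *xch;
    int rc;

    if ( argc < 2 )
        return 1;

    xch = xc_interface_open(NULL, NULL, 0);
    if ( !xch )
        return 1;

    rc = set_placeholder_memmap(xch, strtoul(argv[1], NULL, 0));
    printf("xc_domain_set_memory_map: %d\n", rc);

    xc_interface_close(xch);
    return rc ? 1 : 0;
}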

I found another migration problem in testing:
If the migration fails, we resume the guest on the source side,
but the HVM guest does not respond any more.

In my test environment, the migration always succeeds, so I used a
hack to reproduce the failure:
1. Modify the xen tools on the target so that the restore always fails:

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto err;
     }
 
+    rc = ERROR_FAIL;
+
  err:
     check_all_finished(egc, stream, rc);
 
2. xl cr hvm_nopv, and wait for a while (until you can log in to the guest)
3. xl migrate hvm_nopv 192.168.3.1

The reason is that:
We create the default ioreq server when we get the hvm param HVM_PARAM_IOREQ_PFN.
It means that the problem occurs only when the migration fails after we have
got the hvm param HVM_PARAM_IOREQ_PFN.
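
(To make that side effect concrete: even a plain toolstack read of the
param is enough to create the default ioreq server. The untested snippet
below is not the actual save code -- which, as far as I can see, reads the
params it sends with xc_hvm_param_get() -- but it performs the same kind
of get and so already changes the domain's behaviour.)

/*
 * Illustration only: merely reading HVM_PARAM_IOREQ_PFN from the
 * toolstack makes Xen create the default ioreq server for the domain
 * as a side effect.
 */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>
#include <xen/hvm/params.h>

int main(int argc, char **argv)
{
    xc_interface *xch;
    uint64_t pfn = 0;
    int rc;

    if ( argc < 2 )
        return 1;

    xch = xc_interface_open(NULL, NULL, 0);
    if ( !xch )
        return 1;

    /* The get itself is the interesting part: it is not side-effect free. */
    rc = xc_hvm_param_get(xch, strtoul(argv[1], NULL, 0),
                          HVM_PARAM_IOREQ_PFN, &pfn);
    printf("HVM_PARAM_IOREQ_PFN = 0x%" PRIx64 " (rc = %d)\n", pfn, rc);

    xc_interface_close(xch);
    return rc ? 1 : 0;
}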

In the function hvm_select_ioreq_server():
If the I/O will be handled by a non-default ioreq server, we return that
non-default ioreq server; in this case the I/O is handled by qemu.
If the I/O will not be handled by a non-default ioreq server, we return
the default ioreq server. Before the migration we return NULL, and after the
failed migration it is not NULL.
See the caller, hvmemul_do_io():
    case X86EMUL_UNHANDLEABLE:
    {
        struct hvm_ioreq_server *s =
            hvm_select_ioreq_server(curr->domain, &p);

        /* If there is no suitable backing DM, just ignore accesses */
        if ( !s )
        {
            rc = hvm_process_io_intercept(&null_handler, &p);
            vio->io_req.state = STATE_IOREQ_NONE;
        }
        else
        {
            rc = hvm_send_ioreq(s, &p, 0);
            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
                vio->io_req.state = STATE_IOREQ_NONE;
            else if ( data_is_addr )
                rc = X86EMUL_OKAY;
        }
        break;

We send the I/O request to the default ioreq server, but there is no backing
DM to handle it, so we wait for the I/O forever...
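
(To spell out the before/after difference, here is a toy model in plain C.
It is not Xen code -- the names select_server() and handle_unclaimed_io()
are made up -- it just mirrors the shape of the fallback described above.)

/*
 * Toy model, not Xen code: the same unhandled I/O path before and after
 * the failed migration.  Before, the default slot is NULL and the access
 * is completed by the null handler; after, the slot holds a server with
 * no emulator behind it, so the request is parked forever.
 */
#include <stdbool.h>
#include <stdio.h>

struct ioreq_server { bool has_emulator; };

/* Stand-in for hvm_select_ioreq_server(): no range matched, so fall
 * back to whatever the default slot currently holds. */
static struct ioreq_server *select_server(struct ioreq_server *default_server)
{
    return default_server;
}

static const char *handle_unclaimed_io(struct ioreq_server *default_server)
{
    struct ioreq_server *s = select_server(default_server);

    if ( !s )
        return "completed by null_handler";
    if ( !s->has_emulator )
        return "sent to the default server, never answered";
    return "sent to the emulator";
}

int main(void)
{
    struct ioreq_server empty_default = { .has_emulator = false };

    printf("before failed migration: %s\n", handle_unclaimed_io(NULL));
    printf("after failed migration:  %s\n", handle_unclaimed_io(&empty_default));
    return 0;
}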

Thanks
Wen Congyang

> 
> ~Andrew



