
Re: [Xen-devel] question about migration



On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
>> Hi Andrew Cooper:
>>
>> I rebase the COLO codes to the newest upstream xen, and test it. I found
>> a problem in the test, and I can reproduce this problem via the migration.
>>
>> How to reproduce:
>> 1. xl cr -p hvm_nopv
>> 2. xl migrate hvm_nopv 192.168.3.1
> 
> You are the very first person to try a usecase like this.
> 
> It works as much as it does because of your changes to the uncooperative HVM 
> domain logic.  I have said repeatedly during review, this is not necessarily 
> a safe change to make without an in-depth analysis of the knock-on effects; 
> it looks as if you have found the first knock-on effect.
> 
>>
>> The migration successes, but the vm doesn't run in the target machine.
>> You can get the reason from 'xl dmesg':
>> (XEN) HVM2 restore: VMCE_VCPU 1
>> (XEN) HVM2 restore: TSC_ADJUST 0
>> (XEN) HVM2 restore: TSC_ADJUST 1
>> (d2) HVM Loader
>> (d2) Detected Xen v4.7-unstable
>> (d2) Get guest memory maps[128] failed. (-38)
>> (d2) *** HVMLoader bug at e820.c:39
>> (d2) *** HVMLoader crashed.
>>
>> The reason is that:
>> We don't call xc_domain_set_memory_map() in the target machine.
>> When we create a hvm domain:
>> libxl__domain_build()
>>      libxl__build_hvm()
>>          libxl__arch_domain_construct_memmap()
>>              xc_domain_set_memory_map()
>>
>> Should we migrate the guest memory from source machine to target machine?
> 
> This bug specifically is because HVMLoader is expected to have run and turned 
> the hypercall information into an E820 table in the guest before a migration 
> occurs.
> 
> Unfortunately, the current codebase is riddled with such assumptions and 
> expectations (e.g. the HVM save code assumes that the FPU context is valid 
> when it saves register state), which is a direct side effect of how it was 
> developed.
> 
> 
> Having said all of the above, I agree that your example is a usecase which 
> should work.  It is the ultimate test of whether the migration stream 
> contains enough information to faithfully reproduce the domain on the far 
> side.  Clearly at the moment, this is not the case.
> 
> I have an upcoming project to work on the domain memory layout logic, because 
> it is unsuitable for a number of XenServer usecases. Part of that will 
> require moving it into the migration stream.
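
(Just to make the missing call concrete: the restore side never issues
xc_domain_set_memory_map(). The fragment below is only an untested
illustration with made-up, placeholder e820 entries and a hypothetical
helper name; as you say, a real fix would have to carry the source's map
in the migration stream rather than invent one.)

/*
 * Illustration only: push a placeholder E820 map into a restored domain.
 * struct e820entry and the E820_* constants come from xenctrl.h here.
 */
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

static int set_placeholder_memmap(xc_interface *xch, uint32_t domid)
{
    /* Placeholder layout: RAM below 3GiB plus a small reserved hole. */
    struct e820entry map[] = {
        { .addr = 0x00000000ULL, .size = 0xc0000000ULL, .type = E820_RAM      },
        { .addr = 0xfc000000ULL, .size = 0x04000000ULL, .type = E820_RESERVED },
    };

    return xc_domain_set_memory_map(xch, domid, map,
                                    sizeof(map) / sizeof(map[0]));
}

int main(int argc, char **argv)
{
    xc_interface *xch;
    int rc;

    if ( argc < 2 )
        return 1;

    xch = xc_interface_open(NULL, NULL, 0);
    if ( !xch )
        return 1;

    rc = set_placeholder_memmap(xch, strtoul(argv[1], NULL, 0));
    printf("xc_domain_set_memory_map: %d\n", rc);

    xc_interface_close(xch);
    return rc ? 1 : 0;
}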

I found another migration problem in testing:
If the migration fails, we resume the guest on the source side,
but the HVM guest does not respond any more.

In my test environment, the migration always succeeds, so I used a
hack to reproduce the failure:
1. Modify the xen tools on the target so that the restore always fails:

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto err;
     }
 
+    rc = ERROR_FAIL;
+
  err:
     check_all_finished(egc, stream, rc);
 
2. xl cr hvm_nopv, and wait for a while (until you can log in to the guest)
3. xl migrate hvm_nopv 192.168.3.1

The reason is that:
We create the default ioreq server when we get the hvm param HVM_PARAM_IOREQ_PFN.
It means that the problem occurs only when the migration fails after we have
got the hvm param HVM_PARAM_IOREQ_PFN.
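
(To make that side effect concrete: even a plain toolstack read of the
param is enough to create the default ioreq server. The untested snippet
below is not the actual save code -- which, as far as I can see, reads the
params it sends with xc_hvm_param_get() -- but it performs the same kind
of get and so already changes the domain's behaviour.)

/*
 * Illustration only: merely reading HVM_PARAM_IOREQ_PFN from the
 * toolstack makes Xen create the default ioreq server for the domain
 * as a side effect.
 */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>
#include <xen/hvm/params.h>

int main(int argc, char **argv)
{
    xc_interface *xch;
    uint64_t pfn = 0;
    int rc;

    if ( argc < 2 )
        return 1;

    xch = xc_interface_open(NULL, NULL, 0);
    if ( !xch )
        return 1;

    /* The get itself is the interesting part: it is not side-effect free. */
    rc = xc_hvm_param_get(xch, strtoul(argv[1], NULL, 0),
                          HVM_PARAM_IOREQ_PFN, &pfn);
    printf("HVM_PARAM_IOREQ_PFN = 0x%" PRIx64 " (rc = %d)\n", pfn, rc);

    xc_interface_close(xch);
    return rc ? 1 : 0;
}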

In the function hvm_select_ioreq_server():
If the I/O will be handled by a non-default ioreq server, we return that
non-default ioreq server; in this case the I/O is handled by qemu.
If the I/O will not be handled by a non-default ioreq server, we return
the default ioreq server. Before the migration we return NULL, and after the
failed migration it is not NULL.
See the caller, hvmemul_do_io():
    case X86EMUL_UNHANDLEABLE:
    {
        struct hvm_ioreq_server *s =
            hvm_select_ioreq_server(curr->domain, &p);

        /* If there is no suitable backing DM, just ignore accesses */
        if ( !s )
        {
            rc = hvm_process_io_intercept(&null_handler, &p);
            vio->io_req.state = STATE_IOREQ_NONE;
        }
        else
        {
            rc = hvm_send_ioreq(s, &p, 0);
            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
                vio->io_req.state = STATE_IOREQ_NONE;
            else if ( data_is_addr )
                rc = X86EMUL_OKAY;
        }
        break;

We send the I/O request to the default ioreq server, but there is no backing
DM to handle it, so we wait for the I/O forever...
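
(To spell out the before/after difference, here is a toy model in plain C.
It is not Xen code -- the names select_server() and handle_unclaimed_io()
are made up -- it just mirrors the shape of the fallback described above.)

/*
 * Toy model, not Xen code: the same unhandled I/O path before and after
 * the failed migration.  Before, the default slot is NULL and the access
 * is completed by the null handler; after, the slot holds a server with
 * no emulator behind it, so the request is parked forever.
 */
#include <stdbool.h>
#include <stdio.h>

struct ioreq_server { bool has_emulator; };

/* Stand-in for hvm_select_ioreq_server(): no range matched, so fall
 * back to whatever the default slot currently holds. */
static struct ioreq_server *select_server(struct ioreq_server *default_server)
{
    return default_server;
}

static const char *handle_unclaimed_io(struct ioreq_server *default_server)
{
    struct ioreq_server *s = select_server(default_server);

    if ( !s )
        return "completed by null_handler";
    if ( !s->has_emulator )
        return "sent to the default server, never answered";
    return "sent to the emulator";
}

int main(void)
{
    struct ioreq_server empty_default = { .has_emulator = false };

    printf("before failed migration: %s\n", handle_unclaimed_io(NULL));
    printf("after failed migration:  %s\n", handle_unclaimed_io(&empty_default));
    return 0;
}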

Thanks
Wen Congyang

> 
> ~Andrew



