Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server
On Fri, Dec 09, 2016 at 04:43:58PM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Konrad Rzeszutek Wilk
> > Sent: 09 December 2016 16:14
> > To: Zhang Chen <zhangchen.fnst@xxxxxxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > Cc: Changlong Xie <xiecl.fnst@xxxxxxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; Eddie Dong <eddie.dong@xxxxxxxxx>; Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Wen Congyang <wencongyang@xxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>; Yang Hongyang <imhy.yang@xxxxxxxxx>; Xen devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > Subject: Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server
> >
> > .snip..
> > > > If you can be more specific about what is broken in COLO we might be
> > > > able to devise a fix for you.
> > >
> > > My workmate reported this bug last year:
> > > https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02850.html
> >
> > Paul, Andrew was asking about:
> >
> > This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN.
> > The migration code needs a way of being able to query whether a
> > default ioreq server exists, without creating one.
> >
> > Can you remember what the justification for the read side effects
> > were? ISTR that it was only for qemu compatibility until the ioreq
> > server work got in upstream. If that was the case, can we drop the
> > read side effects now and mandate that all qemus explicitly create
> > their ioreq servers (even if this involves creating a default ioreq
> > server for qemu-trad)?
> >
>
> The read side effects are indeed because of the need to support the old
> qemu interface. If trad were patched then we could at least deprecate
> the default ioreq server, but I'm not sure how long we'd need to leave
> it in place after that before it was removed. Perhaps it ought to be
> under a KCONFIG option, since it's also a bit of a security hole.
>

So... what can be done to make COLO work?

>   Paul
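
As an illustration of the KCONFIG idea, here is a minimal sketch of what gating the legacy side effect at compile time might look like. The option name (CONFIG_HVM_DEFAULT_IOREQ_SERVER), the helper hvm_create_default_ioreq_server() and the surrounding function are assumptions made for the sketch, not the actual Xen code:

    /*
     * Sketch only: CONFIG_HVM_DEFAULT_IOREQ_SERVER and
     * hvm_create_default_ioreq_server() are assumed names used for
     * illustration; the real Xen param-get path is structured differently.
     */
    static int hvm_get_param_value(struct domain *d, uint32_t index,
                                   uint64_t *value)
    {
        switch ( index )
        {
        case HVM_PARAM_IOREQ_PFN:
        case HVM_PARAM_BUFIOREQ_PFN:
        case HVM_PARAM_BUFIOREQ_EVTCHN:
    #ifdef CONFIG_HVM_DEFAULT_IOREQ_SERVER
        {
            /* Legacy qemu-trad behaviour: the first read of these params
             * implicitly creates the default ioreq server. */
            int rc = hvm_create_default_ioreq_server(d);

            if ( rc && rc != -EEXIST )
                return rc;
        }
    #endif
            /* Without the legacy option, these reads have no side effects
             * and a default ioreq server is never created implicitly. */
            /* fall through */
        default:
            *value = d->arch.hvm_domain.params[index];
            break;
        }

        return 0;
    }

Building with the option disabled would give the behaviour the migration code wants; enabling it would preserve compatibility with unpatched qemu-trad.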
> > ?
> >
> > Full thread below:
> >
> > Re: [Xen-devel] question about migration
> >
> > To: Wen Congyang <wency@xxxxxxxxxxxxxx>
> > From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Date: Tue, 29 Dec 2015 11:24:14 +0000
> > Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxx>
> > Delivery-date: Tue, 29 Dec 2015 11:24:33 +0000
> > List-id: Xen developer discussion <xen-devel.lists.xen.org>
> >
> > On 25/12/2015 01:45, Wen Congyang wrote:
> > On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> > On 24/12/15 02:29, Wen Congyang wrote:
> > Hi Andrew Cooper:
> >
> > I rebased the COLO code onto the newest upstream Xen and tested it. I
> > found a problem in the test, and I can reproduce it via migration.
> >
> > How to reproduce:
> > 1. xl cr -p hvm_nopv
> > 2. xl migrate hvm_nopv 192.168.3.1
> >
> > You are the very first person to try a usecase like this.
> >
> > It works as much as it does because of your changes to the
> > uncooperative HVM domain logic. I have said repeatedly during review
> > that this is not necessarily a safe change to make without an in-depth
> > analysis of the knock-on effects; it looks as if you have found the
> > first knock-on effect.
> >
> > The migration succeeds, but the VM doesn't run on the target machine.
> > You can get the reason from 'xl dmesg':
> >
> > (XEN) HVM2 restore: VMCE_VCPU 1
> > (XEN) HVM2 restore: TSC_ADJUST 0
> > (XEN) HVM2 restore: TSC_ADJUST 1
> > (d2) HVM Loader
> > (d2) Detected Xen v4.7-unstable
> > (d2) Get guest memory maps[128] failed. (-38)
> > (d2) *** HVMLoader bug at e820.c:39
> > (d2) *** HVMLoader crashed.
> >
> > The reason is that we don't call xc_domain_set_memory_map() on the
> > target machine. When we create an HVM domain:
> >
> > libxl__domain_build()
> >     libxl__build_hvm()
> >         libxl__arch_domain_construct_memmap()
> >             xc_domain_set_memory_map()
> >
> > Should we migrate the guest memory from the source machine to the
> > target machine?
> >
> > This bug specifically is because HVMLoader is expected to have run and
> > turned the hypercall information into an E820 table in the guest
> > before a migration occurs.
> >
> > Unfortunately, the current codebase is riddled with such assumptions
> > and expectations (e.g. the HVM save code assumes that the FPU context
> > is valid when it is saving register state), which is a direct side
> > effect of how it was developed.
> >
> > Having said all of the above, I agree that your example is a usecase
> > which should work. It is the ultimate test of whether the migration
> > stream contains enough information to faithfully reproduce the domain
> > on the far side. Clearly, at the moment, this is not the case.
> >
> > I have an upcoming project to work on the domain memory layout logic,
> > because it is unsuitable for a number of XenServer usecases. Part of
> > that will require moving it into the migration stream.
> >
> > I found another migration problem in the test:
> > If the migration fails, we will resume it on the source side.
> > But the HVM guest doesn't respond any more.
> >
> > In my test environment, the migration always successses, so I
> >
> > "succeeds"
> >
> > use a hack to reproduce it:
> >
> > 1. modify the target xen tools:
> >
> > diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> > index 258dec4..da95606 100644
> > --- a/tools/libxl/libxl_stream_read.c
> > +++ b/tools/libxl/libxl_stream_read.c
> > @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
> >          goto err;
> >      }
> > +    rc = ERROR_FAIL;
> > +
> >  err:
> >      check_all_finished(egc, stream, rc);
> >
> > 2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
> > 3. xl migrate hvm_nopv 192.168.3.1
> >
> > The reason is that we create a default ioreq server when we get the
> > HVM param HVM_PARAM_IOREQ_PFN. It means that the problem occurs only
> > when the migration fails after we have read HVM_PARAM_IOREQ_PFN.
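
The read that triggers this is the toolstack's normal param read during save. A sketch of that interaction follows; xc_hvm_param_get() is the real libxenctrl call, while xc_hvm_param_peek() is purely hypothetical (it does not exist) and is shown only to illustrate the kind of side-effect-free query Andrew asks for:

    /*
     * Sketch of the toolstack-side read being described.  For
     * HVM_PARAM_IOREQ_PFN the real xc_hvm_param_get() currently has the
     * side effect of creating the default ioreq server in the hypervisor.
     */
    #include <xenctrl.h>
    #include <xen/hvm/params.h>

    static int read_ioreq_pfn(xc_interface *xch, uint32_t domid, uint64_t *pfn)
    {
        /* Today: this read may implicitly create the default ioreq server,
         * even when it is only called to save or inspect the domain. */
        return xc_hvm_param_get(xch, domid, HVM_PARAM_IOREQ_PFN, pfn);

        /* Wanted: something like
         *     xc_hvm_param_peek(xch, domid, HVM_PARAM_IOREQ_PFN, pfn);
         * (hypothetical) which would only report a default ioreq server
         * that already exists, without creating one. */
    }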
> >
> > In the function hvm_select_ioreq_server():
> > If the I/O will be handled by a non-default ioreq server, we will
> > return the non-default ioreq server. In this case, it is handled by
> > qemu.
> > If the I/O will not be handled by a non-default ioreq server, we will
> > return the default ioreq server. Before migration we return NULL, and
> > after migration it is not NULL.
> >
> > See the caller, hvmemul_do_io():
> >
> >     case X86EMUL_UNHANDLEABLE:
> >     {
> >         struct hvm_ioreq_server *s =
> >             hvm_select_ioreq_server(curr->domain, &p);
> >
> >         /* If there is no suitable backing DM, just ignore accesses */
> >         if ( !s )
> >         {
> >             rc = hvm_process_io_intercept(&null_handler, &p);
> >             vio->io_req.state = STATE_IOREQ_NONE;
> >         }
> >         else
> >         {
> >             rc = hvm_send_ioreq(s, &p, 0);
> >             if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
> >                 vio->io_req.state = STATE_IOREQ_NONE;
> >             else if ( data_is_addr )
> >                 rc = X86EMUL_OKAY;
> >         }
> >         break;
> >     }
> >
> > We send the I/O request to the default I/O request server, but no
> > backing DM handles it. We will wait for the I/O forever...
> >
> > Hmm yes. This needs fixing.
> >
> > CC'ing Paul who did the ioreq server work.
> >
> > This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN.
> > The migration code needs a way of being able to query whether a
> > default ioreq server exists, without creating one.
> >
> > Can you remember what the justification for the read side effects
> > were? ISTR that it was only for qemu compatibility until the ioreq
> > server work got in upstream. If that was the case, can we drop the
> > read side effects now and mandate that all qemus explicitly create
> > their ioreq servers (even if this involves creating a default ioreq
> > server for qemu-trad)?
> >
> > > Can you give me a fix or a detailed suggestion for this bug?
> > >
> > > Thanks
> > > Zhang Chen
> > >
> > > > > default:
> > > > >     a.value = d->arch.hvm_domain.params[a.index];
> > > > >     break;
> > > > > --
> > > > > 2.7.4
> > >
> > > --
> > > Thanks
> > > zhangchen
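
For reference, the explicit creation that Konrad suggests mandating (so that no device model relies on the HVM_PARAM_IOREQ_PFN side effect) looks roughly like this on the device-model side. The calls follow the Xen 4.6/4.7-era libxenctrl interface as used by upstream qemu, but treat the exact signatures here as an approximation rather than a definitive reference:

    #include <xenctrl.h>

    /* Explicitly create and enable a (non-default) ioreq server for a
     * device model, instead of relying on the legacy behaviour where a
     * read of HVM_PARAM_IOREQ_PFN creates the default server. */
    static int create_dm_ioreq_server(xc_interface *xch, domid_t domid,
                                      ioservid_t *ioservid)
    {
        xen_pfn_t ioreq_pfn, bufioreq_pfn;
        evtchn_port_t bufioreq_port;
        int rc;

        rc = xc_hvm_create_ioreq_server(xch, domid,
                                        HVM_IOREQSRV_BUFIOREQ_ATOMIC,
                                        ioservid);
        if ( rc )
            return rc;

        /* Discover the ioreq/buffered-ioreq pages and event channel from
         * the server itself rather than from the legacy HVM params. */
        rc = xc_hvm_get_ioreq_server_info(xch, domid, *ioservid,
                                          &ioreq_pfn, &bufioreq_pfn,
                                          &bufioreq_port);
        if ( rc )
            return rc;

        /* The DM would now map ioreq_pfn/bufioreq_pfn and bind
         * bufioreq_port, then enable the server. */
        return xc_hvm_set_ioreq_server_state(xch, domid, *ioservid, 1);
    }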
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel