
Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server



On Fri, Dec 09, 2016 at 04:43:58PM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> > Konrad Rzeszutek Wilk
> > Sent: 09 December 2016 16:14
> > To: Zhang Chen <zhangchen.fnst@xxxxxxxxxxxxxx>; Paul Durrant
> > <Paul.Durrant@xxxxxxxxxx>
> > Cc: Changlong Xie <xiecl.fnst@xxxxxxxxxxxxxx>; Wei Liu
> > <wei.liu2@xxxxxxxxxx>; Eddie Dong <eddie.dong@xxxxxxxxx>; Andrew
> > Cooper <Andrew.Cooper3@xxxxxxxxxx>; Ian Jackson
> > <Ian.Jackson@xxxxxxxxxx>; Wen Congyang <wencongyang@xxxxxxxxx>; Paul
> > Durrant <Paul.Durrant@xxxxxxxxxx>; Yang Hongyang
> > <imhy.yang@xxxxxxxxx>; Xen devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > Subject: Re: [Xen-devel] [PATCH 1/3] Don't create default ioreq server
> > 
> > .snip..
> > > > If you can be more specific about what is broken in COLO we might be
> > > > able to devise a fix for you.
> > >
> > > My workmate reported this bug last year:
> > > https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02850.html
> > 
> > Paul, Andrew was asking about:
> > 
> >     This bug is caused by the read side effects of
> > HVM_PARAM_IOREQ_PFN. The migration code needs a way of being able to
> > query whether a default ioreq server exists, without creating one.
> > 
> >     Can you remember what the justification for the read side effects
> > was? ISTR that it was only for qemu compatibility until the ioreq
> > server work got in upstream. If that was the case, can we drop the
> > read side effects now and mandate that all qemus explicitly create
> > their ioreq servers (even if this involves creating a default ioreq
> > server for qemu-trad)?
> > 
> 
> The read side effects are indeed because of the need to support the old qemu 
> interface. If trad were patched then we could at least deprecate the default 
> ioreq server but I'm not sure how long we'd need to leave it in place after 
> that before it was removed. Perhaps it ought to be under a KCONFIG option, 
> since it's also a bit of a security hole.
> 

So, what can be done to make COLO work?
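
For reference, a minimal sketch of what the Kconfig-guarded variant mentioned
above might look like in the param-read path; the option name and the helper
are purely illustrative, not existing Xen code:

    case HVM_PARAM_IOREQ_PFN:
#ifdef CONFIG_HVM_DEFAULT_IOREQ_SERVER          /* hypothetical option */
        /*
         * Legacy behaviour for unmodified qemu-trad: reading the param
         * creates the default ioreq server as a side effect.
         */
        rc = create_default_ioreq_server(d);    /* illustrative helper */
        if ( rc )
            break;
#endif
        /* With the option disabled, the read has no side effects. */
        a.value = d->arch.hvm_domain.params[a.index];
        break;

Builds that enable the option would keep qemu-trad working; COLO or other
migration-heavy setups could turn it off and lose the side effect.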

>   Paul
> 
> > 
> > ?
> > 
> > Full thread below:
> > 
> > Re: [Xen-devel] question about migration
> > 
> > To: Wen Congyang <wency@xxxxxxxxxxxxxx>
> > From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > Date: Tue, 29 Dec 2015 11:24:14 +0000
> > Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>, xen devel <xen-devel@xxxxxxxxxxxxx>
> > Delivery-date: Tue, 29 Dec 2015 11:24:33 +0000
> > List-id: Xen developer discussion <xen-devel.lists.xen.org>
> > On 25/12/2015 01:45, Wen Congyang wrote:
> > On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> > On 24/12/15 02:29, Wen Congyang wrote:
> > Hi Andrew Cooper:
> > 
> > I rebased the COLO code onto the newest upstream Xen and tested it. I
> > found a problem in the test, and I can reproduce it with a plain
> > migration.
> > 
> > How to reproduce:
> > 1. xl cr -p hvm_nopv
> > 2. xl migrate hvm_nopv 192.168.3.1
> > You are the very first person to try a usecase like this.
> > 
> > It works as much as it does because of your changes to the
> > uncooperative HVM domain logic.  I have said repeatedly during review
> > that this is not necessarily a safe change to make without an in-depth
> > analysis of the knock-on effects; it looks as if you have found the
> > first knock-on effect.
> > 
> > The migration succeeds, but the VM doesn't run on the target machine.
> > You can see the reason in 'xl dmesg':
> > (XEN) HVM2 restore: VMCE_VCPU 1
> > (XEN) HVM2 restore: TSC_ADJUST 0
> > (XEN) HVM2 restore: TSC_ADJUST 1
> > (d2) HVM Loader
> > (d2) Detected Xen v4.7-unstable
> > (d2) Get guest memory maps[128] failed. (-38)
> > (d2) *** HVMLoader bug at e820.c:39
> > (d2) *** HVMLoader crashed.
> > 
> > The reason is that we don't call xc_domain_set_memory_map() on the
> > target machine.  When we create an HVM domain it is called via:
> > libxl__domain_build()
> >       libxl__build_hvm()
> >           libxl__arch_domain_construct_memmap()
> >               xc_domain_set_memory_map()
> > 
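
For illustration only, roughly what that missing call looks like from the
libxc side; the helper name, the single-entry map and the place it would be
invoked on the restore path are assumptions of this sketch, not the actual
fix:

    #include <xenctrl.h>  /* assumes the libxc headers that define
                             struct e820entry and E820_RAM are available */

    /*
     * Sketch: hand the restored domain an E820 map again, roughly as
     * libxl__arch_domain_construct_memmap() would have done at build time.
     * A real fix would carry the original map in the migration stream
     * rather than reconstructing one here.
     */
    static int restore_set_memmap(xc_interface *xch, uint32_t domid,
                                  uint64_t lowmem_bytes)
    {
        struct e820entry map[] = {
            { .addr = 0, .size = lowmem_bytes, .type = E820_RAM },
        };

        return xc_domain_set_memory_map(xch, domid, map, 1);
    }

Whether the map should instead travel in the migration stream, as Andrew
suggests further down, is the more interesting design question.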
> > Should we migrate the guest memory map from the source machine to the
> > target machine?
> > This bug specifically is because HVMLoader is expected to have run and
> > turned the hypercall information into an E820 table in the guest before
> > a migration occurs.
> > 
> > Unfortunately, the current codebase is riddled with such assumptions
> > and expectations (e.g. the HVM save code assumes that the FPU context
> > is valid when it is saving register state), which is a direct side
> > effect of how it was developed.
> > 
> > 
> > Having said all of the above, I agree that your example is a usecase which
> > should work.  It is the ultimate test of whether the migration stream 
> > contains
> > enough information to faithfully reproduce the domain on the far side.
> > Clearly
> > at the moment, this is not the case.
> > 
> > I have an upcoming project to work on the domain memory layout logic,
> > because it is unsuitable for a number of XenServer usecases.  Part of
> > that will require moving it into the migration stream.
> > I found another migration problem in the test:
> > If the migration fails, we resume the guest on the source side.
> > But the hvm guest doesn't respond any more.
> > 
> > In my test environment, the migration always successses, so I
> > 
> > "succeeds"
> > 
> > use a hacky way to reproduce it:
> > 1. modify the target xen tools:
> > 
> > diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> > index 258dec4..da95606 100644
> > --- a/tools/libxl/libxl_stream_read.c
> > +++ b/tools/libxl/libxl_stream_read.c
> > @@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
> >          goto err;
> >      }
> > +    rc = ERROR_FAIL;
> > +
> >  err:
> >      check_all_finished(egc, stream, rc);
> > 2. xl cr hvm_nopv, and wait some time (you can log in to the guest)
> > 3. xl migrate hvm_nopv 192.168.3.1
> > 
> > The reason is that we create a default ioreq server when we get the
> > hvm param HVM_PARAM_IOREQ_PFN.
> > It means that the problem occurs only when the migration fails after
> > we have read the hvm param HVM_PARAM_IOREQ_PFN.
> > 
> > In the function hvm_select_ioreq_server():
> > If the I/O will be handled by a non-default ioreq server, we return
> > that non-default ioreq server; in this case, it is handled by qemu.
> > If the I/O will not be handled by a non-default ioreq server, we
> > return the default ioreq server.  Before migration that is NULL, and
> > after migration it is not NULL.
> > See the caller, hvmemul_do_io():
> >      case X86EMUL_UNHANDLEABLE:
> >      {
> >          struct hvm_ioreq_server *s =
> >              hvm_select_ioreq_server(curr->domain, &p);
> > 
> >          /* If there is no suitable backing DM, just ignore accesses */
> >          if ( !s )
> >          {
> >              rc = hvm_process_io_intercept(&null_handler, &p);
> >              vio->io_req.state = STATE_IOREQ_NONE;
> >          }
> >          else
> >          {
> >              rc = hvm_send_ioreq(s, &p, 0);
> >              if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
> >                  vio->io_req.state = STATE_IOREQ_NONE;
> >              else if ( data_is_addr )
> >                  rc = X86EMUL_OKAY;
> >          }
> >          break;
> > 
> > We send the I/O request to the default ioreq server, but no backing
> > DM handles it.  We will wait for the I/O forever...
> > 
> > Hmm yes.  This needs fixing.
> > 
> > CC'ing Paul who did the ioreq server work.
> > 
> > This bug is caused by the read side effects of HVM_PARAM_IOREQ_PFN. The
> > migration code needs a way of being able to query whether a default ioreq
> > server exists, without creating one.
> > 
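
Purely as a sketch of the shape such a query could take; the param name and
the helper below are hypothetical, not an existing interface:

    /*
     * Hypothetical, side-effect-free query: let the toolstack ask whether
     * a default ioreq server already exists, without creating one.
     */
    case HVM_PARAM_DEFAULT_IOREQ_SERVER_EXISTS:     /* hypothetical param */
        a.value = hvm_has_default_ioreq_server(d);  /* hypothetical helper */
        break;

The migration code could then read that instead of HVM_PARAM_IOREQ_PFN and
avoid creating a default server as a side effect.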
> > Can you remember what the justification for the read side effects was?
> > ISTR that it was only for qemu compatibility until the ioreq server
> > work got in upstream. If that was the case, can we drop the read side
> > effects now and mandate that all qemus explicitly create their ioreq
> > servers (even if this involves creating a default ioreq server for
> > qemu-trad)?
> > >
> > > Can you give me a fix or a detailed suggestion for this bug?
> > >
> > >
> > > Thanks
> > > Zhang Chen
> > >
> > > > >       default:
> > > > >           a.value = d->arch.hvm_domain.params[a.index];
> > > > >           break;
> > > > > --
> > > > > 2.7.4
> > > > >
> > > > >
> > > > >
> > > >
> > > > .
> > > >
> > >
> > > --
> > > Thanks
> > > zhangchen
> > >
> > >
> > >
> > >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

