
RE: oxenstored performance issue when starting VMs in parallel



> -----Original Message-----
> From: Jürgen Groß <jgross@xxxxxxxx>
> Sent: 22 September 2020 15:18
> To: paul@xxxxxxx; 'Edwin Torok' <edvin.torok@xxxxxxxxxx>; 
> sstabellini@xxxxxxxxxx; 'Anthony Perard'
> <anthony.perard@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: xen-users@xxxxxxxxxxxxxxxxxxxx; jerome.leseinne@xxxxxxxxx; julien@xxxxxxx
> Subject: Re: oxenstored performance issue when starting VMs in parallel
> 
> On 22.09.20 15:42, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Edwin Torok <edvin.torok@xxxxxxxxxx>
> >> Sent: 22 September 2020 14:29
> >> To: sstabellini@xxxxxxxxxx; Anthony Perard <anthony.perard@xxxxxxxxxx>;
> >> xen-devel@xxxxxxxxxxxxxxxxxxxx; paul@xxxxxxx
> >> Cc: xen-users@xxxxxxxxxxxxxxxxxxxx; jerome.leseinne@xxxxxxxxx; 
> >> julien@xxxxxxx
> >> Subject: Re: oxenstored performance issue when starting VMs in parallel
> >>
> >> On Tue, 2020-09-22 at 15:17 +0200, jerome leseinne wrote:
> >>> Hi,
> >>>
> >>> Edwin, you rock! This call in qemu is indeed the culprit!
> >>> I have disabled this xen_bus_add_watch call and re-run the test on our
> >>> big server:
> >>>
> >>> - oxenstored is now at 10% to 20% CPU usage (previously it was at
> >>> 100% all the time)
> >>> - All our VMs are responsive
> >>> - All our VMs start in less than 10 seconds (before the fix some VMs
> >>> could take more than one minute to be fully up)
> >>> - Dom0 is more responsive
> >>>
> >>> Disabling the watch may not be the ideal solution (I'll let the qemu
> >>> experts answer this and the possible side effects),
> >>
> >> Hi,
> >>
> >> CC-ed Qemu maintainer of Xen code, please see this discussion about
> >> scalability issues with the backend watching code in qemu 4.1+.
> >>
> >> I think the scalability issue is due to this code in qemu, which causes
> >> an instance of qemu to see watches from all devices (even those
> >> belonging to other qemu instances), such that adding a single device
> >> causes N watches to be fired on each of the N instances of qemu:
> >>     xenbus->backend_watch =
> >>         xen_bus_add_watch(xenbus, "", /* domain root node */
> >>                           "backend", xen_bus_backend_changed,
> >>                           &local_err);
> >>
> >> I can understand that for backwards compatibility you might need this
> >> code, but is there a way that an up-to-date (xl) toolstack could tell
> >> qemu what it needs to look at (e.g. via QMP, or other keys in xenstore)
> >> instead of relying on an overly broad watch?
> >
> > I think this could be made more efficient. The call to
> > "module_call_init(MODULE_INIT_XEN_BACKEND)" just prior to this watch
> > will register backends that do auto-creation so we could register
> > individual watches for the various backend types instead of this
> > single one.
> 
> The watch should be on guest domain level, e.g. for:
> 
> /local/domain/0/backend/vbd/5
> 
> We have one qemu process per guest, after all.
> 

I'll see if I can spin a patch this afternoon.

  Paul

> 
> Juergen
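
For illustration only, here is a rough model of the difference between the
single broad watch and the per-guest watch suggested above. It is not qemu
or xenstored code: the helper names (watch_matches, guest_watch_path,
count_events) are made up for this sketch, the backend domain is assumed to
be domain 0 as in the example path, guest domids are assumed to be 1..N,
and the device id 51712 is just a placeholder. It assumes that a watch
fires for a change at the watched node or anywhere beneath it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if a change at `path` would trigger a watch on `watch`, i.e. the
 * path is the watched node itself or lies beneath it. The boundary check
 * keeps ".../vbd/1" from matching ".../vbd/10/...". */
static bool watch_matches(const char *watch, const char *path)
{
    size_t n = strlen(watch);

    if (strncmp(watch, path, n) != 0)
        return false;
    return path[n] == '\0' || path[n] == '/';
}

/* Build a per-guest watch path such as /local/domain/0/backend/vbd/5. */
static int guest_watch_path(char *buf, size_t len, unsigned backend_domid,
                            const char *type, unsigned guest_domid)
{
    return snprintf(buf, len, "/local/domain/%u/backend/%s/%u",
                    backend_domid, type, guest_domid);
}

/* Count watch events when each of n_guests guests gets one vbd device and
 * every guest's qemu holds exactly one watch (broad or per-guest). */
static unsigned long count_events(unsigned n_guests, bool per_guest)
{
    unsigned long events = 0;
    char watch[128], changed[128];

    for (unsigned dev = 1; dev <= n_guests; dev++) {
        /* Node written when a device is added for guest `dev`
         * (51712 is a placeholder device id). */
        snprintf(changed, sizeof(changed),
                 "/local/domain/0/backend/vbd/%u/51712", dev);

        for (unsigned q = 1; q <= n_guests; q++) {
            if (per_guest)
                guest_watch_path(watch, sizeof(watch), 0, "vbd", q);
            else
                snprintf(watch, sizeof(watch), "/local/domain/0/backend");

            if (watch_matches(watch, changed))
                events++;
        }
    }
    return events;
}
```

Under these assumptions, count_events(100, false) counts one event per
qemu instance per device added, while count_events(100, true) counts one
event per device added, so the event volume grows with N*N in the first
case and with N in the second.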




 

