Re: [PATCH v20210701 15/40] tools: prepare to allocate saverestore arrays once

On Mon, 5 Jul 2021 14:01:07 +0100,
Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:

> > The last one is always way faster because apparently map/unmap is less 
> > costly with a stopped guest.  
> That's suspicious.  If true, we've got some very wonky behaviour in the
> hypervisor...

At least the transfer rate in this last iteration is consistent.
Since the only difference I can see is that the domU is suspended, I suspect 
the mapping.
I have not investigated where the time is spent; I should probably do that one 
day to better understand this specific difference.

> > Right now the code may reach up to 15Gbit/s. The next step is to map the 
> > domU just once to reach wirespeed.  
> We can in principle do that in 64bit toolstacks, for HVM guests.  But
> not usefully until we've fixed the fact that Xen has no idea what the
> guest physmap is supposed to look like.

Why would Xen care?
My attempt last year with new save/restore code simply mapped the memory on 
both sides; the unmap was done in exit().

With this approach I got wirespeed in all iterations on a 10G link.

> At the moment, the current scheme is a little more resilient to bugs
> caused by the guest attempting to balloon during the live phase.

I did not specifically test how a domU behaves when it claims and releases 
pages while being migrated.
I think this series would handle at least parts of that:
If a page appears or disappears, it will be recognized by getpageframeinfo.
If a page disappears between getpageframeinfo and MMAPBATCH, I expect an error.
This error is fatal right now; perhaps the code could catch it and move on.
If a page disappears after MMAPBATCH, it will be caught by later iterations.
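The "catch it and move on" idea could look roughly like the sketch below. 
defer_vanished_pages() and the dirty-bitmap handling are hypothetical names, 
and it assumes an mmapbatch-style per-page error array with negative errno 
values (as xc_map_foreign_bulk() produces); only ENOENT is treated as 
recoverable:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Walk the per-page error array filled by an MMAPBATCH-style mapping call.
 * A page that vanished between getpageframeinfo and the mapping shows up
 * as -ENOENT; instead of aborting the migration, mark it dirty so the next
 * live iteration picks it up again.
 *
 * Returns the number of pages deferred to the next iteration, or -1 if a
 * non-recoverable error was seen.
 */
static int defer_vanished_pages(const int *errs, size_t n,
                                unsigned char *dirty)
{
    int deferred = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( errs[i] == 0 )
            continue;               /* mapped fine */

        if ( errs[i] == -ENOENT )   /* page vanished under our feet */
        {
            dirty[i] = 1;           /* re-send it in the next pass */
            deferred++;
        }
        else
            return -1;              /* anything else stays fatal */
    }

    return deferred;
}
```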

> Another area to improve, which can be started now, is to avoid bounce
> buffering hypercall data.  Now that we have /dev/xen/hypercall which you
> can mmap() regular kernel pages from, what we want is a simple memory
> allocator which we can allocate permanent hypercall buffers from, rather
> than the internals of every xc_*() hypercall wrapper bouncing the data
> in (potentially) both directions.

That sounds like a good idea. I am not sure how costly the current approach is.
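A minimal sketch of such an allocator, assuming the hbuf_* names (they are 
hypothetical): the pool's backing region would be mmap()ed once from 
/dev/xen/hypercall, so everything carved out of it is already hypercall-safe 
and never needs bouncing. Only the bump allocator itself is shown here:

```c
#include <stddef.h>

struct hbuf_pool {
    unsigned char *base;   /* start of the mmap()ed hypercall-safe region */
    size_t size;           /* total size in bytes */
    size_t used;           /* bump pointer */
};

/*
 * Carve the next 'bytes' (rounded up to 64-byte alignment) out of the
 * pool.  Buffers are permanent: they are never freed individually, which
 * is the point - xc_*() wrappers reuse them instead of bouncing data on
 * every hypercall.  Returns NULL when the pool is exhausted.
 */
static void *hbuf_alloc(struct hbuf_pool *p, size_t bytes)
{
    size_t aligned = (bytes + 63) & ~(size_t)63;
    void *ret;

    if ( p->used + aligned > p->size )
        return NULL;

    ret = p->base + p->used;
    p->used += aligned;

    return ret;
}
```

The wrappers would then pass such pre-allocated buffers straight to the 
hypervisor rather than copying caller memory in and (potentially) back out 
on every call.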

> Oh - so the speedup might not be from reduced data handling?

At least not on the systems I have now.

Perhaps I should test what the numbers look like with the NIC and the toolstack 
in node#0, and the domU in node#1.


Attachment: pgpUfwLARiJJM.pgp
Description: OpenPGP digital signature
