
Re: [Xen-devel] Xen crash with mem-sharing and cloning





On Tue, Mar 24, 2015 at 4:54 AM, Andres Lagar Cavilla <andres@xxxxxxxxxxxxxxxx> wrote:


On Mon, Mar 23, 2015 at 11:25 AM, Tamas K Lengyel <tklengyel@xxxxxxxxxxxxx> wrote:
On Mon, Mar 23, 2015 at 6:59 PM, Andres Lagar Cavilla <andres@xxxxxxxxxxxxxxxx> wrote:
On Mon, Mar 23, 2015 at 9:10 AM, Tamas K Lengyel <tklengyel@xxxxxxxxxxxxx> wrote:
Hello everyone,
I'm trying to chase down a bug that reproducibly crashes Xen (tested with 4.4.1). The problem is somewhere within the mem-sharing subsystem and how it interacts with domains that are being actively saved. In my setup I use the xl toolstack to rapidly create clones of HVM domains by piping "xl save -c" into "xl restore" with a modified domain config that updates the name/disk/vif. However, during such an operation Xen crashes with the following log if there are already active clones.

IMHO there should be no conflict between saving the domain and memsharing: as long as the domain is actually just being checkpointed ("-c"), its memory should remain as-is. This is however clearly not the case. Any ideas?

Tamas, I'm not clear on the use of memsharing in this workflow. As described, you pipe save into restore, but the internal magic is lost on me. Are you fanning out to multiple restores? That would seem to be the case, given the need to update name/disk/vif.

Anyway, I'm inferring. Instead, could you elaborate?

Thanks
Andre

Hi Andre,
thanks for getting back on this issue. The script I'm using is at https://github.com/tklengyel/drakvuf/blob/master/tools/clone.pl. The script simply creates a FIFO pipe (mkfifo) and saves the domain into that pipe, which is immediately read by xl restore with the updated configuration file. This is mainly just to eliminate having to read the memory dump from disk. That part of the system works as expected, and multiple save/restores running at the same time don't cause any side effects. Once the domain has thus been cloned, I run memsharing on every page, which also works as expected. The problem only occurs when the cloning procedure runs while a page-unshare operation kicks in on an already active clone (as you see in the log).
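In outline, the handoff is equivalent to the C sketch below (just an illustration, not the actual script; the FIFO path, domain name and config file name are made up):

/* Sketch of the save/restore handoff through a FIFO.
 * All names and paths here are hypothetical examples. */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char *fifo = "/tmp/clone.fifo";            /* made-up path */

    if (mkfifo(fifo, 0600) != 0 && errno != EEXIST) {
        perror("mkfifo");
        return 1;
    }

    if (fork() == 0) {
        /* Reader: restore the clone from the pipe with the updated
         * config (new name/disk/vif); blocks until the writer opens. */
        execlp("xl", "xl", "restore", "clone.cfg", fifo, (char *)NULL);
        _exit(127);
    }

    /* Writer: checkpoint the original domain straight into the pipe;
     * "-c" leaves it running, and the image never touches disk. */
    execlp("xl", "xl", "save", "-c", "origin-vm", fifo, (char *)NULL);
    perror("execlp");
    return 1;
}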

Sorry Tamas, I'm a bit slow here. I looked at your script -- looks all right, no mention of memsharing in there.

Re-reading ... memsharing? memshare? Is this memshrtool in tools/testing? How are you running it?


Hi Andre,
the memsharing happens here https://github.com/tklengyel/drakvuf/blob/master/src/main.c#L144 after the clone script finishes. This is effectively the same approach as in tools/testing, just automatically looping from 0 to max_gpfn. Afterwards, all unsharing happens automatically, either induced by the guest itself or when I map pages into my app with xc_map_foreign_range PROT_WRITE.
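The loop amounts to roughly the following sketch (assuming the libxc memsharing API as of Xen 4.4; the domain IDs and the mapped gfn are made up):

#include <inttypes.h>
#include <stdio.h>
#include <sys/mman.h>
#include <xenctrl.h>

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    domid_t origin = 1, clone = 2;        /* made-up domain IDs */

    /* Enable the sharing subsystem on both domains. */
    xc_memshr_control(xch, origin, 1);
    xc_memshr_control(xch, clone, 1);

    /* In 4.4 this returns the max gpfn directly. */
    unsigned long max_gpfn = xc_domain_maximum_gpfn(xch, origin);

    for (unsigned long gfn = 0; gfn <= max_gpfn; gfn++) {
        uint64_t oh, ch;

        /* Nominate the page in both domains, then deduplicate them;
         * gfns that fail to nominate (holes, special pages) are skipped. */
        if (xc_memshr_nominate_gfn(xch, origin, gfn, &oh))
            continue;
        if (xc_memshr_nominate_gfn(xch, clone, gfn, &ch))
            continue;
        xc_memshr_share_gfns(xch, origin, gfn, oh, clone, gfn, ch);
    }

    /* Later, a writable foreign mapping of a shared page is what forces
     * the hypervisor to unshare (copy-on-write break) it, e.g.: */
    void *p = xc_map_foreign_range(xch, clone, XC_PAGE_SIZE,
                                   PROT_READ | PROT_WRITE, 0 /* some gfn */);
    if (p)
        munmap(p, XC_PAGE_SIZE);

    xc_interface_close(xch);
    return 0;
}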

Certainly no Xen crash should happen because of user-space input. I'm just trying to understand what you're doing. The unshare code is not, uhmm, brief, so a NULL deref could happen in half a dozen places at first glance.

Well, let me know what I can do to help trace it down. I don't think (potentially buggy) userspace tools should crash Xen either =)

Tamas

Thanks
Andres

