Re: [Xen-devel] Regression: x86/mm: new _PTE_SWP_SOFT_DIRTY bit conflicts with existing use
On 08/22/2013 01:32 PM, David Vrabel wrote:
> On 22/08/13 00:04, Linus Torvalds wrote:
>> On Wed, Aug 21, 2013 at 12:03 PM, Cyrill Gorcunov <gorcunov@xxxxxxxxx> wrote:
>>>
>>> I personally don't see a bug here because
>>>
>>> - this swapped-page soft dirty bit is set for non-present entries only,
>>>   never for present ones, just at the moment we form the swap pte entry
>>>
>>> - i don't find any code which would test for this bit directly without
>>>   an is_swap_pte call
>>
>> Ok, having gone through the places that use swp_*soft_dirty(), I have
>> to agree. Afaik, it's only ever used on a swap entry that has (by
>> definition) the P bit clear. So with or without Xen, I don't see how
>> it can make any difference.
>>
>> David/Konrad - did you actually see any issues, or was this just from
>> (mis)reading the code?
>
> There are no Xen-related bugs in the code; we were misreading it.
>
> It was my call to raise this as a regression without a repro, and
> clearly this was the wrong decision.
>
> However, having looked at the soft dirty implementation, and
> specifically the userspace ABI, I think it is far too closely coupled
> to the current implementation. I think this will constrain future
> development of the feature should userspace require a more efficient
> ABI than scanning all of /proc/<pid>/pagemap.
>
> Minimal downtime during 'live' checkpointing of a running task needs
> the checkpointer to find and write out dirty pages faster than the
> task can dirty them.

Absolutely, but in "find and write" the "write" component is likely to
take the majority of the time -- we can scan the PTEs of a mapping MUCH
faster than we can transmit them over even a 10Gbit link.

We actually see this in real life: CRIU has an atomic test that checks
that mappings get dumped and restored properly, and one of its
sub-tests uses a single 512MB mapping. With it, the total dump time
_minus_ the memory dump time (a remainder that covers not only the
pagemap file scan, but also files, registers, the process tree,
sessions, etc.) is a fraction of a second, while the memory dump part
alone takes several seconds.

That said, a super-fast API for finding out "what has changed" is not
as tempting to have as a faster network or disk. What is _more_ time
consuming in iterative migration, in our case, is the need to re-scan
the whole /proc tree to learn which processes have died or appeared, to
mess with /proc/<pid>/fd to find out which files were
(re-)opened/closed/changed, to talk to the sock_diag subsystem for
socket information, and the like. However, we haven't yet done a
careful analysis of what the slowest part is; pagemap scanning is
definitely not it.

> This seems less likely to be possible if every iteration
> all PTEs have to be scanned by the checkpointer instead of (e.g.,)
> accessing a separate list of dirtied pages.

But we don't scan the PTEs of the whole x86-64 virtual address space;
instead, we first parse /proc/<pid>/maps and scan only the PTEs that
sit in private mappings (a sketch of such a scan follows below).

> David
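For the record, here is roughly what that scan looks like. This is a
minimal sketch, not CRIU's actual code (which batches pagemap reads and
handles many more corner cases); it relies only on the documented
pagemap ABI: each page of a task's address space has a 64-bit entry in
/proc/<pid>/pagemap, and bit 55 of that entry is the soft-dirty flag.

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define PM_SOFT_DIRTY	(1ULL << 55)	/* pagemap soft-dirty bit */

int main(int argc, char **argv)
{
	long psize = sysconf(_SC_PAGESIZE);
	char path[64], line[256];
	FILE *maps;
	int pagemap;

	if (argc != 2)
		return 1;

	snprintf(path, sizeof(path), "/proc/%s/maps", argv[1]);
	maps = fopen(path, "r");
	snprintf(path, sizeof(path), "/proc/%s/pagemap", argv[1]);
	pagemap = open(path, O_RDONLY);
	if (!maps || pagemap < 0)
		return 1;

	while (fgets(line, sizeof(line), maps)) {
		unsigned long long start, end, addr;
		char perms[5];

		if (sscanf(line, "%llx-%llx %4s", &start, &end, perms) != 3)
			continue;
		if (perms[3] != 'p')	/* skip shared mappings */
			continue;

		/* one 64-bit pagemap entry per page of the mapping */
		for (addr = start; addr < end; addr += psize) {
			uint64_t pme;
			off_t off = (addr / psize) * sizeof(pme);

			if (pread(pagemap, &pme, sizeof(pme), off) != sizeof(pme))
				break;	/* some areas are not readable */
			if (pme & PM_SOFT_DIRTY)
				printf("soft-dirty page at %llx\n", addr);
		}
	}

	fclose(maps);
	close(pagemap);
	return 0;
}

Between migration iterations the soft-dirty bits are reset by writing
"4" to /proc/<pid>/clear_refs, after which another pass over pagemap
reports only the pages the task has dirtied since.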
Thanks,
Pavel