
Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns



On 09/07/2015 11:53 AM, Jan Beulich wrote:
On 07.09.15 at 11:36, <wei.liu2@xxxxxxxxxx> wrote:
On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
On 06.09.15 at 22:05, <wei.liu2@xxxxxxxxxx> wrote:
The original implementation of populate_pfns didn't consider that the same
pfn can be present multiple times in the array. The mechanism to prevent
populating the same pfn multiple times only worked if the recurring pfn
appeared in different batches.

This bug was discovered by a Linux 4.1 32-bit kernel save / restore test,
in which several ptes point to the same pfn, resulting in an array
containing recurring pfns.
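
To illustrate the failure mode described above, here is a minimal sketch of de-duplicating within a single batch. It is not the libxc code: `struct populated_map`, `MAX_PFN` and `collect_unpopulated` are hypothetical names made up for this example. The point it tries to show is that a pfn is marked as populated at the moment it is queued, so a second occurrence in the same batch is skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PFN 0x10000u  /* hypothetical guest size, for the sketch only */

/* Hypothetical tracking state: one bit per guest pfn. */
struct populated_map {
    uint8_t bits[MAX_PFN / 8];
};

static bool map_test(const struct populated_map *m, uint64_t pfn)
{
    return (m->bits[pfn / 8] >> (pfn % 8)) & 1u;
}

static void map_set(struct populated_map *m, uint64_t pfn)
{
    m->bits[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/*
 * Collect the pfns of one batch that still need backing memory.
 * Each pfn is marked right when it is queued, not after the whole
 * batch has been handled, so a repeat inside the same batch is
 * treated like a repeat from an earlier batch.
 * Out-of-range pfns are simply skipped in this sketch.
 * Returns the number of entries written to todo[].
 */
static size_t collect_unpopulated(struct populated_map *m,
                                  const uint64_t *pfns, size_t count,
                                  uint64_t *todo)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t pfn = pfns[i];

        if (pfn >= MAX_PFN || map_test(m, pfn))
            continue;

        map_set(m, pfn);
        todo[n++] = pfn;
    }
    return n;
}
```

With a batch such as { 0xfd42, 0x10, 0xfd42 }, only two entries would be queued for allocation. Marking only after the batch has been processed would queue 0xfd42 twice.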

Since you must have debugged this, and since the bisector appears
to have fingered a patch of mine on the Linux side which triggered
this, would you mind explaining this a little more? In particular I'm
worried that this may point out some other bug in Linux, as in the
context of the change there - dealing with the 1:1 mapping - I can't
see a legitimate reason for multiple PTEs to reference the same PFN.


Sure. I can try to explain this as clearly as possible. Note that I didn't
even look at Linux side changes because at that point I was sure there
was a bug in migration v2.

So there is a step called normalise_page in migration v2. It's a nop for
HVM guests. For PV guests, it only cares about page table frames. To
normalise a page table frame, the core idea is to replace all MFNs in
the page tables with PFNs inside the guest.
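
As a rough sketch of that idea (not the migration v2 code), the following rewrites one page table frame in place. The PTE layout (512 64-bit entries, frame number in bits 12..51, bit 0 as the present flag) is an assumption, and `normalise_frame_sketch` and the flat `m2p` array are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed PTE layout for the sketch: 64-bit entries, 4 KiB pages. */
#define PTES_PER_FRAME 512u
#define PTE_PRESENT    0x1ull
#define PTE_FRAME_MASK 0x000ffffffffff000ull  /* bits 12..51 */
#define PAGE_SHIFT     12

/*
 * Replace the MFN in every present PTE with the matching guest PFN,
 * keeping all flag bits. m2p[] is a hypothetical machine-to-physical
 * lookup table with m2p_len entries.
 * Returns 0 on success, -1 if a PTE references an MFN outside m2p[]
 * (entries before the failing one have already been rewritten).
 */
static int normalise_frame_sketch(uint64_t *ptes,
                                  const uint64_t *m2p, size_t m2p_len)
{
    for (size_t i = 0; i < PTES_PER_FRAME; i++) {
        uint64_t pte = ptes[i];

        if (!(pte & PTE_PRESENT))
            continue;  /* non-present entries are left untouched */

        uint64_t mfn = (pte & PTE_FRAME_MASK) >> PAGE_SHIFT;
        if (mfn >= m2p_len)
            return -1;

        uint64_t pfn = m2p[mfn];
        ptes[i] = (pte & ~PTE_FRAME_MASK) |
                  ((pfn << PAGE_SHIFT) & PTE_FRAME_MASK);
    }
    return 0;
}
```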

When restoring, there is a step called localise_page, which again is a
nop for HVM guests. For PV guests, it does the reverse of normalise_page.
It goes through all page table frames, extracts all PFNs pointed to by
PTEs in such frames, populates them, then reconstructs the page tables.
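
The extraction part of that step can be sketched as below. This is only an illustration under the same assumed PTE layout (512 64-bit entries, frame number in bits 12..51, bit 0 as the present flag); `extract_pfns_sketch` is a hypothetical name, not a libxc function. Note that the output array keeps one entry per present PTE, so two PTEs that reference the same PFN produce two identical entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed PTE layout for the sketch: 64-bit entries, 4 KiB pages. */
#define PTES_PER_FRAME 512u
#define PTE_PRESENT    0x1ull
#define PTE_FRAME_MASK 0x000ffffffffff000ull  /* bits 12..51 */
#define PAGE_SHIFT     12

/*
 * Gather the PFN of every present PTE in one normalised page table
 * frame. pfns[] must have room for PTES_PER_FRAME entries.
 * No de-duplication is done here: repeats are passed on as-is.
 * Returns the number of PFNs written.
 */
static size_t extract_pfns_sketch(const uint64_t *ptes, uint64_t *pfns)
{
    size_t n = 0;

    for (size_t i = 0; i < PTES_PER_FRAME; i++) {
        if (!(ptes[i] & PTE_PRESENT))
            continue;

        pfns[n++] = (ptes[i] & PTE_FRAME_MASK) >> PAGE_SHIFT;
    }
    return n;
}
```

Whatever consumes this array (the populate step in the description above) therefore has to cope with the same PFN appearing more than once in a single call.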

What I discovered is that several PTEs inside one page table frame
contained the same PFN (something like fd42). The original implementation
of the toolstack's populate_pfns didn't consider such a scenario. As for
what that PFN referred to, I wasn't sure and I didn't really care about that.

That's unfortunate, as that's precisely the information I was after,
since - as said - taking the repetition of the same PFN together with
what the triggering Linux change is about, it smells like there's
something wrong on the Linux side too. Do you at least recall how
many times that same PFN got repeated?

The linear p2m list support introduced this behaviour. Instead of having
multiple copies of identical p2m pages (e.g. for all entries of the page
being ~0UL), only one such page exists, which is mapped multiple
times in the linear p2m list. This will happen for large regions (2 MB
aligned) of either identity mapped or invalid pfns.
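
A small model of that sharing, as a sketch only: it is not the kernel's p2m code, it handles just the all-invalid case (not identity regions), and it assumes 512 8-byte entries per p2m page, so that one page covers 2 MB of guest memory. `assign_backing` and the pfn values passed to it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define P2M_PER_PAGE  512u             /* assumed: 4 KiB page / 8-byte entry */
#define INVALID_ENTRY (~(uint64_t)0)   /* the ~0UL entry mentioned above */

static bool chunk_all_invalid(const uint64_t *chunk)
{
    for (size_t i = 0; i < P2M_PER_PAGE; i++)
        if (chunk[i] != INVALID_ENTRY)
            return false;
    return true;
}

/*
 * For each page-sized chunk of the p2m list, pick the pfn of the page
 * that backs it. Chunks consisting only of invalid entries all use
 * shared_pfn; every other chunk gets its own pfn, counted up from
 * next_private_pfn.
 * Returns how many chunks ended up on the shared page.
 */
static size_t assign_backing(const uint64_t *p2m, size_t nr_chunks,
                             uint64_t shared_pfn,
                             uint64_t next_private_pfn,
                             uint64_t *backing)
{
    size_t shared = 0;

    for (size_t c = 0; c < nr_chunks; c++) {
        if (chunk_all_invalid(p2m + c * P2M_PER_PAGE)) {
            backing[c] = shared_pfn;
            shared++;
        } else {
            backing[c] = next_private_pfn++;
        }
    }
    return shared;
}
```

Every slot of backing[] that holds shared_pfn corresponds to a PTE referencing the same page, which is how one page table frame can end up with the same PFN in several PTEs.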

In domUs we see such a scenario rather rarely as it would require either
large memory holes or large identity regions. You might have introduced
the latter.


Juergen
