
Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns



On Mon, Sep 07, 2015 at 03:53:57AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 11:36, <wei.liu2@xxxxxxxxxx> wrote:
> > On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
> >> >>> On 06.09.15 at 22:05, <wei.liu2@xxxxxxxxxx> wrote:
> >> > The original implementation of populate_pfns didn't consider that the
> >> > same pfn can be present multiple times in the array. The mechanism to
> >> > prevent populating the same pfn multiple times only worked if the
> >> > recurring pfn appeared in different batches.
> >> > 
> >> > This bug was discovered by a Linux 4.1 32-bit kernel save/restore test,
> >> > in which several PTEs point to the same pfn, resulting in an array
> >> > containing a recurring pfn.
> >> 
> >> Since you must have debugged this, and since the bisector appears
> >> to have fingered a patch of mine on the Linux side which triggered
> >> this, would you mind explaining this a little more? In particular I'm
> >> worried that this may point out some other bug in Linux, as in the
> >> context of the change there - dealing with the 1:1 mapping - I can't
> >> see a legitimate reason for multiple PTEs to reference the same PFN.
> >> 
> > 
> > Sure. I can try to explain this as clearly as possible. Note that I didn't
> > even look at Linux side changes because at that point I was sure there
> > was a bug in migration v2.
> > 
> > So there is a step called normalise_page in migration v2. It's a nop for
> > HVM guests. For PV guests, it only cares about page table frames. To
> > normalise a page table frame, the core idea is to replace all MFNs in
> > page tables with the corresponding guest PFNs.
> > 
> > When restoring, there is a step called localise_page, which again is a
> > nop for HVM guests. For PV guests, it does the reverse of normalise_page:
> > it goes through all page table frames, extracts all PFNs referenced by
> > PTEs in those frames, populates them, then reconstructs the page tables.
> > 
> > What I discovered is that the PTEs inside one page table frame contained
> > the same PFN (something like fd42). The original implementation of the
> > toolstack's populate_pfns didn't consider such a scenario. As for what
> > that PFN referred to, I wasn't sure and I didn't really care about that.
> 
> That's unfortunate, as that's precisely the information I was after,
> since - as said - taking the repetition of the same PFN together with
> what the triggering Linux change is about, it smells like there's
> something wrong on the Linux side too. Do you at least recall how
> many times that same PFN got repeated?
> 

Thousands of times.

Wei.

> Jan
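
The batch-level failure described in the commit message (duplicates caught across batches but not within one) can be illustrated with a small, self-contained C sketch. It is not the libxc code: the bitmap, MAX_PFN, and the helpers pfn_is_populated, pfn_set_populated, collect_unpopulated_buggy and collect_unpopulated are hypothetical stand-ins, assuming populate_pfns keeps a per-pfn "populated" bitmap and builds, per batch, a list of pfns that still need populating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bound on the guest pfns tracked by this sketch (1M pfns). */
#define MAX_PFN (1u << 20)

/* One bit per pfn: set once the pfn has been selected for population. */
static uint8_t populated[MAX_PFN / 8];

static void reset_populated(void)
{
    memset(populated, 0, sizeof(populated));
}

static bool pfn_is_populated(uint64_t pfn)
{
    assert(pfn < MAX_PFN);
    return (populated[pfn / 8] >> (pfn % 8)) & 1u;
}

static void pfn_set_populated(uint64_t pfn)
{
    assert(pfn < MAX_PFN);
    populated[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/*
 * Buggy shape: every pfn is checked against the bitmap first and only
 * marked after the whole batch has been scanned, so a pfn that repeats
 * inside one batch is emitted more than once. Repeats across batches
 * are still caught, because the earlier batch has already been marked.
 */
static size_t collect_unpopulated_buggy(const uint64_t *batch, size_t count,
                                        uint64_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++)
        if (!pfn_is_populated(batch[i]))
            out[n++] = batch[i];
    for (size_t i = 0; i < n; i++)
        pfn_set_populated(out[i]);
    return n;
}

/*
 * Fixed shape: mark each pfn as soon as it is selected, so any later
 * occurrence in the same batch is skipped and each pfn is emitted once.
 */
static size_t collect_unpopulated(const uint64_t *batch, size_t count,
                                  uint64_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        if (pfn_is_populated(batch[i]))
            continue;
        pfn_set_populated(batch[i]);
        out[n++] = batch[i];
    }
    return n;
}
```

Fed the batch {fd42, 10, fd42, fd42}, the buggy version returns four entries while the fixed one returns two (fd42 and 10). Because the fixed version marks pfns before they are actually populated, a real restore path would presumably have to clear those bits again, or abort the restore, if the subsequent populate call fails.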

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
