Re: [Xen-devel] [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
On Mon, Sep 07, 2015 at 03:53:57AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 11:36, <wei.liu2@xxxxxxxxxx> wrote:
> > On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
> >> >>> On 06.09.15 at 22:05, <wei.liu2@xxxxxxxxxx> wrote:
> >> > The original implementation of populate_pfns didn't consider the same
> >> > pfn can be present multiple times in the array. The mechanism to prevent
> >> > populating the same pfn multiple times only worked if the recurring pfn
> >> > appeared in different batches.
> >> >
> >> > This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
> >> > which has several ptes pointing to same pfn, which results in an array
> >> > containing recurring pfn.
> >>
> >> Since you must have debugged this, and since the bisector appears
> >> to have fingered a patch of mine on the Linux side which triggered
> >> this, would you mind explaining this a little more? In particular I'm
> >> worried that this may point out some other bug in Linux, as in the
> >> context of the change there - dealing with the 1:1 mapping - I can't
> >> see a legitimate reason for multiple PTEs to reference the same PFN.
> >>
> >
> > Sure. I can try to explain this as clear as possible. Note that I didn't
> > even look at Linux side changes because at that point I was sure there
> > was a bug in migration v2.
> >
> > So there is a step called normalise_page in migration v2. It's nop for
> > HVM guest. For PV guest, it only cares about page table frames. To
> > normalise a page table frame, the core idea is to replace all MFNs in
> > page tables to PFNs inside the guest.
> >
> > When restoring, there is a step called localise_page, which again is a
> > nop for HVM guest. For PV guest, it does the reverse of normalise_page.
> > It goes through all page table frames, extract all PFNs pointed to by
> > PTEs in such frames, populate them, then reconstruct page tables.
> >
> > What I discovered is that PTEs inside one page table frame contained the
> > same PFN (something like fd42). The original implementation of toolstack
> > populate_pfns didn't consider such scenario. As for what that PFN
> > referred to, I wasn't sure and I didn't really care about that.
>
> That's unfortunate, as that's precisely the information I was after,
> since - as said - taking the repetition of the same PFN together with
> what the triggering Linux change is about, it smells like there's
> something wrong on the Linux side too. Do you at least recall how
> many times that same PFN got repeated?
>

Thousands of times.

Wei.

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
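
For readers following the thread, the within-batch problem described above can be illustrated with a small standalone sketch. This is not the libxc code and not the patch under review. The names pfn_tracker, SKETCH_MAX_PFN and collect_unpopulated are hypothetical, and the fixed-size bitmap is an assumption made only to keep the example self-contained. The point it shows is that marking a PFN as populated at the moment it is queued, rather than after the whole batch has been processed, makes a repeat inside the same batch get skipped.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on PFNs, chosen only for this sketch. */
#define SKETCH_MAX_PFN 0x10000u
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical record of which PFNs have already been populated. */
struct pfn_tracker {
    unsigned long populated[SKETCH_MAX_PFN / (sizeof(unsigned long) * CHAR_BIT) + 1];
};

static bool pfn_is_populated(const struct pfn_tracker *t, uint64_t pfn)
{
    return (t->populated[pfn / SKETCH_WORD_BITS] >> (pfn % SKETCH_WORD_BITS)) & 1UL;
}

static void pfn_set_populated(struct pfn_tracker *t, uint64_t pfn)
{
    t->populated[pfn / SKETCH_WORD_BITS] |= 1UL << (pfn % SKETCH_WORD_BITS);
}

/*
 * Copy into 'out' every PFN from 'pfns' that still needs populating and
 * return how many were written. Each PFN is marked as populated as soon
 * as it is queued, so a second occurrence in the same batch is skipped
 * as well as an occurrence in a later batch.
 */
size_t collect_unpopulated(struct pfn_tracker *t, const uint64_t *pfns,
                           size_t count, uint64_t *out)
{
    size_t nr = 0;

    for (size_t i = 0; i < count; i++) {
        assert(pfns[i] < SKETCH_MAX_PFN);

        if (pfn_is_populated(t, pfns[i]))
            continue;

        pfn_set_populated(t, pfns[i]);
        out[nr++] = pfns[i];
    }

    return nr;
}
```

With a batch such as { 0xfd42, 0x10, 0xfd42, 0xfd42 }, only the first 0xfd42 and 0x10 would be queued. Whether the real fix marks PFNs at exactly this point is not stated in this message.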