[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS



On Wed, 15 Aug 2012, David Vrabel wrote:
> On 15/08/12 14:55, Stefano Stabellini wrote:
> > On Wed, 15 Aug 2012, David Vrabel wrote:
> >> On 14/08/12 15:12, Attilio Rao wrote:
> >>> On 14/08/12 14:57, David Vrabel wrote:
> >>>> On 14/08/12 13:24, Attilio Rao wrote:
> >> After looking at it some more, I think this pv-ops is unnecessary. How
> >> about the following patch to just remove it completely?
> >>
> >> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
> >> is sound.
> > 
> > Do you have more then 4G to dom0 on those boxes?
> 
> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
> 
> > It certainly fixed a serious crash at the time it was introduced, see
> > http://marc.info/?l=linux-kernel&m=129901609503574 and
> > http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
> > changed in kernel_physical_mapping_init, I think we still need it.
> > Depending on the e820 of your test box, the kernel could crash (or not),
> > possibly in different places.
> >
> >>>> Having said that, I couldn't immediately see where pages in (end, 
> >>>> pgt_buf_top] was getting set RO.  Can you point me to where it's 
> >>>> done?
> >>>>
> >>>
> >>> As mentioned in the comment, please look at xen_set_pte_init().
> >>
> >> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
> >> is already present and read-only.
> > 
> > look at mask_rw_pte and read the threads linked above, unfortunately it
> > is not that simple.
> 
> Yes, I was remembering what 32-bit did here.
> 
> The 64-bit version is a bit confused and it often ends up /not/ clearing
> RW for the direct mapping of the pages in the pgt_buf because any
> existing RW mappings will be used as-is.  See phys_pte_init() which
> checks for an existing mapping and only sets the PTE if it is not
> already set.

not all the pagetable pages might be already mapped, even if they are
already hooked into the pagetable


> pgd_populate(), pud_populate(), and pmd_populate() take care of clearing
> RW for the newly allocated page table pages, so mask_rw_pte() only needs
> to consider clearing RW for mappings of the the existing page table PFNs
> which all lie outside the range (pt_buf_start, pt_buf_top].
> 
> This patch makes it more sensible, and makes removal of the pv-op safe
> if pgt_buf lies outside the initial mapping.
> 
> index 04c1f61..2fd5e86 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1400,14 +1400,13 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>       unsigned long pfn = pte_pfn(pte);
> 
>       /*
> -      * If the new pfn is within the range of the newly allocated
> -      * kernel pagetable, and it isn't being mapped into an
> -      * early_ioremap fixmap slot as a freshly allocated page, make sure
> -      * it is RO.
> +      * If this is a PTE of an early_ioremap fixmap slot but
> +      * outside the range (pgt_buf_start, pgt_buf_top], then this
> +      * PTE is mapping a PFN in the current page table.  Make
> +      * sure it is RO.
>        */
> -     if (((!is_early_ioremap_ptep(ptep) &&
> -                     pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> -                     (is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 
> 1)))
> +     if (is_early_ioremap_ptep(ptep)
> +         && (pfn < pgt_buf_start || pfn >= pgt_buf_top))
>               pte = pte_wrprotect(pte);
> 
>       return pte;
> 

That's a mistake, you just inverted the condition!
What if map_low_page is used to map a page already hooked into the
pagetable? It should be RO while with your change it would be RW.
Also you don't handle the case when map_low_page is used to map the very
latest page, allocated and mapped by alloc_low_page, but
still not hooked into the pagetable: that page needs to be RW.

I believe you need more RAM than 6G to see all these issues.


> >> 8<----------------------
> >> x86: remove x86_init.mapping.pagetable_reserve paravirt op
> >>
> >> The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
> >> guests to set the writable flag for the mapping of (pgt_buf_end,
> >> pgt_buf_top].  This is not necessary as these pages are never set as
> >> read-only as they have never contained page tables.
> > 
> > Is this actually true? It is possible when pagetable pages are
> > allocated by alloc_low_page.
> 
> These newly allocated page table pages will be set read-only when they
> are linked into the page tables with pgd_populate(), pud_populate() and
> friends.
> 
> >> When running as a Xen guest, the initial page tables are provided by
> >> Xen (these are reserved with memblock_reserve() in
> >> xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
> >> guests) or in the kernel's .data section (for 64-bit guests, see
> >> head_64.S).
> >>
> >> Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
> >> does not overlap with them and the mappings for these PFNs will be
> >> read-write.
> > 
> > We are talking about pagetable pages built by
> > kernel_physical_mapping_init.
> > 
> > 
> >> Since Xen doesn't need to change the mapping its implementation
> >> becomes the same as a native and we can simply remove this pv-op
> >> completely.
> > 
> > I don't think so.
> 
> David
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.