
Re: [Xen-devel] [PATCH v3 2/3] xen: implement guest_physmap_(un)pin_range



On Thu, 8 Aug 2013, Ian Campbell wrote:
> > +int guest_physmap_pin_range(struct domain *d,
> > +                            xen_pfn_t gpfn,
> > +                            unsigned int order)
> > +{
> > +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> > +    paddr_t addr = gpfn << PAGE_SHIFT;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    int rc = -EFAULT, i;
> > +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> > +
> > +    spin_lock(&p2m->lock);
> > +
> > +    first = __map_domain_page(p2m->first_level);
> > +
> > +    if ( !first ||
> > +            !first[first_table_offset(addr)].p2m.valid ||
> > +            !first[first_table_offset(addr)].p2m.table )
> > +        goto err;
> > +
> > +    for ( i = 0; i < (1UL << order); i++ )
> > +    {
> > +        if ( cur_first_offset != first_table_offset(addr) )
> > +        {
> > +            if (second) unmap_domain_page(second);
> > +            second = map_domain_page(first[first_table_offset(addr)].p2m.base);
> > +            cur_first_offset = first_table_offset(addr);
> > +        }
> > +        if ( !second ||
> > +                !second[second_table_offset(addr)].p2m.valid ||
> > +                !second[second_table_offset(addr)].p2m.table )
> > +            goto err;
> > +
> > +        if ( cur_second_offset != second_table_offset(addr) )
> > +        {
> > +            if (third) unmap_domain_page(third);
> > +            third = map_domain_page(second[second_table_offset(addr)].p2m.base);
> > +            cur_second_offset = second_table_offset(addr);
> > +        }
> > +        if ( !third ||
> > +                !third[third_table_offset(addr)].p2m.valid ||
> > +                third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> 
> At this point rc == -EFAULT, which doesn't seem right when failing the
> P2M_DMA_PIN check.

I'll go with -EINVAL, unless you have a better suggestion.
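
Something along these lines, splitting the combined check so that a missing
mapping still returns -EFAULT while an already pinned page returns -EINVAL
(untested sketch):

    if ( !third ||
            !third[third_table_offset(addr)].p2m.valid )
        goto err;                 /* rc is still -EFAULT: no mapping */

    if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
    {
        rc = -EINVAL;             /* the page is already pinned */
        goto err;
    }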


> > +            goto err;
> > +
> > +        pte = third[third_table_offset(addr)];
> > +        pte.p2m.avail |= P2M_DMA_PIN;
> > +        write_pte(&third[third_table_offset(addr)], pte);
> > +
> > +        addr += PAGE_SIZE;
> > +    }
> > +
> > +    rc = 0;
> > +
> > +err:
> 
> Leaks final mapping of third I think?
> 
> You add two p2m walkers in this patch, and we have at least one already.
> Perhaps it is time for a generic function which calls a callback on the
> leaf ptes?

OK, good idea.
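
Something like the sketch below, perhaps. p2m_walk_range() and p2m_pte_cb_t
are just placeholder names and the code is untested: the walker takes the p2m
lock, maps the tables, calls a callback on every leaf pte in the range, and
unmaps the last third-level mapping on the way out, which should also take
care of the leak you noticed above:

    typedef int (*p2m_pte_cb_t)(lpae_t *ptep, void *arg);

    static int p2m_walk_range(struct domain *d, xen_pfn_t gpfn,
                              unsigned int order, p2m_pte_cb_t cb, void *arg)
    {
        lpae_t *first = NULL, *second = NULL, *third = NULL;
        paddr_t addr = (paddr_t)gpfn << PAGE_SHIFT;
        struct p2m_domain *p2m = &d->arch.p2m;
        unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
        int rc = -EFAULT, i;

        spin_lock(&p2m->lock);

        first = __map_domain_page(p2m->first_level);

        for ( i = 0; i < (1UL << order); i++ )
        {
            if ( !first[first_table_offset(addr)].p2m.valid ||
                 !first[first_table_offset(addr)].p2m.table )
                goto out;

            if ( cur_first_offset != first_table_offset(addr) )
            {
                if ( second ) unmap_domain_page(second);
                second = map_domain_page(first[first_table_offset(addr)].p2m.base);
                cur_first_offset = first_table_offset(addr);
            }
            if ( !second[second_table_offset(addr)].p2m.valid ||
                 !second[second_table_offset(addr)].p2m.table )
                goto out;

            if ( cur_second_offset != second_table_offset(addr) )
            {
                if ( third ) unmap_domain_page(third);
                third = map_domain_page(second[second_table_offset(addr)].p2m.base);
                cur_second_offset = second_table_offset(addr);
            }

            /* Let the caller decide what to do with the leaf entry. */
            rc = cb(&third[third_table_offset(addr)], arg);
            if ( rc )
                goto out;

            addr += PAGE_SIZE;
        }

        rc = 0;

    out:
        if ( third ) unmap_domain_page(third);   /* do not leak the last mapping */
        if ( second ) unmap_domain_page(second);
        if ( first ) unmap_domain_page(first);

        spin_unlock(&p2m->lock);

        return rc;
    }

    /* Pinning then becomes a small callback: */
    static int pin_one_pte(lpae_t *ptep, void *unused)
    {
        lpae_t pte = *ptep;

        if ( !pte.p2m.valid )
            return -EFAULT;
        if ( pte.p2m.avail & P2M_DMA_PIN )
            return -EINVAL;

        pte.p2m.avail |= P2M_DMA_PIN;
        write_pte(ptep, pte);
        return 0;
    }

guest_physmap_(un)pin_range() would then just call the walker with the
appropriate callback, and create_p2m_entries() could be reworked on top of it
in a follow-up.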


> > +    if ( second ) unmap_domain_page(second);
> > +    if ( first ) unmap_domain_page(first);
> > +
> > +    spin_unlock(&p2m->lock);
> > +
> > +    return rc;
> > +}
> > +
> > +int guest_physmap_unpin_range(struct domain *d,
> > +                              xen_pfn_t gpfn,
> > +                              unsigned int order)
> > +{
> > +    lpae_t *first = NULL, *second = NULL, *third = NULL, pte;
> > +    paddr_t addr = gpfn << PAGE_SHIFT;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    int rc = -EFAULT, i;
> > +    unsigned long cur_first_offset = ~0, cur_second_offset = ~0;
> > [...]
> 
> Same generic comments as the pin version.
> 
> > +    rc = 0;
> > +
> > +err:
> > +    if ( second ) unmap_domain_page(second);
> > +    if ( first ) unmap_domain_page(first);
> > +
> > +    spin_unlock(&p2m->lock);
> > +
> > +    return rc;
> > +}
> > +
> >  int guest_physmap_mark_populate_on_demand(struct domain *d,
> >                                            unsigned long gfn,
> >                                            unsigned int order)
> > @@ -184,6 +307,14 @@ static int create_p2m_entries(struct domain *d,
> >              cur_second_offset = second_table_offset(addr);
> >          }
> >  
> > +        if ( third[third_table_offset(addr)].p2m.avail & P2M_DMA_PIN )
> > +        {
> > +            rc = -EINVAL;
> > +            printk("%s: cannot change p2m mapping for paddr=%"PRIpaddr
> > +                    " domid=%d, the page is pinned\n", __func__, addr, 
> > d->domain_id);
> 
> Not sure if a guest can trigger this, if yes then gdprintk.
> 
> Do we always clear this bit when we clear the present bit? If not then
> we could end up with entries which are pinned and not-present. We should
> probably avoid that one way or another!

I went through the code: yes, we do, because we always reset the entire PTE
rather than only the valid bit.
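
For example, the REMOVE case in create_p2m_entries() writes an all-zero
entry, roughly like this (paraphrased, not the exact code):

    lpae_t pte;

    memset(&pte, 0x00, sizeof(pte));
    write_pte(&third[third_table_offset(addr)], pte);

so the avail bits, P2M_DMA_PIN included, go away together with the valid bit.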


> > +            goto out;
> > +        }
> > +
> >          flush = third[third_table_offset(addr)].p2m.valid;
> >  
> >          /* Allocate a new RAM page and attach */
> > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> > index 5e7c5a3..8db8816 100644
> > --- a/xen/include/asm-arm/mm.h
> > +++ b/xen/include/asm-arm/mm.h
> > @@ -319,6 +319,10 @@ void free_init_memory(void);
> >  
> >  int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
> >                                            unsigned int order);
> > +int guest_physmap_pin_range(struct domain *d, xen_pfn_t gpfn,
> > +                            unsigned int order);
> > +int guest_physmap_unpin_range(struct domain *d, xen_pfn_t gpfn,
> > +                              unsigned int order);
> >  
> >  extern void put_page_type(struct page_info *page);
> >  static inline void put_page_and_type(struct page_info *page)
> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index 41e9eff..12087d0 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -153,11 +153,15 @@ typedef struct {
> >      unsigned long hint:1;       /* In a block of 16 contiguous entries */
> >      unsigned long sbz2:1;
> >      unsigned long xn:1;         /* eXecute-Never */
> > -    unsigned long avail:4;      /* Ignored by hardware */
> > +    unsigned long avail:4;      /* Ignored by hardware, see below */
> >  
> >      unsigned long sbz1:5;
> >  } __attribute__((__packed__)) lpae_p2m_t;
> >  
> > +/* Xen "avail" bits allocation in third level entries */
> > +#define P2M_DMA_PIN     (1<<0)  /* The page has been "pinned": the hypervisor
> > +                                   promises not to change the p2m mapping. */
> 
> I guess we are going to want to keep type info here eventually, this
> leaves 3 bits for that.
> 
> For reference x86 has the following p2m types :
>     p2m_ram_rw = 0,             /* Normal read/write guest RAM */
>     p2m_invalid = 1,            /* Nothing mapped here */
>     p2m_ram_logdirty = 2,       /* Temporarily read-only for log-dirty */
>     p2m_ram_ro = 3,             /* Read-only; writes are silently dropped */
>     p2m_mmio_dm = 4,            /* Reads and write go to the device model */
>     p2m_mmio_direct = 5,        /* Read/write mapping of genuine MMIO area */
>     p2m_populate_on_demand = 6, /* Place-holder for empty memory */
> 
>     /* Although these are defined in all builds, they can only
>      * be used in 64-bit builds */
>     p2m_grant_map_rw = 7,         /* Read/write grant mapping */
>     p2m_grant_map_ro = 8,         /* Read-only grant mapping */
>     p2m_ram_paging_out = 9,       /* Memory that is being paged out */
>     p2m_ram_paged = 10,           /* Memory that has been paged out */
>     p2m_ram_paging_in = 11,       /* Memory that is being paged in */
>     p2m_ram_shared = 12,          /* Shared or sharable memory */
>     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
> 
> We can get away with not having some of these for now, but not
> necessarily in the long term I think. 4 available bits is really too
> few :-(
> 
> I'm unsure if pinning is orthogonal to type. I think it probably is? Or
> can we say that only normal ram can be pinned? Actually that may make
> sense since many of the other types are either implicitly pinned already
> (grant map, mmio etc) or pinning makes no sense (broken) or pinning
> would have to change the type anyway (pin shared -> unshare -> ram_rw
> +pinned, likewise for the paging types).
> 
> So the pinable types are p2m_ram_rw, p2m_ram_ro, p2m_ram_logdirty? We
> could perhaps declare that ro and logdirty are implicitly pinned too?
> For log dirty it's pretty obvious, not so sure about ro -- doing DMA
> even just from r/o memory seems a bit dodgy to allow.

I agree. I think that only p2m_ram_rw should be pin-able.
I can add that to the comment.
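
Something like this, for instance (exact wording to be adjusted):

    /* Xen "avail" bits allocation in third level entries */
    #define P2M_DMA_PIN     (1<<0)  /* The page has been "pinned": the hypervisor
                                       promises not to change the p2m mapping.
                                       Only normal r/w guest RAM can be pinned. */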

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

