[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3.1 12/15] xen/x86: populate PVHv2 Dom0 physical memory map



On Fri, Nov 11, 2016 at 10:16:43AM -0700, Jan Beulich wrote:
> >>> On 29.10.16 at 10:59, <roger.pau@xxxxxxxxxx> wrote:
> > +static int __init hvm_populate_memory_range(struct domain *d, uint64_t 
> > start,
> > +                                             uint64_t size)
> > +{
> > +    unsigned int order, i = 0;
> > +    struct page_info *page;
> > +    int rc;
> > +#define MAP_MAX_ITER 64
> > +
> > +    ASSERT(IS_ALIGNED(size, PAGE_SIZE) && IS_ALIGNED(start, PAGE_SIZE));
> > +
> > +    order = MAX_ORDER;
> > +    while ( size != 0 )
> > +    {
> > +        order = min(get_order_from_bytes_floor(size), order);
> 
> This being the only caller, I don't see the point of the helper, the
> more that the logic to prevent underflow is unnecessary for the
> use here.

Right, the more that now get_order_from_bytes_floor is mostly a wrapper around 
get_order_from_bytes.
 
> > +        page = alloc_domheap_pages(d, order, memflags);
> > +        if ( page == NULL )
> > +        {
> > +            if ( order == 0 && memflags )
> > +            {
> > +                /* Try again without any memflags. */
> > +                memflags = 0;
> > +                order = MAX_ORDER;
> > +                continue;
> > +            }
> > +            if ( order == 0 )
> > +            {
> > +                printk("Unable to allocate memory with order 0!\n");
> > +                return -ENOMEM;
> > +            }
> > +            order--;
> > +            continue;
> > +        }
> > +
> > +        rc = guest_physmap_add_page(d, _gfn(PFN_DOWN(start)),
> > +                                    _mfn(page_to_mfn(page)), order);
> > +        if ( rc != 0 )
> > +        {
> > +            printk("Failed to populate memory: [%" PRIx64 ",%" PRIx64 ") 
> > %d\n",
> > +                   start, start + ((1ULL) << (order + PAGE_SHIFT)), rc);
> > +            return -ENOMEM;
> > +        }
> > +        start += 1ULL << (order + PAGE_SHIFT);
> > +        size -= 1ULL << (order + PAGE_SHIFT);
> 
> With all of these PAGE_SHIFT uses I wonder whether you wouldn't
> be better of doing everything with (number of) page frames.

Probably, this will avoid a couple of shifts here and there.

> > +static int __init hvm_steal_ram(struct domain *d, unsigned long size,
> > +                                paddr_t limit, paddr_t *addr)
> > +{
> > +    unsigned int i;
> > +
> > +    for ( i = 1; i <= d->arch.nr_e820; i++ )
> > +    {
> > +        struct e820entry *entry = &d->arch.e820[d->arch.nr_e820 - i];
> 
> Why don't you simply make the loop count downwards?

With i being an unsigned int, this would make the condition look quite awkward, 
because i >= 0 cannot be used. I would have to use i <= d->arch.nr_e820, so I 
think it's best to leave it as-is for readability.

> > +static int __init hvm_setup_p2m(struct domain *d)
> > +{
> > +    struct vcpu *saved_current, *v = d->vcpu[0];
> > +    unsigned long nr_pages;
> > +    int i, rc;
> 
> The use of i below calls for it to be unsigned int.

Sure.

> > +    bool preempted;
> > +
> > +    nr_pages = compute_dom0_nr_pages(d, NULL, 0);
> > +
> > +    hvm_setup_e820(d, nr_pages);
> > +    do {
> > +        preempted = false;
> > +        paging_set_allocation(d, dom0_paging_pages(d, nr_pages),
> > +                              &preempted);
> > +        process_pending_softirqs();
> > +    } while ( preempted );
> > +
> > +    /*
> > +     * Special treatment for memory < 1MB:
> > +     *  - Copy the data in e820 regions marked as RAM (BDA, BootSector...).
> > +     *  - Identity map everything else.
> > +     * NB: all this only makes sense if booted from legacy BIOSes.
> > +     * NB2: regions marked as RAM in the memory map are backed by RAM pages
> > +     * in the p2m, and the original data is copied over. This is done 
> > because
> > +     * at least FreeBSD places the AP boot trampoline in a RAM region found
> > +     * below the first MB, and the real-mode emulator found in Xen cannot
> > +     * deal with code that resides in guest pages marked as MMIO. This can
> > +     * cause problems if the memory map is not correct, and for example the
> > +     * EBDA or the video ROM region is marked as RAM.
> > +     */
> 
> Perhaps it's the real mode emulator which needs adjustment?

After the talk that we had on IRC, I've decided that the best way to deal with 
this is to map the RAM regions below 1MB as RAM instead of MMIO as it's done 
here, so I've added a helper to steal those pages from dom_io, assign them to 
Dom0, and map into the p2m as p2m_ram_rw. This works fine and the emulator no 
longer complains.

> > +    rc = modify_identity_mmio(d, 0, PFN_DOWN(MB(1)), true);
> > +    if ( rc )
> > +    {
> > +        printk("Failed to identity map low 1MB: %d\n", rc);
> > +        return rc;
> > +    }
> > +
> > +    /* Populate memory map. */
> > +    for ( i = 0; i < d->arch.nr_e820; i++ )
> > +    {
> > +        if ( d->arch.e820[i].type != E820_RAM )
> > +            continue;
> > +
> > +        rc = hvm_populate_memory_range(d, d->arch.e820[i].addr,
> > +                                       d->arch.e820[i].size);
> > +        if ( rc )
> > +            return rc;
> > +        if ( d->arch.e820[i].addr < MB(1) )
> > +        {
> > +            unsigned long end = min_t(unsigned long,
> > +                            d->arch.e820[i].addr + d->arch.e820[i].size, 
> > MB(1));
> > +
> > +            saved_current = current;
> > +            set_current(v);
> > +            rc = hvm_copy_to_guest_phys(d->arch.e820[i].addr,
> > +                                        
> > maddr_to_virt(d->arch.e820[i].addr),
> > +                                        end - d->arch.e820[i].addr);
> > +            set_current(saved_current);
> 
> If anything goes wrong here, how much confusion will result from
> current being wrong? In particular, will this complicate debugging
> of possible issues?

TBH, I'm not sure, current in this case is the idle domain, so trying to 
execute 
a hvm_copy_to_guest_phys with current being the idle domain, which from a Xen 
PoV is a PV vCPU, would probably result in some assert triggering in the 
hvm_copy_to_guest_phys path (or at least I would expect so). Note that this 
chunk of code is removed, since RAM regions below 1MB are now mapped as 
p2m_ram_rw.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.