
Re: [Xen-devel] [PATCH] Provide support for multiple frame buffers in Xen.



Hi, 

At 14:15 -0400 on 16 Oct (1350396902), Robert Phillips wrote:
> From: Robert Phillips <robert.phillips@xxxxxxxxxxxxxxxxxxx>
> 
> Support is provided for both shadow and hardware assisted paging (HAP) modes.
> This code bookkeeps the set of video frame buffers (vram),
> detects when the guest has modified any of those buffers and, upon request,
> returns a bitmap of the modified pages.
> 
> This lets other software components re-paint the portions of the monitor (or 
> monitors) that have changed.
> Each monitor has a frame buffer of some size at some position in guest 
> physical memory.
> The set of frame buffers being tracked can change over time as monitors are 
> plugged and unplugged.

Having read through this in detail, it's looking very plausible. :)
A few style nits:
 - please use the Xen spacings around 'if ( foo )' and 'for ( x; y; z )';
 - there's a bit of trailing whitespace in the new file, and a few
   places where indentation seems to have gone a bit wrong;
 - please make sure the whole thing is linewrapped to <80 characters; and
 - there's no need for braces around single-line blocks.

More substantive comments:
 - I think the dirty_vram.c and dirty_vram.h files belong under mm/
   rather than under hvm/.  The ``#include "../mm/mm-locks.h"'' is 
   an indicator that this is really MM code. 
 - Please use xzalloc() rather than xmalloc() + memset(0).  It avoids
   the sizes of alloc and memset getting out of sync.
 - The i386 build is dead, so you can drop some #ifdef __i386__ sections.
 - There really ought to be some limit on how many PTEs you're willing
   to track.  Otherwise a large guest can consume lots and lots of Xen's
   memory by making lots of PTEs that point to framebuffers.  That
   might also lead to performance problems, e.g. in the unshadow
   function that walks over all those linked lists.
   Also, I think that the memory for the paddr_links ought to come from
   the shadow pool (i.e. using domain->arch.paging.alloc_page())
   rather than soaking up otherwise free memory.
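
For the xzalloc() point, I mean something like this (only a sketch,
reusing the type name from your patch):

    /* One call both allocates and zeroes, so the two sizes can't diverge. */
    dirty_vram = xzalloc(dv_dirty_vram_t);
    if ( dirty_vram == NULL )
        goto out;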

A few other detailed comments below...

> +/* Free a paddr_link struct, given address of its predecessor in linked list */
> +dv_paddr_link_t *
> +free_paddr_link(struct domain *d,
> +                dv_paddr_link_t **ppl,
> +                dv_paddr_link_t *pl)
> +{
> +    dv_dirty_vram_t *dirty_vram = d->arch.hvm_domain.dirty_vram;
> +    dv_paddr_link_t *npl; /* next pl */
> +
> +    ASSERT( paging_locked_by_me(d) );
> +    /* extension mapping? */
> +    if (ppl) /* yes. free it */
> +    {
> +        pl = (*ppl);

This assignment seems like it should always be a noop.  Would it be
correct to replace it with ASSERT(pl == *ppl)?

> +        (*ppl) = npl = pl->pl_next;
> +    }
> +    else  /* main table */
> +    {
> +        /* move 2nd mapping to main table.
> +         * and free 2nd mapping */
> +        dv_paddr_link_t * spl;
> +        spl = pl->pl_next;
> +        if (spl == NULL)
> +        {
> +            pl->sl1ma = INVALID_PADDR;
> +            return pl;
> +        }
> +        pl->sl1ma = spl->sl1ma;
> +        pl->pl_next = spl->pl_next;
> +        npl = pl; /* reprocess main table entry again */
> +        pl = spl;

OK, that took a lot of staring at to be sure it's right. :)  I'd be
inclined to just put all paddr_links in the linked list (and have an
array of pointers rather than an array of paddr_link_ts).  Is it worth
having the extra complexity here, and at the callers, to avoid a single
memory read?
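
Roughly what I have in mind (just a sketch, not a tested patch, using
the field names from your code):

    /* pl_tab becomes an array of list heads rather than of entries */
    dv_paddr_link_t **pl_tab;

    /* ... and freeing no longer needs the main-table special case */
    static dv_paddr_link_t *
    free_paddr_link(struct domain *d, dv_paddr_link_t **ppl)
    {
        dv_dirty_vram_t *dirty_vram = d->arch.hvm_domain.dirty_vram;
        dv_paddr_link_t *pl = *ppl;

        ASSERT( paging_locked_by_me(d) );
        /* unlink from the mapping list and put back on the free list */
        *ppl = pl->pl_next;
        pl->sl1ma = INVALID_PADDR;
        pl->pl_next = dirty_vram->pl_free;
        dirty_vram->pl_free = pl;
        return *ppl;   /* next entry for the caller to examine */
    }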

> +    }
> +    pl->sl1ma = INVALID_PADDR;
> +    pl->pl_next = dirty_vram->pl_free;
> +    dirty_vram->pl_free = pl;
> +    return npl;
> +}
> +
> +
> +/* dirty_vram_range_update()
> + * This is called whenever a level 1 page table entry is modified.
> + * If the L1PTE is being cleared, the function removes any paddr_links
> + * that refer to it.
> + * If the L1PTE is being set to a frame buffer page, a paddr_link is
> + * created for that page's entry in pl_tab.
> + * Returns 1 iff entry found and set or cleared.
> + */
> +int dirty_vram_range_update(struct domain *d,
> +                            unsigned long gfn,
> +                            paddr_t sl1ma,
> +                            int set)
> +{
> +    int effective = 0;
> +    dv_range_t *range;
> +
> +    ASSERT(paging_locked_by_me(d));
> +    range = dirty_vram_range_find_gfn(d, gfn);
> +    if ( range )
> +    {

I think this would be more readable as 'if ( !range ) return 0' here 
rather than indenting most of the function. 
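
i.e. something like:

    range = dirty_vram_range_find_gfn(d, gfn);
    if ( !range )
        return 0;
    /* ... rest of the function, one indent level shallower ... */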

> +        unsigned long i = gfn - range->begin_pfn;
> +        dv_paddr_link_t *pl = &range->pl_tab[ i ];
> +        dv_paddr_link_t **ppl = NULL;
> +        int len = 0;
> +
> +        /* find matching entry (pl), if any, and its predecessor
> +         * in linked list (ppl) */
> +        while (pl != NULL)
> +        {
> +            if (pl->sl1ma == sl1ma || pl->sl1ma == INVALID_PADDR )
> +                break;
> +            ppl = &pl->pl_next;
> +            pl = *ppl;
> +            len++;
> +        }
> +            
> +        if (set)
> +        {
> +            /* Did we find sl1ma in either the main table or the linked list? */
> +            if (pl == NULL) /* no, so we'll need to alloc a link */
> +            {
> +                ASSERT(ppl != NULL);
> +                /* alloc link and append it to list */
> +                (*ppl) = pl = alloc_paddr_link(d);
> +                if (pl == NULL)
> +                    goto out;

This needs to signal some sort of error.  Otherwise, if we can't add
this sl1e to the list we'll just silently fail to track it.
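
Something along these lines, perhaps (just a sketch -- how the error is
plumbed back to the caller is up to you):

    (*ppl) = pl = alloc_paddr_link(d);
    if ( pl == NULL )
    {
        /* Can't track this mapping; report that rather than
         * silently dropping it. */
        effective = -ENOMEM;
        goto out;
    }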

> +            }
> +            if ( pl->sl1ma != sl1ma )
> +            {

ASSERT(pl->sl1ma == INVALID_PADDR) ? 

> +                pl->sl1ma = sl1ma;
> +                range->nr_mappings++;
> +            }
> +            effective = 1;
> +            if (len > range->mappings_hwm)
> +            {
> +                range->mappings_hwm = len;
> +#if DEBUG_update_vram_mapping
> +                gdprintk(XENLOG_DEBUG,
> +                         "[%lx] set      sl1ma:%lx hwm:%d mappings:%d freepages:%d\n",
> +                         gfn, sl1ma,
> +                         range->mappings_hwm,
> +                         range->nr_mappings,
> +                         d->arch.paging.shadow.free_pages);
> +#endif
> +            }
> +        }
> +        else /* clear */
> +        {
> +            if (pl && pl->sl1ma == sl1ma )
> +            {
> +#if DEBUG_update_vram_mapping
> +                gdprintk(XENLOG_DEBUG,
> +                         "[%lx] clear    sl1ma:%lx mappings:%d\n",
> +                         gfn, sl1ma,
> +                         range->nr_mappings-1);
> +#endif
> +                free_paddr_link(d, ppl, pl);
> +                if ( --range->nr_mappings == 0 )
> +                {
> +                    dirty_vram_range_free(d, range);

What's this for?  If the guest unmaps the framebuffer and remaps it (or
if the shadow PTs of the mappings are temporarily discarded) this will
stop us from tracking the new mappings until the toolstack asks for the
bitmap (and then it will be expensive to go and find the mappings).

> +                }
> +                effective = 1;
> +            }
> +        }
> +    }
> + out:
> +    return effective;
> +}

> +/* shadow_track_dirty_vram()
> + * This is the API called by the guest to determine which pages in the range
> + * from [begin_pfn:begin_pfn+nr) have been dirtied since the last call.
> + * It creates the domain's dv_dirty_vram on demand. 
> + * It creates ranges on demand when some [begin_pfn:nr) is first encountered.
> + * To collect the dirty bitmask it calls shadow_scan_dirty_flags().
> + * It copies the dirty bitmask into guest storage.
> + */
> +int shadow_track_dirty_vram(struct domain *d,
> +                            unsigned long begin_pfn,
> +                            unsigned long nr,
> +                            XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
> +{
> +    int rc = 0;
> +    unsigned long end_pfn = begin_pfn + nr;
> +    int flush_tlb = 0;
> +    dv_range_t *range;
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    if (end_pfn < begin_pfn
> +            || begin_pfn > p2m->max_mapped_pfn
> +            || end_pfn >= p2m->max_mapped_pfn)

I know you just copied this from the old definition but the limits seem
wrong here -- I think it should be:

    if ( end_pfn < begin_pfn || end_pfn > p2m->max_mapped_pfn + 1 )


> +/* hap_clean_vram_tracking_range()
> + * For all the pages in the range specified by [begin_pfn,nr),
> + * note in the dirty bitmap any page that has been marked as read-write,
> + * which signifies that the page has been dirtied, and reset the page
> + * to ram_logdirty. 
> + */
> +void hap_clean_vram_tracking_range(struct domain *d,
> +                                   unsigned long begin_pfn,
> +                                   unsigned long nr,
> +                                   uint8_t *dirty_bitmap)
> +{
> +    int i;
> +    unsigned long pfn;
> +    dv_dirty_vram_t *dirty_vram = d->arch.hvm_domain.dirty_vram;
> +    dv_range_t *range;
> +
> +    ASSERT(p2m_locked_by_me(p2m_get_hostp2m(d)));
> +    ASSERT(paging_locked_by_me(d));
> +    
> +    if ( !dirty_vram )
> +    {
> +        gdprintk(XENLOG_DEBUG, "Should only be called while tracking dirty vram.\n");
> +        return;
> +    }
> +
> +    range = dirty_vram_range_find(d, begin_pfn, nr);
> +    if (!range)
> +        return;

Oughtn't we to return all 1s in the bitmap here?  If the range isn't
currently being tracked we should conservatively assume it's all dirty,
right?
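
i.e. something like:

    range = dirty_vram_range_find(d, begin_pfn, nr);
    if ( !range )
    {
        /* Not tracking this range: conservatively report it all dirty. */
        memset(dirty_bitmap, 0xff, (nr + 7) / 8);
        return;
    }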

> +
> +    /* set l1e entries of P2M table to be read-only. */
> +    /* On first write, it page faults, its entry is changed to read-write,
> +     * its bit in the dirty bitmap is set, and on retry the write succeeds. */
> +    for (i = 0, pfn = range->begin_pfn; pfn < range->end_pfn; i++, pfn++)
> +    {
> +        p2m_type_t pt;
> +        pt = p2m_change_type(d, pfn, p2m_ram_rw, p2m_ram_logdirty);
> +        if (pt == p2m_ram_rw)
> +            dirty_bitmap[i >> 3] |= (1 << (i & 7));
> +    }
> +    flush_tlb_mask(d->domain_dirty_cpumask);
> +}
> +
> +static void hap_vram_tracking_init(struct domain *d)
> +{
> +    paging_log_dirty_init(d, hap_enable_vram_tracking,
> +                          hap_disable_vram_tracking,
> +                          NULL);
> +}
> +
> +/* hap_track_dirty_vram()
> + * Create the domain's dv_dirty_vram struct on demand.
> + * Create a dirty vram range on demand when some [begin_pfn:begin_pfn+nr] is first encountered.
> + * Collect the guest_dirty bitmask, a bit mask of the dirties vram pages, by
> + * calling paging_log_dirty_range().
> + */
> +int hap_track_dirty_vram(struct domain *d,
> +                         unsigned long begin_pfn,
> +                         unsigned long nr,
> +                         XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
> +{
> +    long rc = 0;
> +    dv_dirty_vram_t *dirty_vram;
> +    int restart_log_dirty = 0;
> +
> +    paging_lock(d);
> +    dirty_vram = d->arch.hvm_domain.dirty_vram;
> +    if ( nr )
> +    {
> +        dv_range_t *range = NULL;
> +        int size = (nr + BITS_PER_LONG - 1) / BITS_PER_LONG;
> +        unsigned long dirty_bitmap[size];

All the users of this array cast to (uint8_t *) -- just declare it as
uint8_t * instead?
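
i.e. (sketch -- keeping it a local array, just byte-sized):

    uint8_t dirty_bitmap[(nr + 7) / 8];

which would also let you drop the (uint8_t*) casts below.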

> +
> +        /* Already tracking dirty vram? */
> +        if ( paging_mode_log_dirty(d) && dirty_vram ) /* yes */
> +        {
> +            /* Handle the addition of another range */
> +            range = dirty_vram_range_find(d, begin_pfn, nr);
> +            if ( !range )
> +            {
> +                rc = -ENOMEM;
> +                if ( !(range = dirty_vram_range_alloc(d, begin_pfn, nr)) )
> +                    goto param_fail;
> +                restart_log_dirty = 1;
> +            }
> +        }
> +        /* Just starting to track dirty vram? */
> +        else if ( !paging_mode_log_dirty(d) && !dirty_vram ) /* yes */
> +        {
> +            rc = -ENOMEM;
> +            if ( !(dirty_vram = dirty_vram_alloc(d)) )
> +                goto param_fail;
> +            
> +            if ( !(range = dirty_vram_range_find_or_alloc(d, begin_pfn, nr)) )
> +                goto param_fail;
> +
> +            restart_log_dirty = 1;
> +            /* Initialize callbacks for vram tracking */
> +            hap_vram_tracking_init(d);
> +        }
> +        else
> +        {
> +            /* Test for invalid combination */
> +            if ( !paging_mode_log_dirty(d) && dirty_vram )
> +                rc = -EINVAL;
> +            else /* logging dirty of all memory, not tracking dirty vram */
> +                rc = -ENODATA;
> +            goto param_fail;
> +        }
> +        
> +        if (restart_log_dirty) 
> +        {
> +            /* disable then enable log dirty */

Why disable and re-enable?  The call to paging_log_dirty_range() below
will reset the p2m entries of the range you care about, so I think all
you need to do is enable it in the 'just starting' case above.

And, since you know you're in HAP mode, not in log-dirty mode, and
already have the paging lock, you can just set 
d->arch.paging.mode |= PG_log_dirty there rather than jumping through
the paging_log_dirty_enable() path and messing with locks.
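
i.e. in the 'just starting' case, with the paging lock already held,
something like:

    /* We're in HAP mode and not in full log-dirty mode, so just turn
     * logging on directly; no need for the enable/disable dance. */
    d->arch.paging.mode |= PG_log_dirty;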

> +            paging_unlock(d);
> +            if (paging_mode_log_dirty(d))
> +                paging_log_dirty_disable(d);
> +          
> +            rc = paging_log_dirty_enable(d);
> +            paging_lock(d);
> +            if (rc != 0)
> +                goto param_fail;
> +        }
> +        
> +        paging_unlock(d);
> +        memset(dirty_bitmap, 0x00, size * BYTES_PER_LONG);
> +     paging_log_dirty_range(d, begin_pfn, nr, (uint8_t*)dirty_bitmap);
> +        rc = -EFAULT;
> +        if ( copy_to_guest(guest_dirty_bitmap,
> +                           (uint8_t*)dirty_bitmap,
> +                           size * BYTES_PER_LONG) == 0 )
> +        {
> +            rc = 0;
> +        }
> +    }
> +    else
> +    {
> +        /* If zero pages specified while already tracking dirty vram
> +         * then stop tracking */
> +        if ( paging_mode_log_dirty(d) && dirty_vram ) {
> +            paging_unlock(d);
> +            rc = paging_log_dirty_disable(d);
> +            paging_lock(d);
> +            dirty_vram_free(d);

This is different from the shadow case -- there, IIUC, you just ignore
requests where nr == 0; here, you tear down all vram tracking.
Can you choose one of those, and document it in the public header?

> +        } else /* benign no-op */
> +        {
> +            rc = 0;
> +        }
> +        paging_unlock(d);
> +    }
> +
> +    return rc;


> +/* paging_mark_dirty_hap()
> + * Make a hap page writeable and mark it as dirty.
> + * This done atomically under the p2m and paging locks to avoid leaving
> + * a window where the page might be modified without being marked as dirty.
> + */

I'm perplexed by this -- AFAICT it's either not necessary (because all
log-dirty read/clean ops are done with the domain paused) or not sufficient
(because although the bitmap and the PTE are updated under the p2m lock,
the actual dirtying of the page happens at some other time).  Can you
spell out for me exactly what this is protecting against?

> +typedef int (*hash_pfn_callback_t)(struct vcpu *v,
> +                                   mfn_t smfn,
> +                                   unsigned long begin_pfn,
> +                                   unsigned long end_pfn,
> +                                   int *removed);
> +
> +static int hash_pfn_foreach(struct vcpu *v, 
> +                            unsigned int callback_mask, 
> +                            hash_pfn_callback_t callbacks[], 
> +                            unsigned long begin_pfn,
> +                            unsigned long end_pfn)
> +/* Walk the hash table looking at the types of the entries and 
> + * calling the appropriate callback function for each entry. 
> + * The mask determines which shadow types we call back for, and the array
> + * of callbacks tells us which function to call.
> + * Any callback may return non-zero to let us skip the rest of the scan. 
> + *
> + * WARNING: Callbacks MUST NOT add or remove hash entries unless they 
> + * then return non-zero to terminate the scan. */

This code duplication is a bit much.  I think you should recast the
existing hash_foreach() function to take a pointer as its fourth
argument instead of an MFN, and then make the existing callers just cast
their MFN argument as a pointer.  Then you can use the same function,
passing a pointer to a struct { begin, end, removed }.

Please make the changes to hash_foreach() in a separate patch from the
dirty_vram stuff. 
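
Very roughly, the shape I'm imagining (a sketch, not a tested patch;
the struct and variable names are made up):

    struct sh_hash_range {
        unsigned long begin_pfn;
        unsigned long end_pfn;
        int removed;
    };

    typedef int (*hash_callback_t)(struct vcpu *v, mfn_t smfn, void *p);

    /* existing callers just cast their MFN argument: */
    hash_foreach(v, callback_mask, callbacks, (void *)mfn_x(other_mfn));

    /* the dirty-vram walk passes a pointer to its range instead: */
    struct sh_hash_range r = { begin_pfn, end_pfn, 0 };
    hash_foreach(v, callback_mask, callbacks, &r);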

> @@ -1211,12 +1164,14 @@ static int shadow_set_l1e(struct vcpu *v,
>                  shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
>                  /* fall through */
>              case 0:
> -                shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
> +                shadow_vram_fix_l1e(old_sl1e, new_sl1e, sl1e, sl1mfn, d);
>                  break;
>              }
>          }
>      } 
>  
> +    shadow_vram_fix_l1e(old_sl1e, new_sl1e, sl1e, sl1mfn, d);

Why is this being called twice here?

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
