[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] page ref/type count overflows



>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 29.01.09 09:45 >>>
>On 29/01/2009 08:34, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:
>> Actually, I'd like to go a step further: Is there any reason why struct
>> shadow_page_info must be separate from struct page_info (rather than
>> sharing the definition, requiring some re-ordering of its elements)?
>
>Not really, apart from wanting to keep shadow stuff in one place in a
>private header file, I suppose. Would it risk turning page_info's definition
>into crazy union soup? If it could be done as something like:
> unsigned long count_info
> union {
>  struct { page_info stuff }; // anonymous
>  struct { sh_page_info stuff }; // anonymous
> } // anonymous
>That would be nicer than what we currently have, I'd agree. And the more
>stuff we can pull out of the anonymous union (e.g., perhaps a list_head?)
>then the better, of course.

Below is what I currently have it at. I'm afraid it won't get much simpler,
but I think it reasonably expresses the individual overlays. There are three
more transformations I plan to make:
- _domain -> unsigned int
- next_shadow -> __mfn_t
- split u into two unions (one having type_info, type/pinned/count, and
cpumask, the other having _domain, back, and order).

That last step is to avoid having to re-add __attribute__ ((__packed__)),
so that other (future) changes to the structure won't risk mis-aligning any
fields again.

Does this look acceptable?

Jan

/*
 * This definition is solely for the use in struct page_info (and
 * struct page_list_head), intended to allow easy adjustment once x86-64
 * wants to support more than 16Tb.
 * 'unsigned long' should be used for MFNs everywhere else.
 */
#define __mfn_t unsigned int

#ifndef __i386__
#undef page_list_entry
struct page_list_entry
{
    __mfn_t next, prev;
};
#endif

struct page_info
{
    union {
        /* Each frame can be threaded onto a doubly-linked list.
         *
         * For unused shadow pages, a list of pages of this order; for
         * pinnable shadows, if pinned, a list of other pinned shadows
         * (see sh_type_is_pinnable() below for the definition of
         * "pinnable" shadow types).
         */
        struct page_list_entry list;
        /* For non-pinnable shadows, a higher entry that points at us. */
        paddr_t up;
    };

    /* Reference count and various PGC_xxx flags and fields. */
    unsigned long count_info;

    /* Context-dependent fields follow... */
    union {

        /* Page is in use: ((count_info & PGC_count_mask) != 0). */
        struct {
            /* Owner of this page (NULL if page is anonymous). */
            unsigned long _domain; /* pickled format */
            /* Type reference count and various PGT_xxx flags and fields. */
            unsigned long type_info;
        } inuse;

        /* Page is in use by shadow code: count_info == 0. */
        struct {
            unsigned long type:5;   /* What kind of shadow is this? */
            unsigned long pinned:1; /* Is the shadow pinned? */
            unsigned long count:26; /* Reference count */
            union {
                /* When in use, GMFN of guest page we're a shadow of. */
                __mfn_t back;
                /* When free, order of the freelist we're on. */
                unsigned int order;
            };
        } sh;

        /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
        struct {
            /* Order-size of the free chunk this page is the head of. */
            u32 order;
            /* Mask of possibly-tainted TLBs. */
            cpumask_t cpumask;
        } free;

    } u;

    union {
        /*
         * Timestamp from 'TLB clock', used to avoid extra safety flushes.
         * Only valid for: a) free pages, and b) pages with zero type count
         * (except page table pages when the guest is in shadow mode).
         */
        u32 tlbflush_timestamp;

        /*
         * When PGT_partial is true then this field is valid and indicates
         * that PTEs in the range [0, @nr_validated_ptes) have been validated.
         * An extra page reference must be acquired (or not dropped) whenever
         * PGT_partial gets set, and it must be dropped when the flag gets
         * cleared. This is so that a get() leaving a page in partially
         * validated state (where the caller would drop the reference acquired
         * due to the getting of the type [apparently] failing [-EAGAIN])
         * would not accidentally result in a page left with zero general
         * reference count, but non-zero type reference count (possible when
         * the partial get() is followed immediately by domain destruction).
         * Likewise, the ownership of the single type reference for partially
         * (in-)validated pages is tied to this flag, i.e. the instance
         * setting the flag must not drop that reference, whereas the instance
         * clearing it will have to.
         *
         * If @partial_pte is positive then PTE at @nr_validated_ptes+1 has
         * been partially validated. This implies that the general reference
         * to the page (acquired from get_page_from_lNe()) would be dropped
         * (again due to the apparent failure) and hence must be re-acquired
         * when resuming the validation, but must not be dropped when picking
         * up the page for invalidation.
         *
         * If @partial_pte is negative then PTE at @nr_validated_ptes+1 has
         * been partially invalidated. This is basically the opposite case of
         * above, i.e. the general reference to the page was not dropped in
         * put_page_from_lNe() (due to the apparent failure), and hence it
         * must be dropped when the put operation is resumed (and completes),
         * but it must not be acquired if picking up the page for validation.
         */
        struct {
            u16 nr_validated_ptes;
            s8 partial_pte;
        };

        /*
         * Guest pages with a shadow.  This does not conflict with
         * tlbflush_timestamp since page table pages are explicitly not
         * tracked for TLB-flush avoidance when a guest runs in shadow mode.
         */
        u32 shadow_flags;

        /* When in use, next shadow in this hash chain */
        struct page_info *next_shadow;
    };
};


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.