
Re: [Xen-devel] Consult some concepts about shadow paging mechanism


  • To: Gianluca Guida <gianluca.guida@xxxxxxxxxxxxx>
  • From: Jui-Hao Chiang <windtracekimo@xxxxxxxxx>
  • Date: Sun, 3 May 2009 09:39:26 -0400
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Sun, 03 May 2009 06:41:10 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

I found the answer: my mistake was passing all four sl2mfn entries
in v->arch.paging.shadow.l3table[] to sh_walk_l2_table().
The truth is that I only need to pass v->arch.paging.shadow.l3table[0],
because SHADOW_FOREACH_L2E already loops over the
four sl2mfns.
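For anyone reading this later: there are four sl2mfns in the 2-on-3 case (2-level guest, PAE shadow) because of the x86 entry sizes. The sketch below shows only that index arithmetic. The names `guest_l2_to_shadow` and `struct shadow_l2_slot` are hypothetical, not Xen identifiers.

```c
#include <assert.h>

/* 2-level guest paging shadowed with 3-level PAE ("2-on-3").
 * A 32-bit guest L2 page holds 1024 4-byte entries, each mapping 4 MiB.
 * A PAE L2 page holds 512 8-byte entries, each mapping 2 MiB, so every
 * guest L2 entry needs two shadow L2 entries, and one guest L2 page
 * needs 1024 * 2 / 512 = 4 shadow L2 pages. */
#define GUEST_L2_ENTRIES  1024u
#define SHADOW_L2_ENTRIES 512u
#define SHADOW_L2_PAGES   ((GUEST_L2_ENTRIES * 2u) / SHADOW_L2_ENTRIES)

/* Hypothetical helper type: where a guest L2 entry lands in the shadow. */
struct shadow_l2_slot {
    unsigned int page;   /* which of the 4 shadow L2 pages (0..3) */
    unsigned int index;  /* first of the two shadow entries in that page */
};

/* Map a guest L2 index to the first shadow L2 entry covering it; the
 * second covering entry is index + 1 in the same shadow page. */
static struct shadow_l2_slot guest_l2_to_shadow(unsigned int gidx)
{
    struct shadow_l2_slot s;

    assert(gidx < GUEST_L2_ENTRIES);
    s.page  = (2u * gidx) / SHADOW_L2_ENTRIES;
    s.index = (2u * gidx) % SHADOW_L2_ENTRIES;
    return s;
}
```

For example, guest L2 index 256 (the 1 GiB boundary) lands at entry 0 of the second shadow L2 page.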

However, I have another question about traversing the SPT from level 3 through
level 2 to level 1. When I traverse down to the level 1 SPT, I find several
inconsistencies between the gl1e and sl1e contents, using the same checks as
sh_audit_l1_table(). Is this normal? I thought
they should stay consistent at all times.
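Some gl1e/sl1e differences are expected by design, so any checker has to tolerate them. The sketch below uses a simplified model that is an assumption, not taken from Xen. In this model the shadow stays not-present until the guest's Accessed bit is set, and read-only until its Dirty bit is set. The read-only part is the mechanism discussed in the quoted exchange below. The sketch deliberately ignores out-of-sync L1s, gfn-to-mfn translation, and the other cases the real sh_propagate handles. The flag constants are the architectural x86 bit values, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-table entry flag bits (architectural values). */
#define PTE_PRESENT  0x001u
#define PTE_RW       0x002u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u

/* Simplified model (assumption): the shadow copies the guest P/RW bits,
 * but hides the page until the guest A bit is set and keeps it read-only
 * until the guest D bit is set, so the first access and the first write
 * both fault into the hypervisor. */
static uint32_t expected_shadow_flags(uint32_t gflags)
{
    uint32_t s = gflags & (PTE_PRESENT | PTE_RW);

    if (!(gflags & PTE_ACCESSED))
        s &= ~PTE_PRESENT;
    if (!(gflags & PTE_DIRTY))
        s &= ~PTE_RW;
    return s;
}

/* Returns 1 if the shadow grants no P/RW permission beyond the model,
 * 0 otherwise. A shadow that is more restrictive than the model (e.g.
 * not yet populated, or write-protected because the target frame is
 * itself a page table) is accepted. */
static int sl1e_flags_consistent(uint32_t gflags, uint32_t sflags)
{
    uint32_t want = expected_shadow_flags(gflags);
    uint32_t have = sflags & (PTE_PRESENT | PTE_RW);

    return (have & ~want) == 0;
}
```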

My goal is to walk the SPT and GPT on each process context
switch (sh_update_cr3) and, as a first step, gather some statistics,
e.g. on the dirty, accessed, and present bits.
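Here is a minimal sketch of the statistics pass. It assumes the flag words have already been pulled out of each entry, e.g. with the l1 counterparts of the guest_l2e_get_flags() and shadow_l2e_get_flags() accessors used in the quoted code. `count_pte_flags` and `struct pte_stats` are hypothetical names, and the bit values are the architectural x86 ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u

/* Hypothetical per-table tally; the caller zero-initialises it. */
struct pte_stats {
    size_t present;
    size_t accessed;  /* counted among present entries only */
    size_t dirty;     /* counted among present entries only */
};

/* Accumulate P/A/D counts over n flag words into *st. Non-present
 * entries are skipped: their other bits belong to software and carry
 * no accessed/dirty meaning. */
static void count_pte_flags(const uint32_t *flags, size_t n,
                            struct pte_stats *st)
{
    assert(flags != NULL && st != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!(flags[i] & PTE_PRESENT))
            continue;
        st->present++;
        if (flags[i] & PTE_ACCESSED)
            st->accessed++;
        if (flags[i] & PTE_DIRTY)
            st->dirty++;
    }
}
```

Shadow entries are derived from guest entries rather than copied verbatim, so it is probably worth keeping separate tallies for the guest and shadow tables.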

I then tried an additional check at the level 2 SPT: skipping any sl1mfn
that fails the sh_mfn_is_a_page_table(sl1mfn) check. With that, the
inconsistencies disappear when traversing the level 1 SPT.

Can anyone give me a hint about the right way to do this? Is there
some special type of SPTE that I should not traverse into?

Many thanks,
Jui-Hao



On Fri, May 1, 2009 at 10:47 PM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx> wrote:
> Hi, sorry to bother you again.
>
> Assume the guest's paging level is 2 and the shadow uses 3-level PAE.
> I am trying to dump the L2 shadow page table information at the
> beginning of sh_update_cr3() as follows (the code is mostly copied
> from the sh_audit_l2_table and audit_gfn_to_mfn functions):
>
> The code crashes at guest_l2e_get_flags(*gl2e) in the
> sh_walk_l2_table() I wrote.
> The weird part is that it does not crash at gfn =
> guest_l2e_get_gfn(*gl2e), which accesses *gl2e in the same way
> as guest_l2e_get_flags does.
>
> static inline mfn_t
> convert_gfn_to_mfn(struct vcpu *v, gfn_t gfn, mfn_t gmfn)
> {
>    p2m_type_t p2mt;
>    if ( !shadow_mode_translate(v->domain) )
>        return _mfn(gfn_x(gfn));
>
>    if ( (mfn_to_page(gmfn)->u.inuse.type_info & PGT_type_mask)
>         != PGT_writable_page )
>        return _mfn(gfn_x(gfn)); // This is a paging-disabled shadow
>    else
>        return gfn_to_mfn(v->domain, gfn, &p2mt);
> }
>
> /* JuiHao: walk the l2 shadow page table based on input sl2mfn */
> static int sh_walk_l2_table(struct vcpu *v, mfn_t sl2mfn, mfn_t x)
> {
>        guest_l2e_t *gl2e, *gp;
>        shadow_l2e_t *sl2e;
>        mfn_t sl1mfn, gl2mfn;
>        gfn_t gfn;
>        mfn_t gmfn;
>        int done = 0;
>
>        /* Follow the backpointer in struct shadow_page_info to get the guest l2mfn */
>        gl2mfn = _mfn(mfn_to_shadow_page(sl2mfn)->backpointer);
>        gl2e = gp = sh_map_domain_page(gl2mfn);
>
>        SHADOW_FOREACH_L2E(sl2mfn, sl2e, &gl2e, done, v->domain, {
>
>                gfn = guest_l2e_get_gfn(*gl2e);  // ###!!!! Works Fine !!!!!####
>                sl1mfn = shadow_l2e_get_mfn(*sl2e);
>
>                if (mfn_valid(sl1mfn) && (shadow_l2e_get_flags(*sl2e) & _PAGE_PRESENT)) {
>
>                        // Look up the expected shadow for this gfn, just to
>                        // double-check that it equals sl1mfn
>                        gmfn = (guest_l2e_get_flags(*gl2e) & _PAGE_PSE) // ###!!!! CRASH !!!!!####
>                                ? get_fl1_shadow_status(v, gfn)
>                                : get_shadow_status(v, convert_gfn_to_mfn(v, gfn, gl2mfn),
>                                                    SH_type_l1_shadow);
>
>                        if (mfn_x(gmfn) != mfn_x(sl1mfn)) {
>                                // PRI_mfn expects the raw number, so unwrap with mfn_x()
>                                printk("!! gmfn %" PRI_mfn " != sl1mfn %" PRI_mfn "\n",
>                                       mfn_x(gmfn), mfn_x(sl1mfn));
>                        } else {
>                                printk("going down to traverse level 1 SPT\n");
>                        }
>                }
>
>        });
>        sh_unmap_domain_page(gp);
>        return 0;
> }
>
> Could you help me with this?
> Many thanks,
> Jui-Hao
>
> On Fri, Apr 24, 2009 at 9:32 AM, Gianluca Guida
> <gianluca.guida@xxxxxxxxxxxxx> wrote:
>> On Fri, Apr 24, 2009 at 6:23 AM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx> 
>> wrote:
>>> I have some additional questions:
>>> (1) For a normal data page, in order to propagate the Dirty or Accessed
>>> bit from the SPTE to the GPTE, the hypervisor needs to make the SPTE
>>> read-only. When the write page fault on this data page comes, the
>>> hypervisor can propagate the Dirty or Accessed bit to the GPTE and set
>>> the SPTE to R/W. My question is: when does the hypervisor make it
>>> read-only again? Is there any place in the source code you can point me to?
>>
>> What happens is this: the guest has to clear the dirty/accessed bit
>> and then flush the TLB (or invlpg the entry).
>> If the pagetable is mapped read-only (as it is at levels > 1), the write to
>> the pagetable will trigger the emulator, which will update the entry.
>> Otherwise (if the page is out of sync, which means a writable guest
>> pagetable, and this happens when it's an L1), the TLB flush will do the
>> job of updating the shadow entry.
>>
>> Look at how the sh_propagate function works and when it gets called. It's
>> what you're looking for.
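The dirty-bit cycle described above can be written as a toy state machine. This is a hypothetical model, not Xen code. Here `propagate` applies only the "read-only until dirty" rule and ignores the Accessed bit and everything else the real sh_propagate does. It also collapses the two refresh paths (emulated write to a read-only pagetable vs. TLB flush of an out-of-sync L1) into a single step.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u
#define PTE_RW       0x002u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u

/* Toy model of one guest PTE and its shadow (not Xen code). */
struct pte_pair {
    uint32_t guest;
    uint32_t shadow;
};

/* Rebuild the shadow from the guest entry: writable only once the guest
 * entry is already marked dirty, so the first write traps. */
static void propagate(struct pte_pair *p)
{
    p->shadow = p->guest & (PTE_PRESENT | PTE_RW);
    if (!(p->guest & PTE_DIRTY))
        p->shadow &= ~PTE_RW;
}

/* Write fault on a read-only shadow of a guest-writable page: record the
 * access in the guest entry, then re-propagate so the shadow becomes R/W. */
static void write_fault(struct pte_pair *p)
{
    assert(p->guest & PTE_RW);
    p->guest |= PTE_ACCESSED | PTE_DIRTY;
    propagate(p);
}

/* Guest clears D, then flushes the TLB (or invlpgs the entry). Whichever
 * path refreshes the shadow, in this model it is rebuilt and is
 * read-only again. */
static void guest_clear_dirty_and_flush(struct pte_pair *p)
{
    p->guest &= ~PTE_DIRTY;
    propagate(p);
}
```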
>>
>>> (2) How many shadow pages are maintained for each guest domain? If the
>>> hypervisor keeps only one shadow page table for the active process in
>>> each guest domain, then on every guest context switch it might have to
>>> erase the entire shadow page table and reconstruct it for the new
>>> process, which seems like a lot of overhead. I have checked
>>> sh_update_cr3(), but I am not sure of the detailed mechanism.
>>
>> There's a pool of shadow memory that gets reused in a pseudo-LRU
>> manner. Across cr3 switches, toplevel pagetables are kept in memory, and
>> they are unshadowed when evicted by the allocator or when other things
>> happen, mostly based on heuristics and reference counting.
>>
>> Thanks,
>> Gianluca
>>
>> --
>> It was a type of people I did not know, I found them very strange and
>> they did not inspire confidence at all. Later I learned that I had been
>> introduced to electronic engineers.
>>                                                  E. W. Dijkstra
>>
>
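Gianluca mentions that the shadow pool is reused in a pseudo-LRU manner. One common pseudo-LRU scheme is the "clock" (second-chance) algorithm, sketched below as a generic illustration. It is not Xen's allocator, which, per his reply, relies mostly on heuristics and reference counting. `struct pool`, `pool_get` and the four-slot size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Toy shadow pool using the clock (second-chance) approximation of LRU. */
struct pool {
    int owner[POOL_SIZE];          /* guest table each slot shadows, -1 = free */
    unsigned char ref[POOL_SIZE];  /* set on use, cleared as the hand passes */
    size_t hand;                   /* next slot the clock hand inspects */
};

static void pool_init(struct pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->owner[i] = -1;
        p->ref[i] = 0;
    }
    p->hand = 0;
}

/* Return the slot shadowing `table`, reusing a slot if it is not present. */
static size_t pool_get(struct pool *p, int table)
{
    assert(table >= 0);

    /* Hit: mark the slot recently used so the hand spares it once. */
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->owner[i] == table) {
            p->ref[i] = 1;
            return i;
        }
    }

    /* Miss: advance the hand, clearing reference bits, until a slot with
     * ref == 0 turns up. This takes at most one full sweep plus one step. */
    for (;;) {
        size_t i = p->hand;

        p->hand = (p->hand + 1) % POOL_SIZE;
        if (p->ref[i]) {
            p->ref[i] = 0;
            continue;
        }
        p->owner[i] = table;   /* evicts the previous owner, if any */
        p->ref[i] = 1;
        return i;
    }
}
```

A table that is used again, such as a process's top-level table seen on an earlier cr3 switch, is found in the pool without being rebuilt. Slots whose reference bit is still clear when the hand reaches them are the ones that get recycled.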

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

