
RE: [Xen-devel] [PATCH][VT] Patch to allow VMX domains to be destroyed or shut down cleanly



I think Xin Xiaohui's patch resolves most of the leftover-page
issue.

Another small issue we found is in vmx_set_cr0. On Windows, the
guest enters and leaves protected mode many times, and the following
code causes a problem:
    if ((value & X86_CR0_PE) && (value & X86_CR0_PG) && !paging_enabled)
{
        /*
         * The guest CR3 must be pointing to the guest physical.
         */
        if ( !VALID_MFN(mfn = get_mfn_from_pfn(
            d->arch.arch_vmx.cpu_cr3 >> PAGE_SHIFT)) ||
             !get_page(pfn_to_page(mfn), d->domain) )
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        {
            printk("Invalid CR3 value = %lx", d->arch.arch_vmx.cpu_cr3);
            domain_crash_synchronous(); /* need to take a clean path */
        }

The get_page should instead be done when the guest sets CR3 while
paging is not yet enabled, i.e.:
   case 3:
    {
        unsigned long old_base_mfn, mfn;

        /*
         * If paging is not enabled yet, simply copy the value to CR3.
         */
        if (!vmx_paging_enabled(d)) {
                ..... get page here....

With the above two changes, Windows domains are destroyed successfully
after a create/login/open IE/destroy cycle.
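To illustrate the imbalance, here is a toy model in plain C (not Xen code; all names below are invented for illustration). Taking a reference on every CR0 paging-enable transition leaks references when the guest toggles protected mode repeatedly, while taking it once at the unpaged CR3 write does not:

```c
#include <assert.h>

/* Toy model, not Xen code: a guest page with a reference count. */
struct page { int count; };

void get_page(struct page *p) { p->count++; }
void put_page(struct page *p) { p->count--; }

/* Old scheme: a reference is taken on every transition into paging
 * mode, so a guest that toggles protected mode N times accumulates
 * N references, of which only one is ever dropped at teardown. */
void enable_paging_old(struct page *cr3_page)
{
    get_page(cr3_page);
}

/* Proposed scheme: take the reference once, when the guest writes
 * CR3 while unpaged; later CR0 transitions reuse that reference. */
void set_cr3_while_unpaged(struct page *cr3_page, int *have_ref)
{
    if (!*have_ref) {
        get_page(cr3_page);
        *have_ref = 1;
    }
}
```

After three protected-mode toggles, the old scheme leaves the count at 2 after the single teardown put_page, so the page can never be freed; the new scheme returns it to 0.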

Thanks
Yunhong Jiang

Khoa Huynh wrote:
> Keir Fraser wrote:
>> I mean forcibly decrement them to zero and free them right there and
>> then. Of course, as you point out, the problem is that some of the
>> pages are mapped in domain0. I'm not sure how we can distinguish
>> tainted refcnts from genuine external references. Perhaps there's a
>> proper way we should be destructing the full shadow pagetables such
>> that the refcnts end up at zero.
> 
> Thanks for your comment.  I have done extensive tracing through
> the domain destruction code in the hypervisor in the last few
> days.
> 
> The bottom line:  after the domain destruction code in the hypervisor
> is done, all shadow pages were indeed freed up - even though
> the shadow_tainted_refcnts flag was set.  I now believe the
> remaining pages are genuinely externally referenced (possibly
> by the qemu device model still running in domain0).
> 
> Here are more details on what I have found:
> 
> Ideally, when we destroy or shut down a VMX domain, the general
> page reference counts should end up at 0 in shadow mode, so that
> the pages can be released properly from the domain.
> 
> I have traced quite a bit of code for different scenarios
> involving Windows XP running in a VMX domain.  I only
> did simple operations in Windows XP, but I tried to destroy
> the VMX domain at different times (e.g. during Windows XP boot,
> during simple operations, after Windows XP has been shutdown, etc.)
> 
> For non-VMX (Linux) domains, after we relinquish memory in
> domain_relinquish_resources(), all pages in the domain's page
> list indeed had a reference count of 0 and were properly freed from
> the xen heap - just like we expected.
> 
> For VMX (e.g., Windows XP) domains, after we relinquish memory in
> domain_relinquish_resources(), depending on how many activities
> were done in Windows XP, there were anywhere from 2 to 100 pages
> remaining just before the domain's structures were freed up
> by the hypervisor.  Most of these pages still had page
> reference counts of 1, and therefore, could not be freed
> from the heap by the hypervisor.  This prevents the rest
> of the domain's resources from being released, and therefore,
> 'xm list' still shows the VMX domains after they were destroyed.
> 
> In shadow mode, the following things could be reflected
> in the page (general) reference counts:
> 
> (a) General stuff:
>     - page is allocated (PGC_allocated)
>     - page is pinned
>     - page is pointed by CR3's
> (b) Shadow page tables (l1, l2, hl2, etc.)
> (c) Out-of-sync entries
> (d) Grant table mappings
> (e) External references (not through grant table)
> (f) Monitor page table references (external shadow mode)
> (g) Writable PTE predictions
> (h) GDTs/LDTs
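The list above can be pictured with a toy accounting model in plain C (the enum and struct here are invented for illustration, not Xen's actual page_info): a page is freeable only once every class of reference has been dropped, which is why a single leftover external reference, e.g. a qemu-dm mapping, keeps a page alive:

```c
#include <assert.h>

/* Invented for illustration -- not Xen's real page_info layout. */
enum ref_src {
    REF_GENERAL,    /* (a) PGC_allocated / pinned / CR3 */
    REF_SHADOW,     /* (b) shadow page tables (l1, l2, hl2, ...) */
    REF_OOS,        /* (c) out-of-sync entries */
    REF_GRANT,      /* (d) grant table mappings */
    REF_EXTERNAL,   /* (e) external refs, e.g. qemu-dm mappings */
    REF_MONITOR,    /* (f) monitor page table references */
    REF_WR_PRED,    /* (g) writable PTE predictions */
    REF_GDT_LDT,    /* (h) GDTs/LDTs */
    NR_REF_SRC
};

struct toy_page { int refs[NR_REF_SRC]; };

void take_ref(struct toy_page *pg, enum ref_src s) { pg->refs[s]++; }
void drop_ref(struct toy_page *pg, enum ref_src s) { pg->refs[s]--; }

/* A page can leave the domain's page list only with no refs left. */
int can_free(const struct toy_page *pg)
{
    int total = 0;
    for (int s = 0; s < NR_REF_SRC; s++)
        total += pg->refs[s];
    return total == 0;
}
```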
> 
> So I put in a lot of instrumentation and tracing code,
> and made sure that the above things were taken into
> account and removed from the page reference counts
> during the domain destruction code sequence in the
> hypervisor.  During this code sequence, we disable
> shadow mode (shadow_mode_disable()) and the
> shadow_tainted_refcnts flag was set.  However,
> much to my surprise, the page reference counts
> were properly taken care of in shadow mode, and
> all shadow pages (including those in l1, l2, hl2
> tables and snapshots) were all freed up.
> 
> In particular, here's where each of the things
> in the above list was taken into account during
> the domain destruction code sequence in the
> hypervisor:
> 
> (a) General stuff:
>     - None of remaining pages have PGC_allocated
>       flag set
>     - None of remaining pages are still pinned
>     - The monitor shadow ref was 0, and all
>       pages pointed to by CR3's were taken care
>       of in free_shadow_pages()
> (b) All shadow pages (including those pages in
>     l1, l2, hl2, snapshots) were freed properly.
>     I implemented counters to track all shadow
>     page promotions/allocations and demotions/
>     deallocations throughout the hypervisor code,
>     and at the end after we relinquished all domain
>     memory pages, these counters did indeed
>     return to 0 - as we expected.
> (c) out-of-sync entries -> in free_out_of_sync_state()
>     called by free_shadow_pages().
> (d) grant table mappings -> the count of active
>     grant table mappings is 0 after the domain
>     destruction sequence in the hypervisor is
>     executed.
> (e) external references not mapped via grant table
>     -> I believe that these include the qemu-dm
>     pages which still remain after we relinquish
>     all domain memory pages - as the qemu-dm may
>     still be active after a VMX domain has been
>     destroyed.
> (f) external monitor page references -> all references
>     from monitor page table are dropped in
>     vmx_relinquish_resources(), and monitor table
>     itself is freed in domain_destruct().  In fact,
>     in my code traces, the monitor shadow reference
>     count was 0 after the domain destruction code in
>     the hypervisor.
> (g) writable PTE predictions -> I didn't see any pages in
>     this category in my code traces, but if there
>     are, they would be freed up in free_shadow_pages().
> (h) GDTs/LDTs -> these were destroyed in destroy_gdt() and
>     invalidate_shadow_ldt(), called from
>     domain_relinquish_resources().
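The counter audit described in (b) can be sketched with hypothetical names (not the actual instrumentation): every shadow page promotion/allocation increments a counter, every demotion/deallocation decrements it, and a clean teardown must end at zero.

```c
#include <assert.h>

/* Hypothetical audit counter, not the actual instrumentation:
 * promotions/allocations increment, demotions/deallocations
 * decrement; after relinquishing all domain memory it must be 0. */
struct shadow_audit { long live_shadow_pages; };

void audit_promote(struct shadow_audit *a) { a->live_shadow_pages++; }
void audit_demote(struct shadow_audit *a)  { a->live_shadow_pages--; }

int audit_clean(const struct shadow_audit *a)
{
    return a->live_shadow_pages == 0;
}
```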
> 
> Based on the code instrumentation and tracing above, I am
> pretty confident that the shadow page reference counts
> were handled properly during the domain destruction code
> sequence in the hypervisor.  There is a problem in keeping
> track of shadow page counts (domain->arch.shadow_page_count),
> and I will submit a patch to fix this shortly.  However, this
> does not really impact how shadow pages are handled.
> 
> Consequently, the pages that still remain after the domain
> destruction code sequence in the hypervisor are externally
> referenced and may belong to the qemu device model running
> in domain0.  The fact that qemu-dm is still active for some
> time after a VMX domain has been torn down in the hypervisor
> is evident from examining the tools code (python).  In fact,
> if I forcibly free these remaining pages from the xen heap,
> the system/dom0 crashed.
> 
> Am I missing anything? Your comments, suggestions, etc.,
> are welcome!  Thanks for reading this rather long email :-)
> 
> Khoa H.
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

