
Re: [Xen-devel] RFC: AMD support for paging


  • To: "Wei Wang" <wei.wang2@xxxxxxx>
  • From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>
  • Date: Thu, 16 Feb 2012 06:36:12 -0800
  • Cc: olaf@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, tim@xxxxxxx, keir.xen@xxxxxxxxx, jbeulich@xxxxxxxx, adin@xxxxxxxxxxxxxx
  • Delivery-date: Thu, 16 Feb 2012 14:36:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> On 02/15/2012 05:06 PM, Andres Lagar-Cavilla wrote:
>>> On 02/15/2012 04:14 PM, Andres Lagar-Cavilla wrote:
>>>>> On 02/14/2012 08:05 PM, Andres Lagar-Cavilla wrote:
>>>>>> We started hashing out some AMD support for mem_paging and
>>>>>> mem_access. Right now my VMs boot, page out a bit, and then die on
>>>>>> an HVM triple fault.
>>>>>>
>>>>>> Most importantly, I want to get somebody from AMD to comment/help
>>>>>> out on this. It feels like we're inches away from enabling support
>>>>>> for this very nice feature. I'm not sure who exactly at AMD monitors
>>>>>> the list for these kinds of things. It'd be great to have you on
>>>>>> board!
>>>>>>
>>>>>> For starters, the changes to the p2m code are relatively mild, but
>>>>>> it'd be great if somebody spots a red flag.
>>>>>>
>>>>>> Another issue: comments indicate that bits 59-62 in NPT entries are
>>>>>> used for IOMMU flags, but effectively only bits 61-62 are.
>>>>>> Repossessing one bit (59) would give us enough space to support
>>>>>> mem_access. Right now we only have 7 bits available for Xen flags,
>>>>>> and that is not enough for paging and access. Is bit 59 effectively
>>>>>> reserved?
>>>>>
>>>>> Hi
>>>>> bit 59 is used by the iommu hardware for the ATS response. In most
>>>>> cases, for p2m_ram_rw pages, the U bit must be 0. But for other page
>>>>> types that are not potentially used by DMA, this bit could perhaps be
>>>>> non-zero. I could test it on my iommu machines if you had some
>>>>> patches that use the U bit.
>>>>
>>>> Hi Wei, thanks for pitching in! I assume you're talking about table 14
>>>> (and fig 9) in http://support.amd.com/us/Processor_TechDocs/48882.pdf
>>>
>>> Yes, indeed.
>>>
>>>> I don't think this will work. The p2m access value is arbitrary, and
>>>> independent of the p2m type. So bit 59 is out of reach and we're stuck
>>>> with 7 bits for Xen use. Thanks for the clarification.
>>>
>>> Where will the p2m access bits be stored? Please note that bits 52 - 58
>>> of the pte must be zero for p2m_ram_rw pages. For the iommu pte, only
>>> bit 63 and bits 1 - 8 are free to use if the PR bit = 1.
>>
>> Wei,
>>
>> Why *must* bits 52-58 be zero for p2m_ram_rw pages? Or is it just the
>> current convention?
>>
>> I propose limiting the p2m type to bits 52-55, and storing the p2m
>> access in bits 56-58. So, p2m_ram_rw pages with a non-zero access would
>> break the "must be zero" requirement/convention.
>>
>> Right now, without any consideration of paging, we're storing the p2m
>> type in bits 52-58 when PR=1. That p2m type can be non-zero: p2m_ram_ro,
>> p2m_mmio_dm, p2m_logdirty, p2m_populate_on_demand are all par for the
>> course. Given your statement above, and table 14 in the IOMMU manual,
>> how is this even working? Or is it not working?
>
> It works because only p2m_ram_rw (which is 0) pages are supposed to be
> used by DMA. But, indeed, if the iommu tries to access pages with
> non-zero types, like p2m_ram_ro, it will take IO page faults. It seems
> that we don't have use cases like this.
>
I guess no one unleashed log dirty on a domain doing DMA. Rightly so.
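
To make the split I'm proposing above concrete, here's a rough, untested
sketch (the macro names are mine, and a real patch would have to replace
the current 7-bit type encoding in p2m-pt.c/p2m.h rather than sit next to
it). Flags bit 12+n lands in PTE bit 52+n via put_pte_flags(), so a 4-bit
type plus a 3-bit access value fit in the seven software bits we control
today:

/* Illustrative sketch only (untested): pack the p2m type into PTE bits
 * 52-55 and the p2m access into PTE bits 56-58. */
#define P2M_TYPE_SHIFT      12      /* flags bit 12 -> PTE bit 52 */
#define P2M_TYPE_MASK       0xfUL   /* 4 bits: up to 16 p2m types */
#define P2M_ACCESS_SHIFT    16      /* flags bit 16 -> PTE bit 56 */
#define P2M_ACCESS_MASK     0x7UL   /* 3 bits: up to 8 access values */

/* Build the flags word from a (type, access) pair, and decode it again. */
#define p2m_typed_flags(t, a)                                        \
    ((((unsigned long)(t) & P2M_TYPE_MASK) << P2M_TYPE_SHIFT) |      \
     (((unsigned long)(a) & P2M_ACCESS_MASK) << P2M_ACCESS_SHIFT))
#define p2m_flags_to_type(f)    (((f) >> P2M_TYPE_SHIFT) & P2M_TYPE_MASK)
#define p2m_flags_to_access(f)  (((f) >> P2M_ACCESS_SHIFT) & P2M_ACCESS_MASK)

That obviously assumes we never grow past 16 p2m types, and it leaves bit
59 alone, which is the whole point.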

> If the mem access bits are independent of the p2m types, I think we have
> a few possible solutions here:
> 1) revert iommu pt sharing for the amd iommu entirely.
> 2) disable iommu pt sharing when xen paging is enabled.
>
> I am open to both of them.

Great. When I get back to this, I'll look into making the two feature sets
mutually exclusive. And then we can enjoy full mem_access on AMD
platforms.
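
As a strawman for option 2, the paging-enable path in mem_event_domctl()
could simply refuse to turn paging on while the p2m is shared with the
IOMMU, something like the following (untested, and the exact predicates
may need adjusting):

            rc = -EXDEV;
            /* Sketch: rather than rejecting all non-Intel CPUs, only
             * refuse paging when this domain's p2m is shared with the
             * IOMMU, since paged-out entries would also vanish from the
             * IOMMU tables and break DMA. */
            if ( iommu_enabled && iommu_hap_pt_share && need_iommu(d) )
                break;

The other direction (turning off pt sharing for AMD whenever paging or
access is wanted) would live in the IOMMU setup code instead.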

Thanks,
Andres
>
> Thanks,
>
> Wei
>
>
>> Would a violation of these rules cause a VMEXIT_SHUTDOWN?
>>
>> Thanks a lot for the info!
>> Andres
>>>
>>> Thanks,
>>> Wei
>>>
>>>> An alternative for enabling mem_access on AMD processors would be to
>>>> limit the access types to 3 bits. This would force disabling support
>>>> for two modes. I'm inclined to disable two out of X, W and WX; I don't
>>>> think they make sense without R permissions.
>>>> Thanks!
>>>> Andres
>>>>
>>>>>
>>>>> Thanks,
>>>>> Wei
>>>>>
>>>>>>
>>>>>> Finally, the triple fault. Maybe I'm missing something obvious.
>>>>>> Comments welcome.
>>>>>>
>>>>>> Patch inline below, thanks!
>>>>>> Andres
>>>>>>
>>>>>> Enable AMD support for paging.
>>>>>>
>>>>>> Signed-off-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>>>>>> Signed-off-by: Adin Scannell <adin@xxxxxxxxxxx>
>>>>>>
>>>>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/mem_event.c
>>>>>> --- a/xen/arch/x86/mm/mem_event.c
>>>>>> +++ b/xen/arch/x86/mm/mem_event.c
>>>>>> @@ -537,10 +537,6 @@ int mem_event_domctl(struct domain *d, x
>>>>>>                 if ( !hap_enabled(d) )
>>>>>>                     break;
>>>>>>
>>>>>> -            /* Currently only EPT is supported */
>>>>>> -            if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
>>>>>> -                break;
>>>>>> -
>>>>>>                 rc = -EXDEV;
>>>>>>                 /* Disallow paging in a PoD guest */
>>>>>>                 if ( p2m->pod.entry_count )
>>>>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/p2m-pt.c
>>>>>> --- a/xen/arch/x86/mm/p2m-pt.c
>>>>>> +++ b/xen/arch/x86/mm/p2m-pt.c
>>>>>> @@ -53,6 +53,20 @@
>>>>>>     #define P2M_BASE_FLAGS \
>>>>>>         (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED)
>>>>>>
>>>>>> +#ifdef __x86_64__
>>>>>> +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The 0xff..ff
>>>>>> + * value tramples over the higher-order bits used for flags (NX, p2mt,
>>>>>> + * etc.) This happens for paging entries. Thus we do this clip/unclip
>>>>>> + * juggle for l1 entries only (no paging superpages!) */
>>>>>> +#define EFF_MFN_WIDTH       (PADDR_BITS-PAGE_SHIFT) /* 40 bits */
>>>>>> +#define clipped_mfn(mfn)    ((mfn) & ((1UL << EFF_MFN_WIDTH) - 1))
>>>>>> +#define unclip_mfn(mfn)     ((clipped_mfn((mfn)) == INVALID_MFN) ? \
>>>>>> +                                INVALID_MFN : (mfn))
>>>>>> +#else
>>>>>> +#define clipped_mfn(mfn)    (mfn)
>>>>>> +#define unclip_mfn(mfn)     (mfn)
>>>>>> +#endif /* __x86_64__ */
>>>>>> +
>>>>>>     static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
>>>>>>     {
>>>>>>         unsigned long flags;
>>>>>> @@ -77,6 +91,9 @@ static unsigned long p2m_type_to_flags(p
>>>>>>         case p2m_invalid:
>>>>>>         case p2m_mmio_dm:
>>>>>>         case p2m_populate_on_demand:
>>>>>> +    case p2m_ram_paging_out:
>>>>>> +    case p2m_ram_paged:
>>>>>> +    case p2m_ram_paging_in:
>>>>>>         default:
>>>>>>             return flags;
>>>>>>         case p2m_ram_ro:
>>>>>> @@ -168,7 +185,7 @@ p2m_next_level(struct p2m_domain *p2m, m
>>>>>>                                           shift, max)) )
>>>>>>             return 0;
>>>>>>
>>>>>> -    /* PoD: Not present doesn't imply empty. */
>>>>>> +    /* PoD/paging: Not present doesn't imply empty. */
>>>>>>         if ( !l1e_get_flags(*p2m_entry) )
>>>>>>         {
>>>>>>             struct page_info *pg;
>>>>>> @@ -384,8 +401,9 @@ p2m_set_entry(struct p2m_domain *p2m, un
>>>>>>                                        0, L1_PAGETABLE_ENTRIES);
>>>>>>             ASSERT(p2m_entry);
>>>>>>
>>>>>> -        if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) )
>>>>>> -            entry_content = l1e_from_pfn(mfn_x(mfn),
>>>>>> +        if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ||
>>>>>> +             (p2mt == p2m_ram_paged) || (p2mt == p2m_ram_paging_in) )
>>>>>> +            entry_content = l1e_from_pfn(clipped_mfn(mfn_x(mfn)),
>>>>>>                                          p2m_type_to_flags(p2mt, mfn));
>>>>>>             else
>>>>>>                 entry_content = l1e_empty();
>>>>>> @@ -393,7 +411,7 @@ p2m_set_entry(struct p2m_domain *p2m, un
>>>>>>             if ( entry_content.l1 != 0 )
>>>>>>             {
>>>>>>                 p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>>>>>> -            old_mfn = l1e_get_pfn(*p2m_entry);
>>>>>> +            old_mfn = unclip_mfn(l1e_get_pfn(*p2m_entry));
>>>>>>             }
>>>>>>             /* level 1 entry */
>>>>>>             p2m->write_p2m_entry(p2m, gfn, p2m_entry, table_mfn, entry_content, 1);
>>>>>> @@ -615,11 +633,12 @@ pod_retry_l1:
>>>>>>                                sizeof(l1e));
>>>>>>
>>>>>>         if ( ret == 0 ) {
>>>>>> +        unsigned long l1e_mfn = unclip_mfn(l1e_get_pfn(l1e));
>>>>>>             p2mt = p2m_flags_to_type(l1e_get_flags(l1e));
>>>>>> -        ASSERT(l1e_get_pfn(l1e) != INVALID_MFN || !p2m_is_ram(p2mt));
>>>>>> +        ASSERT( (l1e_mfn != INVALID_MFN || !p2m_is_ram(p2mt)) ||
>>>>>> +                (l1e_mfn == INVALID_MFN && p2m_is_paging(p2mt)) );
>>>>>>
>>>>>> -        if ( p2m_flags_to_type(l1e_get_flags(l1e))
>>>>>> -             == p2m_populate_on_demand )
>>>>>> +        if ( p2mt == p2m_populate_on_demand )
>>>>>>             {
>>>>>>                 /* The read has succeeded, so we know that the mapping
>>>>>>                  * exits at this point.  */
>>>>>> @@ -641,7 +660,7 @@ pod_retry_l1:
>>>>>>             }
>>>>>>
>>>>>>             if ( p2m_is_valid(p2mt) || p2m_is_grant(p2mt) )
>>>>>> -            mfn = _mfn(l1e_get_pfn(l1e));
>>>>>> +            mfn = _mfn(l1e_mfn);
>>>>>>             else
>>>>>>                 /* XXX see above */
>>>>>>                 p2mt = p2m_mmio_dm;
>>>>>> @@ -783,18 +802,26 @@ pod_retry_l2:
>>>>>>     pod_retry_l1:
>>>>>>         if ( (l1e_get_flags(*l1e) & _PAGE_PRESENT) == 0 )
>>>>>>         {
>>>>>> +        p2m_type_t l1t = p2m_flags_to_type(l1e_get_flags(*l1e));
>>>>>>             /* PoD: Try to populate */
>>>>>> -        if ( p2m_flags_to_type(l1e_get_flags(*l1e)) == p2m_populate_on_demand )
>>>>>> +        if ( l1t == p2m_populate_on_demand )
>>>>>>             {
>>>>>>                 if ( q != p2m_query ) {
>>>>>>                     if ( !p2m_pod_demand_populate(p2m, gfn, PAGE_ORDER_4K, q) )
>>>>>>                         goto pod_retry_l1;
>>>>>>                 } else
>>>>>>                     *t = p2m_populate_on_demand;
>>>>>> +        } else {
>>>>>> +            if ( p2m_is_paging(l1t) )
>>>>>> +            {
>>>>>> +                *t = l1t;
>>>>>> +                /* No need to unclip due to check below */
>>>>>> +                mfn = _mfn(l1e_get_pfn(*l1e));
>>>>>> +            }
>>>>>>             }
>>>>>>
>>>>>>             unmap_domain_page(l1e);
>>>>>> -        return _mfn(INVALID_MFN);
>>>>>> +        return (l1t == p2m_ram_paging_out) ? mfn : _mfn(INVALID_MFN);
>>>>>>         }
>>>>>>         mfn = _mfn(l1e_get_pfn(*l1e));
>>>>>>         *t = p2m_flags_to_type(l1e_get_flags(*l1e));
>>>>>> @@ -914,7 +941,7 @@ static void p2m_change_type_global(struc
>>>>>>                         flags = l1e_get_flags(l1e[i1]);
>>>>>>                         if ( p2m_flags_to_type(flags) != ot )
>>>>>>                             continue;
>>>>>> -                    mfn = l1e_get_pfn(l1e[i1]);
>>>>>> +                    mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
>>>>>>                         gfn = i1 + (i2 + (i3
>>>>>>     #if CONFIG_PAGING_LEVELS >= 4
>>>>>>                                          + (i4 * L3_PAGETABLE_ENTRIES)
>>>>>> @@ -923,7 +950,7 @@ static void p2m_change_type_global(struc
>>>>>>                              * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES;
>>>>>>                     /* create a new 1le entry with the new type */
>>>>>>                         flags = p2m_type_to_flags(nt, _mfn(mfn));
>>>>>> -                    l1e_content = l1e_from_pfn(mfn, flags);
>>>>>> +                    l1e_content = l1e_from_pfn(clipped_mfn(mfn), flags);
>>>>>>                         p2m->write_p2m_entry(p2m, gfn, &l1e[i1],
>>>>>>                                              l1mfn, l1e_content, 1);
>>>>>>                     }
>>>>>> @@ -1073,7 +1100,7 @@ long p2m_pt_audit_p2m(struct p2m_domain
>>>>>>                                     entry_count++;
>>>>>>                                 continue;
>>>>>>                             }
>>>>>> -                        mfn = l1e_get_pfn(l1e[i1]);
>>>>>> +                        mfn = unclip_mfn(l1e_get_pfn(l1e[i1]));
>>>>>>                             ASSERT(mfn_valid(_mfn(mfn)));
>>>>>>                             m2pfn = get_gpfn_from_mfn(mfn);
>>>>>>                         if ( m2pfn != gfn &&



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

