Xen project Mailing List

Re: [Xen-devel] RFC: AMD support for paging

To: "Tim Deegan" <tim@xxxxxxx>

From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>

Date: Wed, 15 Feb 2012 09:09:11 -0800

Cc: olaf@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxx, Wei Wang <wei.wang2@xxxxxxx>, keir.xen@xxxxxxxxx, jbeulich@xxxxxxxx, adin@xxxxxxxxxxxxxx

Delivery-date: Wed, 15 Feb 2012 17:09:36 +0000

Domainkey-signature: a=rsa-sha1; c=nofws; d=lagarcavilla.org; h=message-id :in-reply-to:references:date:subject:from:to:cc:reply-to :mime-version:content-type:content-transfer-encoding; q=dns; s= lagarcavilla.org; b=NcQtsKM66akZAfKGuFggjjem57wTLigVX1TzWqQP0Chl UhVLCKJ+glPp9l5VKabqH+n4AycKV9BGf+FkBWG31tQzjeM1HQ+4/DKUAgLYFDsL 1emTL8X0svyc6RHxbLJgsiQByuev0I9kAwFVE5yK38ozg96tDVe0IQkPqKmzZtE=

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> At 08:06 -0800 on 15 Feb (1329293205), Andres Lagar-Cavilla wrote: >> > Where will p2m access bit be stored? Please note that bit 52 - bit 58 >> > for pte must be zero for p2m_ram_rw pages. For iommu pte, only bit 63 >> > and bit 1 - bit 8 are free to use if PR bit = 1. >> >> Wei, >> >> Why *must* bits 52-58 be zero for p2m_ram_rw pages? Or is it just the >> current convention? >> >> I propose limiting p2m type to bits 52-55, and storing p2m access on >> bits >> 56-58. So, p2m_ram_rw pages with a non-zero access would break the "must >> be zero" requirement/convention. > > Eh, let's step back a bit from this. The problem is that the IOMMU > can't handle these bits being set, but the IOMMU can't handle paged-out > or CoW memory either (because there's no way to cause the peripheral to > retry a DMA that faulted). > > So can we just say that any VM that's going to use paging, sharing > or access can't have a unified p2m and IOMMU table? That's a bit ugly > since the admin will have to make the choice, but it seems like the > hardware is forcing us into it. > > Maybe the sensible choice is to default to _not_ sharing p2m and IOMMU > tables, and have thet be something an expert user can turn on (and which > disables the other features)? > > Should we have a hypervisor-level interlock between PCI passthrough and > paging/sharing/access anyway? Doing both is unlikely to lead to > happiness. That's basically my thinking, these are mutually exclusive conditions. And I'm curious as to whether that is causing the triple fault that's killing my domains. You seem to imply that p2m+IOMMU unification is turned on by default -- that would crash PoD domains on AMD, wouldn't it? Andres > > Tim. > >> Right now, without any considerations for paging, we're storing the p2m >> type in bits 52-58 when PR=1. That p2m type can be non zero. p2m_ram_ro, >> p2m_mmio_dm, p2m_logdirty, p2m_populate_on_demand are all par for the >> course. Given your statement above, and table 14 in the IOMMU manual, >> how >> is this even working? Or is it not working? >> >> Would a violation of these rules cause a VMEXIT_SHUTDOWN? >> >> Thanks a lot for the info! >> Andres >> > >> > Thanks, >> > Wei >> > >> >> An alternative to enabling mem_access on AMD processors will be to >> limit >> >> the access types to 3 bits. This would force disabling support for >> two >> >> modes. I'm inclined to disable two out of X, W and WX. I don't think >> >> they >> >> make sense without R permissions. >> >> Thanks! >> >> Andres >> >> >> >>> >> >>> Thanks, >> >>> Wei >> >>> >> >>>> >> >>>> Finally, the triple fault. Maybe I'm missing something obvious. >> >>>> Comments >> >>>> welcome. >> >>>> >> >>>> Patch inline below, thanks! >> >>>> Andres >> >>>> >> >>>> Enable AMD support for paging. >> >>>> >> >>>> Signed-off-by: Andres Lagar-Cavilla<andres@xxxxxxxxxxxxxxxx> >> >>>> Signed-off-by: Adin Scannell<adin@xxxxxxxxxxx> >> >>>> >> >>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/mem_event.c >> >>>> --- a/xen/arch/x86/mm/mem_event.c >> >>>> +++ b/xen/arch/x86/mm/mem_event.c >> >>>> @@ -537,10 +537,6 @@ int mem_event_domctl(struct domain *d, x >> >>>> if ( !hap_enabled(d) ) >> >>>> break; >> >>>> >> >>>> - /* Currently only EPT is supported */ >> >>>> - if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) >> >>>> - break; >> >>>> - >> >>>> rc = -EXDEV; >> >>>> /* Disallow paging in a PoD guest */ >> >>>> if ( p2m->pod.entry_count ) >> >>>> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/p2m-pt.c >> >>>> --- a/xen/arch/x86/mm/p2m-pt.c >> >>>> +++ b/xen/arch/x86/mm/p2m-pt.c >> >>>> @@ -53,6 +53,20 @@ >> >>>> #define P2M_BASE_FLAGS \ >> >>>> (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | >> _PAGE_ACCESSED) >> >>>> >> >>>> +#ifdef __x86_64__ >> >>>> +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The >> >>>> 0xff..ff >> >>>> + * value tramples over the higher-order bits used for flags (NX, >> >>>> p2mt, >> >>>> + * etc.) This happens for paging entries. Thus we do this >> clip/unclip >> >>>> + * juggle for l1 entries only (no paging superpages!) */ >> >>>> +#define EFF_MFN_WIDTH (PADDR_BITS-PAGE_SHIFT) /* 40 bits */ >> >>>> +#define clipped_mfn(mfn) ((mfn)& ((1UL<< EFF_MFN_WIDTH) - >> 1)) >> >>>> +#define unclip_mfn(mfn) ((clipped_mfn((mfn)) == INVALID_MFN) ? >> \ >> >>>> + INVALID_MFN : (mfn)) >> >>>> +#else >> >>>> +#define clipped_mfn(mfn) (mfn) >> >>>> +#define unclip_mfn(mfn) (mfn) >> >>>> +#endif /* __x86_64__ */ >> >>>> + >> >>>> static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn) >> >>>> { >> >>>> unsigned long flags; >> >>>> @@ -77,6 +91,9 @@ static unsigned long p2m_type_to_flags(p >> >>>> case p2m_invalid: >> >>>> case p2m_mmio_dm: >> >>>> case p2m_populate_on_demand: >> >>>> + case p2m_ram_paging_out: >> >>>> + case p2m_ram_paged: >> >>>> + case p2m_ram_paging_in: >> >>>> default: >> >>>> return flags; >> >>>> case p2m_ram_ro: >> >>>> @@ -168,7 +185,7 @@ p2m_next_level(struct p2m_domain *p2m, m >> >>>> shift, max)) ) >> >>>> return 0; >> >>>> >> >>>> - /* PoD: Not present doesn't imply empty. */ >> >>>> + /* PoD/paging: Not present doesn't imply empty. */ >> >>>> if ( !l1e_get_flags(*p2m_entry) ) >> >>>> { >> >>>> struct page_info *pg; >> >>>> @@ -384,8 +401,9 @@ p2m_set_entry(struct p2m_domain *p2m, un >> >>>> 0, L1_PAGETABLE_ENTRIES); >> >>>> ASSERT(p2m_entry); >> >>>> >> >>>> - if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ) >> >>>> - entry_content = l1e_from_pfn(mfn_x(mfn), >> >>>> + if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) || >> >>>> + (p2mt == p2m_ram_paged) || (p2mt == >> p2m_ram_paging_in) ) >> >>>> + entry_content = l1e_from_pfn(clipped_mfn(mfn_x(mfn)), >> >>>> p2m_type_to_flags(p2mt, >> >>>> mfn)); >> >>>> else >> >>>> entry_content = l1e_empty(); >> >>>> @@ -393,7 +411,7 @@ p2m_set_entry(struct p2m_domain *p2m, un >> >>>> if ( entry_content.l1 != 0 ) >> >>>> { >> >>>> p2m_add_iommu_flags(&entry_content, 0, >> >>>> iommu_pte_flags); >> >>>> - old_mfn = l1e_get_pfn(*p2m_entry); >> >>>> + old_mfn = unclip_mfn(l1e_get_pfn(*p2m_entry)); >> >>>> } >> >>>> /* level 1 entry */ >> >>>> p2m->write_p2m_entry(p2m, gfn, p2m_entry, table_mfn, >> >>>> entry_content, 1); >> >>>> @@ -615,11 +633,12 @@ pod_retry_l1: >> >>>> sizeof(l1e)); >> >>>> >> >>>> if ( ret == 0 ) { >> >>>> + unsigned long l1e_mfn = unclip_mfn(l1e_get_pfn(l1e)); >> >>>> p2mt = p2m_flags_to_type(l1e_get_flags(l1e)); >> >>>> - ASSERT(l1e_get_pfn(l1e) != INVALID_MFN || >> !p2m_is_ram(p2mt)); >> >>>> + ASSERT( (l1e_mfn != INVALID_MFN || !p2m_is_ram(p2mt)) || >> >>>> + (l1e_mfn == INVALID_MFN&& p2m_is_paging(p2mt)) >> ); >> >>>> >> >>>> - if ( p2m_flags_to_type(l1e_get_flags(l1e)) >> >>>> - == p2m_populate_on_demand ) >> >>>> + if ( p2mt == p2m_populate_on_demand ) >> >>>> { >> >>>> /* The read has succeeded, so we know that the >> mapping >> >>>> * exits at this point. */ >> >>>> @@ -641,7 +660,7 @@ pod_retry_l1: >> >>>> } >> >>>> >> >>>> if ( p2m_is_valid(p2mt) || p2m_is_grant(p2mt) ) >> >>>> - mfn = _mfn(l1e_get_pfn(l1e)); >> >>>> + mfn = _mfn(l1e_mfn); >> >>>> else >> >>>> /* XXX see above */ >> >>>> p2mt = p2m_mmio_dm; >> >>>> @@ -783,18 +802,26 @@ pod_retry_l2: >> >>>> pod_retry_l1: >> >>>> if ( (l1e_get_flags(*l1e)& _PAGE_PRESENT) == 0 ) >> >>>> { >> >>>> + p2m_type_t l1t = p2m_flags_to_type(l1e_get_flags(*l1e)); >> >>>> /* PoD: Try to populate */ >> >>>> - if ( p2m_flags_to_type(l1e_get_flags(*l1e)) == >> >>>> p2m_populate_on_demand ) >> >>>> + if ( l1t == p2m_populate_on_demand ) >> >>>> { >> >>>> if ( q != p2m_query ) { >> >>>> if ( !p2m_pod_demand_populate(p2m, gfn, >> >>>> PAGE_ORDER_4K, >> >>>> q) ) >> >>>> goto pod_retry_l1; >> >>>> } else >> >>>> *t = p2m_populate_on_demand; >> >>>> + } else { >> >>>> + if ( p2m_is_paging(l1t) ) >> >>>> + { >> >>>> + *t = l1t; >> >>>> + /* No need to unclip due to check below */ >> >>>> + mfn = _mfn(l1e_get_pfn(*l1e)); >> >>>> + } >> >>>> } >> >>>> >> >>>> unmap_domain_page(l1e); >> >>>> - return _mfn(INVALID_MFN); >> >>>> + return (l1t == p2m_ram_paging_out) ? mfn : >> _mfn(INVALID_MFN); >> >>>> } >> >>>> mfn = _mfn(l1e_get_pfn(*l1e)); >> >>>> *t = p2m_flags_to_type(l1e_get_flags(*l1e)); >> >>>> @@ -914,7 +941,7 @@ static void p2m_change_type_global(struc >> >>>> flags = l1e_get_flags(l1e[i1]); >> >>>> if ( p2m_flags_to_type(flags) != ot ) >> >>>> continue; >> >>>> - mfn = l1e_get_pfn(l1e[i1]); >> >>>> + mfn = unclip_mfn(l1e_get_pfn(l1e[i1])); >> >>>> gfn = i1 + (i2 + (i3 >> >>>> #if CONFIG_PAGING_LEVELS>= 4 >> >>>> + (i4 * L3_PAGETABLE_ENTRIES) >> >>>> @@ -923,7 +950,7 @@ static void p2m_change_type_global(struc >> >>>> * L2_PAGETABLE_ENTRIES) * >> >>>> L1_PAGETABLE_ENTRIES; >> >>>> /* create a new 1le entry with the new type >> */ >> >>>> flags = p2m_type_to_flags(nt, _mfn(mfn)); >> >>>> - l1e_content = l1e_from_pfn(mfn, flags); >> >>>> + l1e_content = l1e_from_pfn(clipped_mfn(mfn), >> >>>> flags); >> >>>> p2m->write_p2m_entry(p2m, gfn,&l1e[i1], >> >>>> l1mfn, l1e_content, 1); >> >>>> } >> >>>> @@ -1073,7 +1100,7 @@ long p2m_pt_audit_p2m(struct p2m_domain >> >>>> entry_count++; >> >>>> continue; >> >>>> } >> >>>> - mfn = l1e_get_pfn(l1e[i1]); >> >>>> + mfn = unclip_mfn(l1e_get_pfn(l1e[i1])); >> >>>> ASSERT(mfn_valid(_mfn(mfn))); >> >>>> m2pfn = get_gpfn_from_mfn(mfn); >> >>>> if ( m2pfn != gfn&& >> >>>> >> >>>> >> >>>> _______________________________________________ >> >>>> Xen-devel mailing list >> >>>> Xen-devel@xxxxxxxxxxxxxxxxxxx >> >>>> http://lists.xensource.com/xen-devel >> >>>> >> >>> >> >>> >> >>> >> >> >> >> >> >> >> > >> > >> > >> >> > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.