Re: [Xen-users] a few questions about superpage support
Hi Ian,

Thanks again for the help. See responses inline.

2013/9/10 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> On Mon, 2013-09-09 at 15:04 -0700, Antonin Bas wrote:
>> Hi,
>>
>> First of all, thank you very much for your help. Please see inline comments.
>>
>> 2013/9/9 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
>> > On Fri, 2013-09-06 at 16:52 -0700, Antonin Bas wrote:
>> >> Hi,
>> >>
>> >> I am working on a project that relies on superpages within a guest. Of
>> >> course these superpages need to be backed by actual machine pages.
>> >
>> > Which type of guest are you running? Most of my reply is specific to HVM,
>> > which is what was implied by your interest in HAP.
>>
>> I am indeed running HVM guests.
>>
>> > There is no inherent need for things which are mapped as superpages in
>> > the guest pagetables to also be mapped as superpages in the p2m (e.g. HAP)
>> > mappings. It is fine for a 2MB guest mapping to be translated via a
>> > block of 4K mappings in the p2m (and vice versa).
>> >
>> > Unless perhaps you mean that your use case adds an additional
>> > requirement?
>>
>> Thanks. I have a much better idea of what's actually going on now. In
>> my use case, I run a process which makes extensive use of a 1GB memory
>> region (with memory accesses randomly distributed over that region). I
>> was hoping that by using a 1GB hugepage in the guest for that process,
>> and having this 1GB page mapped to an actual 1GB physical block, I
>> would avoid TLB misses (it is my understanding that some TLB entries
>> are reserved for 1GB mappings, both for the gva -> gpa and the
>> gpa -> ma translations).
>> But from what you are saying, it seems that there is no way to
>> guarantee that the guest 1GB hugepage will be translated via a 1GB
>> mapping in the p2m.
>
> I'm not all that familiar with the internals but I think not. At least
> not without modifying Xen to make it true.
>
>> >> I am using this version of Xen:
>> >> (XEN) Xen version 4.2.2_04-0.7.5 (abuild@) (gcc (SUSE Linux) 4.3.4
>> >> [gcc-4_3-branch revision 152973]) Fri Jun 14 12:22:34 UTC 2013
>> >>
>> >> HAP is enabled:
>> >> ...
>> >> (XEN) VMX: Supported advanced features:
>> >> (XEN) - APIC MMIO access virtualisation
>> >> (XEN) - APIC TPR shadow
>> >> (XEN) - Extended Page Tables (EPT)
>> >> (XEN) - Virtual-Processor Identifiers (VPID)
>> >> (XEN) - Virtual NMI
>> >> (XEN) - MSR direct-access bitmap
>> >> (XEN) - Unrestricted Guest
>> >> (XEN) - APIC Register Virtualization
>> >> (XEN) - Virtual Interrupt Delivery
>> >> (XEN) HVM: ASIDs enabled.
>> >> (XEN) HVM: VMX enabled
>> >> (XEN) HVM: Hardware Assisted Paging (HAP) detected
>> >> (XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
>> >> ...
>> >>
>> >> I am using the boot line option allowsuperpage=1 and the guest config
>> >> includes superpages=1
>> >
>> > Is the former not a PV guest only thing?
>> >
>> > And I can't see any code about the latter in the xl toolstack.
>> >
>> > I thought superpages were the default, if any are available, for HVM
>> > guests.
>>
>> You are right about superpages, I don't think this one is used.
>> For allowsuperpage, the doc says nothing about it being used only for
>> PV guests, and I think it is used for all guests. The default value is
>> true anyway. The only reference to it is in get_page_from_l2e() in
>> x86/mm.c.
>
> get_page_from_l2e is a PV only function, I think. The option was added
> by bd1cd81d6484 "x86: PV support for hugepages". I suspect the docs are
> just wrong.
>
> The default in the code appears to be false, so I suspect the code is
> doubly wrong...
>
>> >> With a guest memory of 4096, I can observe EPT entries that look like
>> >> this:
>> >> (XEN) gfn: 10f600 mfn: 306c00 order: 9 is_pod: 0
>> >> At first I thought this meant that guest 2M superpages were indeed
>> >> being backed by 2M host machine superpages. I thought this was weird,
>> >> since I could observe these entries even without explicitly requesting
>> >> hugepages from within the guest. I set transparent hugepages in the
>> >> guest to never (they seem to be enabled by default in SUSE), but I
>> >> could still observe these 'order: 9' entries, which means I don't
>> >> actually know what they represent.
>> >
>> > As I say above, the guest and p2m use of superpage mappings are
>> > independent with HAP. And p2m superpages are the default for HVM.
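(A quick note on the transparent hugepage setting mentioned above, for anyone
following along: the usual way to check and disable THP inside a Linux guest,
and to reserve explicit hugepages instead, is something like the following;
the sysfs paths are the standard upstream ones and may differ between kernels:

    # current THP policy in the guest
    cat /sys/kernel/mm/transparent_hugepage/enabled
    # disable THP so that only explicitly requested hugepages are used
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    # reserve 512 hugepages of the default 2MB size (i.e. 1GB in total)
    echo 512 > /proc/sys/vm/nr_hugepages
    grep -i huge /proc/meminfo

As Ian says, none of this changes how the p2m maps the guest's memory.)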
>> Ok. One more question though. How does the VMM decide on the number
>> of 1GB mappings and 2MB mappings to use?
>
> I'm not sure but I think based on availability of such pages to allocate
> and alignment of the RAM within the guest, accounting for holes etc.
>
>> When I boot a 4GB guest, I get the following mappings:
>>
>> xc: info: PHYSICAL MEMORY ALLOCATION:
>>   4KB PAGES: 0x0000000000000200
>>   2MB PAGES: 0x00000000000003fb
>>   1GB PAGES: 0x0000000000000002
>>
>> I am running a 32GB machine (2 NUMA nodes, each with an Ivy Bridge CPU
>> and 16GB of memory, HT enabled), and I have allocated 8GB of memory to
>> dom0. This is the first guest I am starting, so I probably still have
>> a lot of contiguous free memory. Why not use 3 1GB superpages, or
>> even 4?
>
> I expect there are MMIO holes under 1MB and between 3GB-4GB which
> prevent the use of 1GB mappings; you can probably get a sense of that
> from the e820 presented to the guest.

The e820 presented to the guest is consistent with what you are saying.
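(For anyone else checking this: the guest's view of its memory map can be
inspected from inside the guest with something like

    dmesg | grep -i e820

which prints the BIOS-provided e820 map the domain was booted with; the holes
Ian mentions, below 1MB and just under 4GB, should be visible there.)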
>> >> 2) 1GB superpage support. When I try to request 1GB pages in the guest
>> >> at boot time, I get the following message from the kernel: "hugepagesz:
>> >> Unsupported page size 1024 M", which is not a surprise since the
>> >> pdpe1gb cpu flag is not enabled. How can I enable this flag for the
>> >> domU vcpus? If this flag can be enabled, will the VMM try to map my
>> >> guest 1GB superpages to host physical 1GB hugepages in the EPT?
>> >
>> > Does your physical CPU support this?
>> >
>> > The toolstacks have options for controlling the masking of guest-visible
>> > CPUID values. I'd be surprised if this particular one wasn't passed
>> > through to guests by default.
>>
>> Yes, my Ivy Bridge CPU supports pdpe1gb. I read here
>> (http://www.gossamer-threads.com/lists/xen/devel/273636) that some
>> flags are masked off by default (even when supported by the physical
>> CPU) because they threaten live migration. However, I cannot find where
>> this happens in the tools code (Xen 4.2.2).
>
> Looking at xen/include/asm-x86/cpufeature.h, Xen's symbolic name for this
> flag appears to be X86_FEATURE_PAGE1GB. Grep finds a few uses in the
> tools and in Xen itself; most of them are PV related.
>
> There is one relating to the hypervisor in hvm_cpuid, where the clearing
> is conditional on hvm_pse1gb_supported, which is conditional on the
> presence of the HVM_HAP_SUPERPAGE_1GB capability. IIRC your logs said
> that was present, and the memory layout above shows 2 1GB pages getting
> used.
>
> Given that, I don't know why this isn't exposed to the guest. Might be
> worth instrumenting things up?

I spent some time looking at the source code for Xen and its toolstack
(Xen 4.2.2) and instrumenting it up. My conclusions are:

- There is nothing in Xen itself clearing the pdpe1gb flag unless it is
  not supported (by the boot CPU, as per boot_cpu_data), in which case
  hvm_pse1gb_supported returns 0; that was not the case for me.
- libxc's default policy is to NOT activate that flag unless it is
  explicitly specified by the user: the extended feature bits are handled
  by the vendor-specific functions in tools/libxc/xc_cpuid_x86.c, in our
  case intel_xc_cpuid_policy(). Examination of this function reveals that
  the default behavior is to keep the pdpe1gb bit off (other flags, like
  rdtscp, are set to on by default for x86_64 architectures).

I managed to enable it by adding cpuid="host,page1gb=k" to my guest
configuration file.
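For reference, the relevant part of the guest configuration now looks
something like the following (only the cpuid line is the actual change;
builder and memory are simply my existing settings):

    builder = "hvm"
    memory  = 4096
    cpuid   = "host,page1gb=k"

Once the flag is visible in the guest (grep pdpe1gb /proc/cpuinfo), a 1GB
hugepage can be reserved at guest boot with the usual kernel parameters,
along the lines of:

    default_hugepagesz=1G hugepagesz=1G hugepages=1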
Please tell me if anything I said does not make sense.

Thanks.

> Ian.

Antonin

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users