
Re: [Xen-users] a few questions about superpage support

Hi Ian,

Thanks again for the help. See response inline.

2013/9/10 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> On Mon, 2013-09-09 at 15:04 -0700, Antonin Bas wrote:
>> Hi,
>> First of all, thank you very much for your help. Please see inline comments.
>> 2013/9/9 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
>> > On Fri, 2013-09-06 at 16:52 -0700, Antonin Bas wrote:
>> >> Hi,
>> >>
>> >> I am working on a project that relies on superpages within a guest. Of
>> >> course these superpages need to be backed by actual machine pages.
>> >
>> > Which type of guest are you running? Most of my reply is specific to HVM
>> > which is what was implied by your interest in HAP.
>> I am indeed running HVM guests.
>> >
>> > There is no inherent need for things which are mapped as superpages in
>> > the guest pagetables be also mapped as superpages in the p2m (e.g. HAP)
>> > mappings. It is fine for a 2MB guest mapping to be translated via a
>> > block of 4K mappings in the p2m (and vice versa).
>> >
>> > Unless perhaps you mean that your usecase adds an additional
>> > requirement?
>> Thanks. I have a much better idea of what's actually going on now. In
>> my use case, I run a process which makes extensive use of a 1GB memory
>> region (with memory accesses randomly distributed over that region). I
>> was hoping that by using a 1GB hugepage in the guest for that process
>> and having this 1GB page mapped to an actual 1GB physical block, I
>> would avoid TLB misses (it is my understanding that some TLB
>> entries are reserved for 1GB mappings, both for gva -> gpa
>> and gpa -> ma).
>> But from what you are saying, it seems that there is no way to
>> guarantee that the guest 1GB hugepage will be translated via a 1GB
>> mapping in the p2m.
> I'm not all that familiar with the internals but I think not. At least
> not without modifying Xen to make it true.
>> >
>> >>
>> >> I am using this version of Xen:
>> >> (XEN) Xen version 4.2.2_04-0.7.5 (abuild@) (gcc (SUSE Linux) 4.3.4
>> >> [gcc-4_3-branch revision 152973]) Fri Jun 14 12:22:34 UTC 2013
>> >>
>> >>
>> >> HAP is enabled:
>> >> ...
>> >> (XEN) VMX: Supported advanced features:
>> >> (XEN)  - APIC MMIO access virtualisation
>> >> (XEN)  - APIC TPR shadow
>> >> (XEN)  - Extended Page Tables (EPT)
>> >> (XEN)  - Virtual-Processor Identifiers (VPID)
>> >> (XEN)  - Virtual NMI
>> >> (XEN)  - MSR direct-access bitmap
>> >> (XEN)  - Unrestricted Guest
>> >> (XEN)  - APIC Register Virtualization
>> >> (XEN)  - Virtual Interrupt Delivery
>> >> (XEN) HVM: ASIDs enabled.
>> >> (XEN) HVM: VMX enabled
>> >> (XEN) HVM: Hardware Assisted Paging (HAP) detected
>> >> (XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
>> >> ...
>> >>
>> >> I am using the boot line option allowsuperpage=1 and the guest config
>> >> includes superpages=1
>> >
>> > Is the former not a PV guest only thing?
>> >
>> > And I can't see any code about the latter in the xl toolstack.
>> >
>> > I thought superpages were the default, if any are available, for HVM
>> > guests.
>> You are right about superpages, I don't think this one is used.
>> For allowsuperpage, the doc says nothing about it being used only for
>> PV guests, and I think it is used for all guests. The default value is
>> true anyway. The only reference to it is in get_page_from_l2e() in
>> x86/mm.c.
> get_page_from_l2e is a PV only function, I think. The option was added
> by bd1cd81d6484 "x86: PV support for hugepages". I suspect the docs are
> just wrong.
> The default in the code appears to be false, so I suspect the code is
> doubly wrong...
>> >
>> >> With a guest memory of 4096, I can observe EPT entries that look like 
>> >> this:
>> >> (XEN) gfn: 10f600            mfn: 306c00            order:  9  is_pod: 0
>> >> At first I thought they meant that guest 2M superpages were indeed
>> >> being backed by 2M host machine superpages. I thought this was weird
>> >> since I could observe these entries even without explicitly requesting
>> >> hugepages from within the guest. I set transparent hugepages in the
>> >> guest to never (seems to be enabled by default in SUSE) but I could
>> >> still observe these 'order: 9' entries, which means I don't actually
>> >> know what they represent.
>> >
>> > As I say above, the guest and p2m use of superpage mappings are
>> > independent with HAP. And p2m superpages are the default for HVM.
>> Ok. One more question though. How does the VMM decide on the number
>> of 1GB mappings and 2M mappings to use?
> I'm not sure but I think based on availability of such pages to allocate
> and alignment of the RAM within the guest, accounting for holes etc.
>> When I boot a 4GB guest, I get the following mappings:
>>   4KB PAGES: 0x0000000000000200
>>   2MB PAGES: 0x00000000000003fb
>>   1GB PAGES: 0x0000000000000002
>> I am running a 32GB machine (2 NUMA nodes, each with an Ivy Bridge CPU
>> and 16GB memory, HT enabled), and I have allocated 8GB of memory to
>> dom0. This is the first guest I am starting, so I probably still have
>> a lot of contiguous free memory. Why not use 3 1G superpages or even
>> 4?
> I expect there are MMIO holes under 1MB and between 3GB-4GB which
> prevent the use of 1GB mapping, you can probably get a sense of that
> from the e820 presented to the guest.

The e820 presented to the guest is consistent with what you are saying.
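
For reference, the page counts quoted above can be summed as a quick sanity check. This is only a sketch in Python: the helper names are mine, and it assumes the three counters correspond to p2m orders 0, 9 and 18 (4kB, 2MB and 1GB entries). It also checks that the gfn/mfn pair from the 'order: 9' entry quoted earlier is 2MB-aligned.

```python
PAGE_4K = 4 * 1024  # base page size in bytes


def order_to_bytes(order):
    """Bytes covered by one p2m entry of the given order (2**order 4kB pages)."""
    return PAGE_4K << order


def is_aligned(frame_number, order):
    """True if a gfn/mfn is aligned to the size implied by the order."""
    return frame_number % (1 << order) == 0


def total_mib(counts):
    """Total memory in MiB for a mapping of {order: number_of_entries}."""
    total_bytes = sum(n * order_to_bytes(order) for order, n in counts.items())
    return total_bytes // (1024 * 1024)


# Counts from the 4GB guest above, keyed by assumed order:
# order 0 = 4kB pages, order 9 = 2MB pages, order 18 = 1GB pages.
guest_counts = {0: 0x200, 9: 0x3FB, 18: 0x2}

print(order_to_bytes(9) // (1024 * 1024), "MiB per order-9 entry")
print(total_mib(guest_counts), "MiB mapped in total")
print(is_aligned(0x10F600, 9), is_aligned(0x306C00, 9))
```

With these numbers the total comes out slightly below the 4096 configured for the guest; the sketch does not try to explain where the remainder goes.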

>> >
>> >> 2) 1GB superpage support. When I try to request 1GB in the guest at
>> >> boot time, I get the following message from the kernel: "hugepagesz:
>> >> Unsupported page size 1024 M", which is not a surprise since the
>> >> pdpe1gb cpu flag is not enabled. How can I enable this flag for the
>> >> domU vcpus? If this flag can be enabled, will the VMM try to map my
>> >> guest 1GB superpages to host physical 1GB hugepage in the EPT?
>> >
>> > Does your physical CPU support this?
>> >
>> > The toolstacks have options for controlling the masking of guest visible
>> > CPUID values. I'd be surprised if this particular flag wasn't passed through
>> > to guests by default.
>> Yes, my IvyBridge CPU supports pdpe1gb. I read here
>> (http://www.gossamer-threads.com/lists/xen/devel/273636) that some
>> flags were masked off by default -even when supported by the physical
>> CPU- because they threaten live migration. However I cannot find where
>> this happens in the tools code (Xen 4.2.2).
> Looking at xen/include/asm-x86/cpufeature.h Xen's symbolic name for this
> flag appears to be X86_FEATURE_PAGE1GB. Grep finds a few uses in the
> tools and in Xen itself, most of them are PV related.
> There is one relating to the hypervisor in hvm_cpuid; the clearing is
> conditional on hvm_pse1gb_supported, which is conditional on the
> presence of the HVM_HAP_SUPERPAGE_1GB capability. IIRC your logs said
> that was present, and the memory layout above shows 2 1GB pages getting
> used.
> Given that, I don't know why this isn't exposed to the guest. Might be
> worth instrumenting things up?

I spent some time looking at the source code for Xen and its
toolstack (Xen 4.2.2) and instrumenting it. My conclusions are:
- Nothing in Xen itself clears the pdpe1gb flag unless 1GB pages are
not supported (by the boot CPU, as per boot_cpu_data), in which case
hvm_pse1gb_supported returns 0. That was not the case for me.
- The libxc default policy is to NOT activate that flag unless the
user explicitly requests it. The extended feature bits are handled by
the vendor-specific functions in tools/libxc/xc_cpuid_x86.c, in our
case intel_xc_cpuid_policy(). Examination of this function reveals
that the default behavior is to keep the pdpe1gb bit off (other
flags, like rdtscp, are set to on by default for x86_64).

I managed to enable it by adding cpuid="host,page1gb=k" to my guest
configuration file.
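
To confirm from inside a Linux guest that the flag is now visible, something like the following Python sketch can be used. It only parses the "flags" line of /proc/cpuinfo; the function names are mine, and it assumes the kernel reports the feature under the name pdpe1gb, as in the discussion above.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_1gb_pages(cpuinfo_text):
    """True if the pdpe1gb flag is listed in the given cpuinfo text."""
    return "pdpe1gb" in cpu_flags(cpuinfo_text)


def guest_has_1gb_pages(path="/proc/cpuinfo"):
    """Check the running system; the path is the usual Linux location."""
    with open(path) as f:
        return has_1gb_pages(f.read())
```

Calling guest_has_1gb_pages() in the domU before and after adding the cpuid line should show whether the setting took effect.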

Please tell me if anything I said does not make sense. Thanks.

> Ian.

