[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode



On Fri Jul 26, 2024 at 4:21 PM BST, Roger Pau Monne wrote:
> Instead of allocating a monitor table for each vCPU when running in HVM HAP
> mode, use a per-pCPU monitor table, which gets the per-domain slot updated on
> guest context switch.
>
> This limits the amount of memory used for HVM HAP monitor tables to the amount
> of active pCPUs, rather than to the number of vCPUs.  It also simplifies vCPU
> allocation and teardown, since the monitor table handling is removed from
> there.
>
> Note the switch to using a per-CPU monitor table is done regardless of whether

s/per-CPU/per-pCPU/

> Address Space Isolation is enabled or not.  Partly for the memory usage
> reduction, and also because it allows to simplify the VM tear down path by not
> having to cleanup the per-vCPU monitor tables.
>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Note the monitor table is not made static because uses outside of the file
> where it's defined will be added by further patches.
> ---
>  xen/arch/x86/hvm/hvm.c             | 60 ++++++++++++++++++++++++
>  xen/arch/x86/hvm/svm/svm.c         |  5 ++
>  xen/arch/x86/hvm/vmx/vmcs.c        |  1 +
>  xen/arch/x86/hvm/vmx/vmx.c         |  4 ++
>  xen/arch/x86/include/asm/hap.h     |  1 -
>  xen/arch/x86/include/asm/hvm/hvm.h |  8 ++++
>  xen/arch/x86/mm.c                  |  8 ++++
>  xen/arch/x86/mm/hap/hap.c          | 75 ------------------------------
>  xen/arch/x86/mm/paging.c           |  4 +-
>  9 files changed, 87 insertions(+), 79 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 7f4b627b1f5f..3f771bc65677 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -104,6 +104,54 @@ static const char __initconst warning_hvm_fep[] =
>  static bool __initdata opt_altp2m_enabled;
>  boolean_param("altp2m", opt_altp2m_enabled);
>  
> +DEFINE_PER_CPU(root_pgentry_t *, monitor_pgt);
> +
> +static int allocate_cpu_monitor_table(unsigned int cpu)

To avoid ambiguity, could we call these *_pcpu_*() instead?

> +{
> +    root_pgentry_t *pgt = alloc_xenheap_page();
> +
> +    if ( !pgt )
> +        return -ENOMEM;
> +
> +    clear_page(pgt);
> +
> +    init_xen_l4_slots(pgt, _mfn(virt_to_mfn(pgt)), INVALID_MFN, NULL,
> +                      false, true, false);
> +
> +    ASSERT(!per_cpu(monitor_pgt, cpu));
> +    per_cpu(monitor_pgt, cpu) = pgt;
> +
> +    return 0;
> +}
> +
> +static void free_cpu_monitor_table(unsigned int cpu)
> +{
> +    root_pgentry_t *pgt = per_cpu(monitor_pgt, cpu);
> +
> +    if ( !pgt )
> +        return;
> +
> +    per_cpu(monitor_pgt, cpu) = NULL;
> +    free_xenheap_page(pgt);
> +}
> +
> +void hvm_set_cpu_monitor_table(struct vcpu *v)
> +{
> +    root_pgentry_t *pgt = this_cpu(monitor_pgt);
> +
> +    ASSERT(pgt);
> +
> +    setup_perdomain_slot(v, pgt);

Why not modify them as part of write_ptbase() instead? As it stands, it appears
to be modifying the PTEs of what may very well be our current PT, which makes
the perdomain slot be in a $DEITY-knows-what state until the next flush
(presumably the write to cr3 in write_ptbase()?; assuming no PCIDs).

Setting the slot up right before the cr3 change should reduce the potential for
misuse.

> +
> +    make_cr3(v, _mfn(virt_to_mfn(pgt)));
> +}
> +
> +void hvm_clear_cpu_monitor_table(struct vcpu *v)
> +{
> +    /* Poison %cr3, it will be updated when the vCPU is scheduled. */
> +    make_cr3(v, INVALID_MFN);

I think this would benefit from more exposition in the comment. If I'm getting
this right, after descheduling this vCPU we can't assume it'll be rescheduled
on the same pCPU, and if it's not it'll end up using a different monitor table.
This poison value is meant to highlight forgetting to set cr3 in the
"ctxt_switch_to()" path. 

All of that can be deduced from what you wrote and sufficient headscratching
but seeing how this is invoked from the context switch path it's not incredibly
clear wether you meant the perdomain slot would be updated by the next vCPU or
what I stated in the previous paragraph.

Assuming it is as I mentioned, maybe hvm_forget_cpu_monitor_table() would
convey what it does better? i.e: the vCPU forgets/unbinds the monitor table
from its internal state.

Cheers,
Alejandro



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.