
Re: [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode



On Fri, Aug 16, 2024 at 07:02:54PM +0100, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 4:21 PM BST, Roger Pau Monne wrote:
> > Instead of allocating a monitor table for each vCPU when running in HVM HAP
> > mode, use a per-pCPU monitor table, which gets the per-domain slot updated
> > on guest context switch.
> >
> > This limits the amount of memory used for HVM HAP monitor tables to the
> > amount of active pCPUs, rather than to the number of vCPUs.  It also
> > simplifies vCPU allocation and teardown, since the monitor table handling
> > is removed from there.
> >
> > Note the switch to using a per-CPU monitor table is done regardless of
> > whether
> 
> s/per-CPU/per-pCPU/

Sorry, I might not have been as consistent as I wanted with using pCPU
everywhere.

> > Address Space Isolation is enabled or not.  Partly for the memory usage
> > reduction, and also because it allows to simplify the VM tear down path
> > by not having to cleanup the per-vCPU monitor tables.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Note the monitor table is not made static because uses outside of the file
> > where it's defined will be added by further patches.
> > ---
> >  xen/arch/x86/hvm/hvm.c             | 60 ++++++++++++++++++++++++
> >  xen/arch/x86/hvm/svm/svm.c         |  5 ++
> >  xen/arch/x86/hvm/vmx/vmcs.c        |  1 +
> >  xen/arch/x86/hvm/vmx/vmx.c         |  4 ++
> >  xen/arch/x86/include/asm/hap.h     |  1 -
> >  xen/arch/x86/include/asm/hvm/hvm.h |  8 ++++
> >  xen/arch/x86/mm.c                  |  8 ++++
> >  xen/arch/x86/mm/hap/hap.c          | 75 ------------------------------
> >  xen/arch/x86/mm/paging.c           |  4 +-
> >  9 files changed, 87 insertions(+), 79 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 7f4b627b1f5f..3f771bc65677 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -104,6 +104,54 @@ static const char __initconst warning_hvm_fep[] =
> >  static bool __initdata opt_altp2m_enabled;
> >  boolean_param("altp2m", opt_altp2m_enabled);
> >  
> > +DEFINE_PER_CPU(root_pgentry_t *, monitor_pgt);
> > +
> > +static int allocate_cpu_monitor_table(unsigned int cpu)
> 
> To avoid ambiguity, could we call these *_pcpu_*() instead?

As replied by Jan, in hypervisor code a plain 'cpu' in function names
usually means a physical CPU.  '_pcpu_' here would IMO imply per-CPU,
which it also is, but that likely doesn't need spelling out in the
function name.

> > +{
> > +    root_pgentry_t *pgt = alloc_xenheap_page();
> > +
> > +    if ( !pgt )
> > +        return -ENOMEM;
> > +
> > +    clear_page(pgt);
> > +
> > +    init_xen_l4_slots(pgt, _mfn(virt_to_mfn(pgt)), INVALID_MFN, NULL,
> > +                      false, true, false);
> > +
> > +    ASSERT(!per_cpu(monitor_pgt, cpu));
> > +    per_cpu(monitor_pgt, cpu) = pgt;
> > +
> > +    return 0;
> > +}
> > +
> > +static void free_cpu_monitor_table(unsigned int cpu)
> > +{
> > +    root_pgentry_t *pgt = per_cpu(monitor_pgt, cpu);
> > +
> > +    if ( !pgt )
> > +        return;
> > +
> > +    per_cpu(monitor_pgt, cpu) = NULL;
> > +    free_xenheap_page(pgt);
> > +}
> > +
> > +void hvm_set_cpu_monitor_table(struct vcpu *v)
> > +{
> > +    root_pgentry_t *pgt = this_cpu(monitor_pgt);
> > +
> > +    ASSERT(pgt);
> > +
> > +    setup_perdomain_slot(v, pgt);
> 
> Why not modify them as part of write_ptbase() instead? As it stands, it
> appears to be modifying the PTEs of what may very well be our current PT,
> which makes the perdomain slot be in a $DEITY-knows-what state until the
> next flush (presumably the write to cr3 in write_ptbase()?; assuming no
> PCIDs).
> 
> Setting the slot up right before the cr3 change should reduce the
> potential for misuse.

The reasoning for doing it here is that the per-domain slot only needs
setting on context switch.  In the PV case write_ptbase() will be
called each time the guest switches %cr3, but setting the per-domain
slot is not required for each call if the vCPU hasn't changed.

Let me see if I can arrange for the current contents of
setup_perdomain_slot() to be merged into write_ptbase(). Note
setup_perdomain_slot() started as a wrapper to extract XPTI specific
code from paravirt_ctxt_switch_to().

> > +
> > +    make_cr3(v, _mfn(virt_to_mfn(pgt)));
> > +}
> > +
> > +void hvm_clear_cpu_monitor_table(struct vcpu *v)
> > +{
> > +    /* Poison %cr3, it will be updated when the vCPU is scheduled. */
> > +    make_cr3(v, INVALID_MFN);
> 
> I think this would benefit from more exposition in the comment. If I'm
> getting this right, after descheduling this vCPU we can't assume it'll be
> rescheduled on the same pCPU, and if it's not it'll end up using a
> different monitor table.  This poison value is meant to highlight
> forgetting to set cr3 in the "ctxt_switch_to()" path.

Indeed, we would like to avoid running on a different pCPU while still
using the monitor page-tables of whatever pCPU the vCPU had previously
been running on.

> All of that can be deduced from what you wrote and sufficient
> headscratching, but seeing how this is invoked from the context switch
> path it's not incredibly clear whether you meant the perdomain slot would
> be updated by the next vCPU or what I stated in the previous paragraph.

No, it's just about not leaving stale values in the vcpu struct.

> Assuming it is as I mentioned, maybe hvm_forget_cpu_monitor_table() would
> convey what it does better? i.e: the vCPU forgets/unbinds the monitor table
> from its internal state.

Right, I assumed that 'clear' already conveyed the concept of
unbinding from a pCPU.  If I use unbind, then I guess I should also
use 'bind' for what I currently call 'set'.

Thanks, Roger.
