Re: [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode
On Fri, Aug 16, 2024 at 07:02:54PM +0100, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 4:21 PM BST, Roger Pau Monne wrote:
> > Instead of allocating a monitor table for each vCPU when running in HVM HAP
> > mode, use a per-pCPU monitor table, which gets the per-domain slot updated
> > on guest context switch.
> >
> > This limits the amount of memory used for HVM HAP monitor tables to the
> > number of active pCPUs, rather than to the number of vCPUs. It also
> > simplifies vCPU allocation and teardown, since the monitor table handling
> > is removed from there.
> >
> > Note the switch to using a per-CPU monitor table is done regardless of whether
>
> s/per-CPU/per-pCPU/
Sorry, I might not have been as consistent as I wanted with using pCPU
everywhere.
> > Address Space Isolation is enabled or not. Partly for the memory usage
> > reduction, and also because it allows simplifying the VM teardown path by
> > not having to clean up the per-vCPU monitor tables.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Note the monitor table is not made static because uses outside of the
> > file where it's defined will be added by further patches.
> > ---
> > xen/arch/x86/hvm/hvm.c | 60 ++++++++++++++++++++++++
> > xen/arch/x86/hvm/svm/svm.c | 5 ++
> > xen/arch/x86/hvm/vmx/vmcs.c | 1 +
> > xen/arch/x86/hvm/vmx/vmx.c | 4 ++
> > xen/arch/x86/include/asm/hap.h | 1 -
> > xen/arch/x86/include/asm/hvm/hvm.h | 8 ++++
> > xen/arch/x86/mm.c | 8 ++++
> > xen/arch/x86/mm/hap/hap.c | 75 ------------------------------
> > xen/arch/x86/mm/paging.c | 4 +-
> > 9 files changed, 87 insertions(+), 79 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 7f4b627b1f5f..3f771bc65677 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -104,6 +104,54 @@ static const char __initconst warning_hvm_fep[] =
> > static bool __initdata opt_altp2m_enabled;
> > boolean_param("altp2m", opt_altp2m_enabled);
> >
> > +DEFINE_PER_CPU(root_pgentry_t *, monitor_pgt);
> > +
> > +static int allocate_cpu_monitor_table(unsigned int cpu)
>
> To avoid ambiguity, could we call these *_pcpu_*() instead?
As Jan replied, plain 'cpu' usually means physical CPU in hypervisor
function names. '_pcpu_' here would IMO imply per-CPU, which it also
is, but that likely doesn't need spelling out in the function name.
> > +{
> > + root_pgentry_t *pgt = alloc_xenheap_page();
> > +
> > + if ( !pgt )
> > + return -ENOMEM;
> > +
> > + clear_page(pgt);
> > +
> > + init_xen_l4_slots(pgt, _mfn(virt_to_mfn(pgt)), INVALID_MFN, NULL,
> > + false, true, false);
> > +
> > + ASSERT(!per_cpu(monitor_pgt, cpu));
> > + per_cpu(monitor_pgt, cpu) = pgt;
> > +
> > + return 0;
> > +}
> > +
> > +static void free_cpu_monitor_table(unsigned int cpu)
> > +{
> > + root_pgentry_t *pgt = per_cpu(monitor_pgt, cpu);
> > +
> > + if ( !pgt )
> > + return;
> > +
> > + per_cpu(monitor_pgt, cpu) = NULL;
> > + free_xenheap_page(pgt);
> > +}
> > +
> > +void hvm_set_cpu_monitor_table(struct vcpu *v)
> > +{
> > + root_pgentry_t *pgt = this_cpu(monitor_pgt);
> > +
> > + ASSERT(pgt);
> > +
> > + setup_perdomain_slot(v, pgt);
>
> Why not modify them as part of write_ptbase() instead? As it stands, it
> appears to be modifying the PTEs of what may very well be our current PT,
> which makes the perdomain slot be in a $DEITY-knows-what state until the
> next flush (presumably the write to cr3 in write_ptbase()?; assuming no
> PCIDs).
>
> Setting the slot up right before the cr3 change should reduce the
> potential for misuse.
The reasoning for doing it here is that the per-domain slot only needs
setting on context switch. In the PV case write_ptbase() will be
called each time the guest switches %cr3, but setting the per-domain
slot is not required for each call if the vCPU hasn't changed.
Let me see if I can arrange for the current contents of
setup_perdomain_slot() to be merged into write_ptbase(). Note
setup_perdomain_slot() started as a wrapper to extract XPTI-specific
code from paravirt_ctxt_switch_to().
> > +
> > + make_cr3(v, _mfn(virt_to_mfn(pgt)));
> > +}
> > +
> > +void hvm_clear_cpu_monitor_table(struct vcpu *v)
> > +{
> > + /* Poison %cr3, it will be updated when the vCPU is scheduled. */
> > + make_cr3(v, INVALID_MFN);
>
> I think this would benefit from more exposition in the comment. If I'm
> getting this right, after descheduling this vCPU we can't assume it'll be
> rescheduled on the same pCPU, and if it's not it'll end up using a
> different monitor table. This poison value is meant to highlight
> forgetting to set cr3 in the "ctxt_switch_to()" path.
Indeed, we would like to avoid running on a different pCPU while still
using the monitor page-tables of whatever pCPU the vCPU had previously
been running on.
> All of that can be deduced from what you wrote and sufficient
> headscratching, but seeing how this is invoked from the context switch
> path it's not incredibly clear whether you meant the perdomain slot would
> be updated by the next vCPU or what I stated in the previous paragraph.
No, it's just about not leaving stale values in the vcpu struct.
> Assuming it is as I mentioned, maybe hvm_forget_cpu_monitor_table() would
> convey what it does better? i.e: the vCPU forgets/unbinds the monitor
> table from its internal state.
Right, I assumed that 'clear' already conveyed the concept of
unbinding from a pCPU. If I use unbind, then I guess I should also
use 'bind' for what I currently call 'set'.
Thanks, Roger.