[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4] x86/vmx: save guest non-register state in hvm_hw_cpu



On Tue, Mar 22, 2022 at 11:11 AM Andrew Cooper
<Andrew.Cooper3@xxxxxxxxxx> wrote:
>
> On 21/03/2022 21:12, Tamas K Lengyel wrote:
> > During VM forking and resetting a failed vmentry has been observed due
> > to the guest non-register state going out-of-sync with the guest register
> > state. For example, a VM fork reset right after a STI instruction can 
> > trigger
> > the failed entry. This is due to the guest non-register state not being 
> > saved
> > from the parent VM, thus the reset operation only copies the register state.
> >
> > Fix this by including the guest non-register state in hvm_hw_cpu so that 
> > when
> > its copied from the parent VM the vCPU state remains in sync.
> >
> > SVM is not currently wired-in as VM forking is VMX only and saving 
> > non-register
> > state during normal save/restore/migration operation hasn't been needed. If
> > deemed necessary in the future it can be wired in by adding a 
> > svm-substructure
> > to hvm_hw_cpu.
>
> I disagree here.  This bug isn't really to do with fuzzing; it's to do
> with our APIs being able to get/set vCPU state correctly, and fuzzing is
> the example which demonstrated the breakage.
>
> This will also cause very subtle bugs for the guest-transparent
> migration work, and the live update work, both of which want to shift
> vcpu state behind a VM which hasn't actively entered Xen via hypercall.
>
> (Quick tangent.  Seriously, can the SDM be fixed so it actually
> enumerates the Activity States.)
>
> Xen doesn't currently support any situation where Intel's idea of
> Activity State is anything other than 0.  This in turn is buggy, because
> we don't encode VPF_blocked anywhere.  An activity state of hlt might
> not be best place to encode this, because notably absent from Intel's
> activity state is mwait.  We'll also terminate things like schedop_poll
> early.
>
> Next, AMD does have various fields from interruptibility.  If you want
> me to write the patch then fine, but they should not be excluded from a
> patch like this.  AMD do not have separate bits for STI and MovSS, due
> to microarchitectural differences, but the single INTERRUPT_SHADOW bit
> does need managing, as does the blocked-by-NMI bit which is emulated on
> SVM and older Intel CPUs.
>
> Minor tangent, blocked-by-SMI needs cross-linking with vm-entry
> controls, so I'm not sure it is something we ought to expose.
>
> Next, it turns out that MSR_DEBUGCTL isn't included anywhere (not even
> the MSR list).  It is important, because it forms part of the VMEntry
> cross-check with PENDING_DBG and TF.
>
> Next, we also don't preserve PDPTRs.  This is another area where Intel
> and AMD CPUs behave differently, but under Intel's architecture, the
> guest kernel can clobber the 32bit PAE block of pointers and everything
> will function correctly until the next mov into cr3.  There are
> definitely bugs in Xen's emulated pagewalk in this area.
>
> So there are a lot of errors with hvm_hw_cpu and I bet I haven't found
> them all.
>
> As discussed at multiple previous XenSummits, the current load/save mess
> needs moving out of Xen, and that would be the correct time to fix the
> other errors, but it probably is too much for this case.
>
>
> At a minimum, there shouldn't be a VMX specific union, because we are
> talking about architecture-agnostic properties of the vCPU.  It's fine
> for the format to follow Intel's bit layout for the subset of bits we
> tolerate saving/restoring, but this needs discussing in the header, and
> needs rejecting on set.  An extra check/reject is 0% overhead for
> forking, so I don't find that a credible argument against doing it.

Sure, during the fork itself it's negligible. It's during fork_reset,
which we do thousands of times per second where that sanity checking
is both unneeded (we know the setting is getting copied from the
parent) and will quickly add up if we need to do a bunch of bitshifts
to ensure only the valid bits are restored (plus converted in case the
format will be the non-vmx version).

> I'm not sure if we want to include the activity state at the moment
> without a better idea of how to encode VPF_blocked, but I think DEBUGCTL
> does need including.  This in turn involves transmitting the LBR MSRs too.

I don't really have much to add here. If there are concerns in regards
to the side-effect of this on the pre-existing save/restore/migration
route then what I can do is add only a Xen-internal function that will
allow the fork_reset to copy the non-register state from the parent
and not expose it via the public header. That works for me just as
well, my use-case doesn't have a requirement for these bits to be
exposed externally.

Tamas



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.