
Re: [Xen-devel] [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults

On 11/22/18 7:08 PM, Roger Pau Monné wrote:
> On Thu, Nov 22, 2018 at 06:52:07PM +0200, Razvan Cojocaru wrote:
>> On 11/22/18 5:37 PM, Roger Pau Monné wrote:
>>> I don't think you are supposed to try to pause other vcpus while
>>> holding a lock, as you can see it's quite likely that you will end up
>>> deadlocking because the vCPU you are trying to pause is stuck waiting
>>> on the lock that you are holding.
>>> You should figure out whether you can get into vmx_start_reexecute
>>> without holding any locks, or alternatively drop the lock, pause the
>>> vCPUs and take the lock again.
>>> See for example how hap_track_dirty_vram releases the lock before
>>> attempting to pause the domain for this same reason.
>> Right, this will take more thinking.
>> I've unlocked the p2m for testing and the initial hang is gone, however
>> the same problem now applies to rexec_lock: nothing prevents two or more
>> VCPUs from arriving in vmx_start_reexecute_instruction() simultaneously,
>> at which point one of them might take the lock and try to pause the
>> other, while the other is waiting to take the lock, with predictable
>> results.
>> On the other hand, releasing rexec_lock as well will allow two VCPUs to
>> end up trying to pause each other (especially unpleasant in a 2 VCPU
>> guest). At any given moment, there should be only one VCPU alive and
>> trying to reexecute an instruction - and at least one VCPU alive on the
>> guest.
>> We'll get more coffee, and of course suggestions are appreciated (as has
>> been all your help).
> Hm, I don't think it's generally safe to try to pause domain vCPUs
> from the same domain context, as you say it's likely to deadlock since
> two vCPUs from the same domain might try to pause one another.
> My knowledge of all this introspection logic is very vague, do you
> really need to stop the other vCPUs while performing this reexecution?
> What are you trying to prevent by pausing other vCPUs?

Yes, that's unfortunately very necessary.

The scenario is this: for introspection purposes, a bunch of pages are
marked read-only in the EPT (or no-execute, but for the purposes of this
example let's stick to read-only).

Now, we'll get vm_events whenever an instruction tries to write into
one of those pages. vm_events are expensive, so we _really_ want to get
as few of them as possible while still keeping the guest protected. So
we want to filter out the irrelevant ones.

The main category of irrelevant events is faults caused by walking the
guest's page tables. We only want events caused by an actual write into
a protected page by an actual instruction executing at RIP in the guest.

So, we don't want the vm_events where npfec.kind !=
npfec_kind_with_gla in p2m_mem_access_check() - hence this patch.


_However_, please picture an instruction that both writes into a page
P1 we're interested in _and_ causes a write into a read-only
page-walk-related page P2. Emulating the current instruction, as the
upstream patch does, eliminates the vm_event caused by the write into
P2, but with the unfortunate side effect of losing a potentially
critical event for the write into P1.

What this patch attempts to do is mark P1 rwx (thus allowing the
write), put the faulting VCPU into single-step mode, and restore the
restrictions after it has finished single-stepping. By now it's obvious
why all the other VCPUs need to be paused: one of them might do a
malicious write into P1 that silently succeeds, since the EPT is shared
among all VCPUs (putting altp2m aside for a moment). We don't want that.

Alternatively, we'd be happy with simply being able to set the relevant
A/D bits in the pages touched by the page walk, but after lengthy
negotiations that can be found in the xen-devel archives we were unable
to find a safe, architecturally correct way of doing that.

I hope this sheds some light on it.

