
Re: [Xen-devel] Locking on vm-event operations (monitor)



On 7/22/2016 3:10 PM, Razvan Cojocaru wrote:
On 07/22/2016 02:39 PM, Corneliu ZUZU wrote:
On 7/22/2016 1:29 PM, Razvan Cojocaru wrote:
On 07/22/2016 01:17 PM, Corneliu ZUZU wrote:
On 7/22/2016 12:55 PM, Razvan Cojocaru wrote:
On 07/22/2016 12:27 PM, Corneliu ZUZU wrote:
Hi,

I've been inspecting the vm-event code to try and understand when and
why domain pausing/locking is done. If I understood correctly, domain
pausing is done solely to force all the vCPUs of that domain to see a
configuration update and act upon it (e.g. in the case of a
XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring,
domain pausing/unpausing ensures immediate enabling of CR3 load-exiting
for all vCPUs), not to synchronize concurrent operations
(lock-behavior).
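
(For reference, the pause/update/unpause pattern I'm thinking of looks
roughly like the sketch below - paraphrased from memory, so take the
field/macro names as approximate rather than as the exact tree code:)

    domain_pause(d);

    /*
     * All vCPUs are descheduled at this point, so the configuration
     * update below cannot race with a vCPU consulting it; each vCPU
     * picks up the new setting (CR3 load-exiting) on its next VM-entry
     * after the unpause.
     */
    d->arch.monitor.write_ctrlreg_enabled |= ctrlreg_bitmask(VM_EVENT_X86_CR3);

    domain_unpause(d);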

As for locking, I see that for example vm_event_init_domain(),
vm_event_cleanup_domain() and monitor_domctl() are all protected by the
domctl-lock, but I don't think that's enough.
Here are a few code-paths that led me to believe that:
* do_hvm_op()->monitor_guest_request() reads the d.monitor.guest_request_*
resources, but it doesn't seem to take the domctl lock, so it seems
possible for that to happen _while_ those resources are being
initialized/cleaned-up (monitor_guest_request() is sketched just below
this list)
* monitor_guest_request() also calls
monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot(),
which attempts a vm_event_ring_lock(ved); that lock could likewise be
taken _while_ it is being initialized (vm_event_enable()) or cleaned-up
(vm_event_disable())
* hvm_monitor_cr() - e.g. on the code-path
vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX()
- doesn't take into account the possibility of a concurrent
monitor_init_domain()/monitor_cleanup_domain()
Am I missing something with these conclusions?
Your conclusions look correct, but I assume that the reason why this has
not been addressed in the past is that introspection applications are
expected to be well-behaved.
I wouldn't think that was the rationale (considering that the risk is
crashing the hypervisor as a whole; also see below); I'd rather think
this simply went unnoticed.

Specifically, in a codebase where the choice between uint64_t and long
int matters speed-wise, and where unlikely()s also matter, an extra lock
may be an issue.

The typical flow of an introspection application is (roughly sketched in
libxc terms below):

1. Initialize everything.
2. Subscribe to relevant events.
3. Event processing loop.
4. Unsubscribe from events.
5. Do a last run of event processing (already queued in the ring buffer).
6. Uninitialize everything (no events are possible here because of steps
4-5).
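
In libxc terms that flow is roughly the sketch below (error handling and
includes omitted, 'domid' assumed to hold the target domain id, and the
exact helper signatures - xc_monitor_enable(), xc_monitor_write_ctrlreg(),
xc_monitor_disable() - should be double-checked against xenctrl.h for the
release at hand):

    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    uint32_t port;

    /* 1. Initialize everything: set up the ring page + event channel. */
    void *ring_page = xc_monitor_enable(xch, domid, &port);

    /* 2. Subscribe to relevant events, e.g. CR3 writes. */
    xc_monitor_write_ctrlreg(xch, domid, VM_EVENT_X86_CR3,
                             1 /* enable */, 1 /* sync */,
                             1 /* onchangeonly */);

    /* 3. Event processing loop: wait on the event channel, consume
     *    requests from ring_page, push back responses (omitted). */

    /* 4. Unsubscribe from events. */
    xc_monitor_write_ctrlreg(xch, domid, VM_EVENT_X86_CR3, 0, 0, 0);

    /* 5. Drain whatever is still queued in the ring (omitted). */

    /* 6. Uninitialize everything. */
    xc_monitor_disable(xch, domid);
    xc_interface_close(xch);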
And even if an introspection application behaves as sanely as you say,
the current layout of the code still doesn't guarantee bug-free
behavior. For one thing, there are in-guest events (which trigger access
of vm-event resources) that are completely asynchronous with respect to
steps 2-6, for example a HVMOP_guest_request_vm_event (which
subsequently calls monitor_guest_request()).
Not true. As part of step 4 d->monitor.guest_request_enabled will become
0, so no sensitive code will run after that.
I wasn't being rigorous with 2-6 above (with that I only meant to say
'there can't be any vm-events before everything is initialized'), and it
seems true that at least in the monitor_guest_request() case it might
not be able to produce a hypervisor crash (although there are a number
of ugly inconsistencies on that code path). But nonetheless a hypervisor
crash can _still_ happen even with a _sane_ toolstack user.

Look at hvm_do_resume():

1. If you, as a toolstack user, get to sane step no. 6: "Uninitialize
everything (no events are possible here because of steps 4-5)."
2. But just before you do that, a hvm_do_resume() happens on an
arbitrary vCPU of the domain and gets to this point:

     if ( unlikely(v->arch.vm_event) )
     {
         // ---> let's say you are now at this point,
         //      just after the above non-NULL check
         struct monitor_write_data *w = &v->arch.vm_event->write_data;

3. And before proceeding any further, the uninitialization sequence
_completes_, i.e. this runs in vm_event_cleanup_domain():

     for_each_vcpu ( d, v )
     {
         xfree(v->arch.vm_event);
         v->arch.vm_event = NULL;
     }

4. And then hvm_do_resume() goes on to execute:

         struct monitor_write_data *w = &v->arch.vm_event->write_data;

         [...]

         if ( w->do_write.msr )
         {
             hvm_msr_write_intercept(w->msr, w->value, 0);
             w->do_write.msr = 0;
         }

then you'll have a beautiful NULL-pointer dereference and a hypervisor
crash.
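
Just to illustrate the kind of safeguard I mean (the per-domain
'vm_event_lock' below is hypothetical, it doesn't exist in the tree
today), the check and the dereference would have to be made atomic with
respect to the cleanup, along the lines of:

    /* Hypothetical per-domain lock, name made up for illustration. */
    spin_lock(&d->vm_event_lock);

    if ( unlikely(v->arch.vm_event) )
    {
        struct monitor_write_data *w = &v->arch.vm_event->write_data;

        if ( w->do_write.msr )
        {
            hvm_msr_write_intercept(w->msr, w->value, 0);
            w->do_write.msr = 0;
        }
    }

    spin_unlock(&d->vm_event_lock);

    /* ... with vm_event_cleanup_domain() taking the same lock around
     * the for_each_vcpu() loop that frees and NULLs v->arch.vm_event. */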
As I've said before, of course I agree with that

Sorry, I thought that by your previous statement "I assume that the
reason why this has not been addressed in the past is that introspection
applications are expected to be well-behaved" you were implying that,
with a 'sane' toolstack user as you defined it, it wouldn't be possible
to have a hypervisor crash.

  - isn't that what a previous patch you've submitted addressed?

I'm not sure which patch you're referring to; I don't remember ever
addressing these issues before.

I was merely replying to your
concerns assuming that that issue is addressed (as discussed before),
and assuming that the introspection application behaves gracefully.

Again, I'm all for your proposed changes with the additional locking. I
personally do prefer as many safeguards as possible.

I understand that, and I was saying that without these safeguards
significant system compromise is possible, so I wouldn't call them
'additional', but rather 'missing'. :-)

Thanks,
Razvan

Anyway, I think we might've misunderstood each other here, so let's
leave the details for when I send the patch series. ;-)

Thanks too,
Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel