[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 1:29 PM, Razvan Cojocaru wrote: On 07/22/2016 01:17 PM, Corneliu ZUZU wrote:On 7/22/2016 12:55 PM, Razvan Cojocaru wrote:On 07/22/2016 12:27 PM, Corneliu ZUZU wrote:Hi, I've been inspecting vm-event code parts to try and understand when and why domain pausing/locking is done. If I understood correctly, domain pausing is done solely to force all the vCPUs of that domain to see a configuration update and act upon it (e.g. in the case of a XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring, domain pausing/unpausing ensures immediate enabling of CR3 load-exiting for all vCPUs), not to synchronize concurrent operations (lock-behavior). As for locking, I see that for example vm_event_init_domain(), vm_event_cleanup_domain() and monitor_domctl() are all protected by the domctl-lock, but I don't think that's enough. Here are a few code-paths that led me to believe that: * do_hvm_op()->monitor_guest_request() reads d.monitor.guest_request_* resources, but it doesn't seem to take the domctl lock, so it seems possible for that to happen _while_ those resources are initialized/cleaned-up * monitor_guest_request() also calls monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot() which attempts a vm_event_ring_lock(ved), which could also be called _while_ that's initialized (vm_event_enable()) or cleaned-up (vm_event_disable()) * hvm_monitor_cr() - e.g. on the code-path vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() there doesn't seem to be taken into account the possibility of a concurrent monitor_init_domain()/monitor_cleanup_domain() Am I missing something with these conclusions?Your conclusions look correct, but I assume that the reason why this has not been addressed in the past is that introspection applications are expected to be well-behaved.I wouldn't think that was the rationale (considering that the risk is crashing the hypervisor as a whole; also see below), I'd rather think this simply went unnoticed.Specifically, in a codebase where the choice between uint64_t and long int matters speed-wise, and where unlikely()s also matter, an extra lock may be an issue. The typical flow of an introspection application is: 1. Initialize everything. 2. Subscribe to relevant events. 3. Event processing loop. 4. Unsubscribe from events. 5. Do a last-run of event processing (already queued in the ring buffer). 6. Uninitialize everything (no events are possible here because of steps 4-5).And even if an introspection application behaves as sanely as you say, the current layout of the code still doesn't guarantee bug-free behavior. That's because for one, there are in-guest events (that trigger access of vm-events resources) that are completely asynchronous in relation to 2-6, for example a HVMOP_guest_request_vm_event (subsequently calling monitor_guest_request()).Not true. As part of step 4 d->monitor.guest_request_enabled will become 0, so no sensitive code will run after that: I wasn't being rigorous with 2-6 above (with that I only meant to say 'there can't be any vm-events before everything is initialized') and it seems true that at least in the monitor_guest_request() case it might not be able to produce a hypervisor crash (although there are a number of ugly inconsistencies on that code path). But nonetheless a hypervisor crash can _still_ happen even with a _sane_ toolstack user. Look @ hvm_do_resume():1. If you, as a toolstack user, get at sane step no. 6: "Uninitialize everything (no events are possible here because of steps 4-5)." 2. But just before you do that, a hvm_do_resume() happens on an arbitrary vCPU of the domain and gets to this point: if ( unlikely(v->arch.vm_event) ) {// ---> let's say you are now at this point, just after the above non-NULL check struct monitor_write_data *w = &v->arch.vm_event->write_data;3. and before proceeding further, the uninitialization sequence _completes_, i.e. this was done in vm_event_cleanup_domain(): for_each_vcpu ( d, v ) { xfree(v->arch.vm_event); v->arch.vm_event = NULL; } 4. and then in hvm_do_resume() this gets to be done: struct monitor_write_data *w = &v->arch.vm_event->write_data; [...] if ( w->do_write.msr ) { hvm_msr_write_intercept(w->msr, w->value, 0); w->do_write.msr = 0; } then you'll have a beautiful NULL-pointer access hypervisor crash. 129 void monitor_guest_request(void) 130 { 131 struct vcpu *curr = current; 132 struct domain *d = curr->domain; 133 134 if ( d->monitor.guest_request_enabled ) 135 { 136 vm_event_request_t req = { 137 .reason = VM_EVENT_REASON_GUEST_REQUEST, 138 .vcpu_id = curr->vcpu_id, 139 }; 140 141 monitor_traps(curr, d->monitor.guest_request_sync, &req); 142 } 143 } The guest _may_ try to request an event after that, but the request will be ignored., and that "subsys_lock" sounds obscure to me, although I admit that I can't think of a good name at the moment.I'm referring to monitor, share & paging as "subsystems" of the whole "vm-events" subsystem (in that sense the 3 aforementioned are "sub-subsystems"). To me it's acceptable, but I'm open to better names.If Tamas agrees with you I'll consider myself outvoted. Thanks, Razvan Corneliu. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |