Re: [Xen-devel] Locking on vm-event operations (monitor)
On 7/22/2016 3:10 PM, Razvan Cojocaru wrote:
> On 07/22/2016 02:39 PM, Corneliu ZUZU wrote:
>> On 7/22/2016 1:29 PM, Razvan Cojocaru wrote:
>>> On 07/22/2016 01:17 PM, Corneliu ZUZU wrote:
>>>> On 7/22/2016 12:55 PM, Razvan Cojocaru wrote:
>>>>> On 07/22/2016 12:27 PM, Corneliu ZUZU wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been inspecting vm-event code parts to try and understand when
>>>>>> and why domain pausing/locking is done. If I understood correctly,
>>>>>> domain pausing is done solely to force all the vCPUs of that domain
>>>>>> to see a configuration update and act upon it (e.g. in the case of a
>>>>>> XEN_DOMCTL_MONITOR_EVENT_WRITE_CTRLREG that enables CR3 monitoring,
>>>>>> domain pausing/unpausing ensures immediate enabling of CR3
>>>>>> load-exiting for all vCPUs), not to synchronize concurrent
>>>>>> operations (lock-behavior).
>>>>>>
>>>>>> As for locking, I see that for example vm_event_init_domain(),
>>>>>> vm_event_cleanup_domain() and monitor_domctl() are all protected by
>>>>>> the domctl-lock, but I don't think that's enough. Here are a few
>>>>>> code-paths that led me to believe that:
>>>>>>
>>>>>> * do_hvm_op()->monitor_guest_request() reads
>>>>>>   d->monitor.guest_request_* resources, but it doesn't seem to take
>>>>>>   the domctl lock, so it seems possible for that to happen _while_
>>>>>>   those resources are initialized/cleaned-up
>>>>>> * monitor_guest_request() also calls
>>>>>>   monitor_traps()->...->vm_event_wait_slot()->...->vm_event_grab_slot(),
>>>>>>   which attempts a vm_event_ring_lock(ved); that could also be
>>>>>>   called _while_ the ring is initialized (vm_event_enable()) or
>>>>>>   cleaned-up (vm_event_disable())
>>>>>> * hvm_monitor_cr() - e.g. on the code-path
>>>>>>   vmx_vmexit_handler(EXIT_REASON_CR_ACCESS)->
>>>>>>   vmx_cr_access(VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR)->
>>>>>>   hvm_mov_to_cr()->hvm_set_cr0()->hvm_monitor_crX() - the
>>>>>>   possibility of a concurrent
>>>>>>   monitor_init_domain()/monitor_cleanup_domain() doesn't seem to be
>>>>>>   taken into account
>>>>>>
>>>>>> Am I missing something with these conclusions?
>>>>>
>>>>> Your conclusions look correct, but I assume that the reason why this
>>>>> has not been addressed in the past is that introspection
>>>>> applications are expected to be well-behaved.
>>>>
>>>> I wouldn't think that was the rationale (considering that the risk is
>>>> crashing the hypervisor as a whole; also see below); I'd rather think
>>>> this simply went unnoticed.
>>>
>>> Specifically, in a codebase where the choice between uint64_t and long
>>> int matters speed-wise, and where unlikely()s also matter, an extra
>>> lock may be an issue. The typical flow of an introspection application
>>> is:
>>>
>>> 1. Initialize everything.
>>> 2. Subscribe to relevant events.
>>> 3. Event processing loop.
>>> 4. Unsubscribe from events.
>>> 5. Do a last run of event processing (already queued in the ring
>>>    buffer).
>>> 6. Uninitialize everything (no events are possible here because of
>>>    steps 4-5).
>>
>> And even if an introspection application behaves as sanely as you say,
>> the current layout of the code still doesn't guarantee bug-free
>> behavior. That's because, for one, there are in-guest events (which
>> trigger access of vm-event resources) that are completely asynchronous
>> in relation to steps 2-6, for example a HVMOP_guest_request_vm_event
>> (which subsequently calls monitor_guest_request()).
>
> Not true. As part of step 4, d->monitor.guest_request_enabled will
> become 0, so no sensitive code will run after that:

I wasn't being rigorous with steps 2-6 above (by that I only meant to say
"there can't be any vm-events before everything is initialized"), and it
does seem true that, at least in the monitor_guest_request() case, a
hypervisor crash might not be reproducible (although there are a number
of ugly inconsistencies on that code path).
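[Editorial note: the check-then-use problem being debated above - a flag
like d->monitor.guest_request_enabled being read on one code path while
another path tears the state down - can be sketched outside Xen. The
sketch below is hypothetical userspace code, not Xen code: the struct
and function names (mon_state, mon_guest_request, ...) are invented, and
a pthread mutex stands in for the spinlock Xen would use. The point it
demonstrates is that the flag must be re-checked *under* the same lock
that teardown holds, so the check and the use form one atomic step.]

```c
/* Userspace sketch (NOT Xen code) of the missing-lock concern discussed
 * above.  All names here are hypothetical stand-ins. */
#include <errno.h>
#include <pthread.h>

struct mon_state {
    pthread_mutex_t lock;      /* stands in for vm_event_ring_lock() */
    int guest_request_enabled; /* stands in for d->monitor.guest_request_enabled */
    int events_delivered;
};

static void mon_init(struct mon_state *m)
{
    pthread_mutex_init(&m->lock, NULL);
    m->guest_request_enabled = 1;
    m->events_delivered = 0;
}

/* A racy variant would test guest_request_enabled WITHOUT the lock and
 * then touch event state; teardown could run in between.  The safe
 * variant below takes the lock first and re-checks the flag under it. */
static int mon_guest_request(struct mon_state *m)
{
    int rc = -ENODEV;

    pthread_mutex_lock(&m->lock);
    if ( m->guest_request_enabled ) /* re-checked under the lock */
    {
        m->events_delivered++;      /* safe to touch event state here */
        rc = 0;
    }
    pthread_mutex_unlock(&m->lock);
    return rc;
}

static void mon_disable(struct mon_state *m)
{
    pthread_mutex_lock(&m->lock);
    m->guest_request_enabled = 0;   /* teardown holds the same lock */
    pthread_mutex_unlock(&m->lock);
}
```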
>> But nonetheless a hypervisor crash can _still_ happen even with a
>> _sane_ toolstack user. Look at hvm_do_resume():
>>
>> 1. Suppose you, as a toolstack user, get to sane step no. 6:
>>    "Uninitialize everything (no events are possible here because of
>>    steps 4-5)."
>>
>> 2. But just before you do that, a hvm_do_resume() happens on an
>>    arbitrary vCPU of the domain and gets to this point:
>>
>>     if ( unlikely(v->arch.vm_event) )
>>     {
>>         /* ---> say you are now at this point, just after the above
>>          *      non-NULL check */
>>         struct monitor_write_data *w = &v->arch.vm_event->write_data;
>>
>> 3. Before proceeding further, the uninitialization sequence
>>    _completes_, i.e. this is done in vm_event_cleanup_domain():
>>
>>     for_each_vcpu ( d, v )
>>     {
>>         xfree(v->arch.vm_event);
>>         v->arch.vm_event = NULL;
>>     }
>>
>> 4. Then in hvm_do_resume() this gets to be done:
>>
>>     struct monitor_write_data *w = &v->arch.vm_event->write_data;
>>     [...]
>>     if ( w->do_write.msr )
>>     {
>>         hvm_msr_write_intercept(w->msr, w->value, 0);
>>         w->do_write.msr = 0;
>>     }
>>
>> Then you'll have a beautiful NULL-pointer access hypervisor crash.
>
> As I've said before, of course I agree with that

Sorry, I thought by your previous statement ("I assume that the reason
why this has not been addressed in the past is that introspection
applications are expected to be well-behaved") you were inferring that
with a 'sane' toolstack user, as you defined it, it wouldn't be possible
to have a hypervisor crash.

> - isn't that what a previous patch you've submitted addressed?

I'm not sure what patch you're referring to; I don't remember ever
addressing these issues before.

> I was merely replying to your concerns assuming that that issue is
> addressed (as discussed before), and assuming that the introspection
> application behaves gracefully. Again, I'm all for your proposed
> changes with the additional locking. I personally do prefer as many
> safeguards as possible.
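[Editorial note: the teardown race in the quoted hvm_do_resume() scenario
above, and one way to close it, can be sketched outside Xen. This is
hypothetical userspace code, not Xen code: the names (vcpu_sketch,
resume_write_data, cleanup_vcpu) are invented, a pthread mutex stands in
for whatever per-domain/per-vCPU lock a real fix would use, and
hvm_msr_write_intercept() is reduced to clearing a flag. The pattern
shown is that both the consumer and the teardown must check/NULL the
pointer under the same lock, so the consumer sees either a valid pointer
or NULL, never a freed one.]

```c
/* Userspace sketch (NOT Xen code) of closing the hvm_do_resume() vs.
 * vm_event_cleanup_domain() race described above. */
#include <pthread.h>
#include <stdlib.h>

struct monitor_write_data { int do_write_msr; };

struct vcpu_sketch {
    pthread_mutex_t lock;                /* hypothetical per-vcpu vm-event lock */
    struct monitor_write_data *vm_event; /* stands in for v->arch.vm_event */
};

/* hvm_do_resume()-like consumer: with the lock held, the pointer cannot
 * be freed/NULLed between the NULL check and the dereference. */
static int resume_write_data(struct vcpu_sketch *v)
{
    int wrote = 0;

    pthread_mutex_lock(&v->lock);
    if ( v->vm_event != NULL )                       /* check...        */
    {
        struct monitor_write_data *w = v->vm_event;  /* ...and use,
                                                        atomically      */
        if ( w->do_write_msr )
        {
            w->do_write_msr = 0; /* stands in for hvm_msr_write_intercept() */
            wrote = 1;
        }
    }
    pthread_mutex_unlock(&v->lock);
    return wrote;
}

/* vm_event_cleanup_domain()-like teardown: frees and NULLs under the
 * same lock, so a concurrent resume_write_data() can no longer catch a
 * freed-but-non-NULL pointer. */
static void cleanup_vcpu(struct vcpu_sketch *v)
{
    pthread_mutex_lock(&v->lock);
    free(v->vm_event);
    v->vm_event = NULL;
    pthread_mutex_unlock(&v->lock);
}
```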
I understand that, and I was saying that without these safeguards
significant system compromise is possible, so I wouldn't call them
'additional', but rather 'missing'. :-)

> Thanks,
> Razvan

Anyway, I think we might've misunderstood each other here, so better to
leave the details for when I send the patch series. ;-)

Thanks too,
Corneliu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel