[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 3/7] x86/vm-event/monitor: don't compromise monitor_write_data on domain cleanup



On 7/5/2016 8:16 PM, Razvan Cojocaru wrote:
On 07/05/16 17:28, Corneliu ZUZU wrote:
The arch_vm_event structure is dynamically allocated and freed @
vm_event_cleanup_domain. This cleanup is triggered e.g. when the toolstack user
disables domain monitoring (xc_monitor_disable), which in turn effectively
discards any information that was in arch_vm_event.write_data.

But this can yield unexpected behavior since if a CR-write was awaiting to be
committed on the scheduling tail (hvm_do_resume->arch_monitor_write_data)
before xc_monitor_disable is called, then the domain CR write is wrongfully
ignored, which of course, in these cases, can easily render a domain crash.

To fix the issue, this patch makes arch_vm_event.emul_read_data dynamically
allocated and only frees that in vm_event_cleanup_domain, instead of the whole
arch_vcpu.vm_event structure, which with this patch will only be freed on
vcpu/domain destroyal.

Signed-off-by: Corneliu ZUZU <czuzu@xxxxxxxxxxxxxxx>
Acked-by: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
---
Changed since v1:
   * arch_vcpu.vm_event made pointer again to avoid eating memory from arch_vcpu
     structure
I believe that all acks should be presumed lost on non-trivial changes
in a new version (which I believe this qualifies as being, with all the
new logic of allocating / deallocating only part of vm_event).

Unfortunately I'm out of office until early next week so I can't
properly test / thoroughly parse this until then, but we should be extra
careful that there are several places in the code where it is assumed
that v->arch.vm_event != NULL is the same thing as monitoring being
enabled. I'm not saying that they're not treated in this patch (the
proper change has certainly been made in emulate.c), I'm just saying
that we should be careful that they are.

Having said that, I propose a special macro to make this all clearer,
something like:

#define is_monitor_enabled_for_vcpu(v) \
     ( v->arch.vm_event && v->arch.vm_event->emul_read_data )

or equivalent inline functions returning a bool_t. Just a thought.


Thanks,
Razvan

At some point I actually defined that exact macro while constructing this patch, but I decided to delete it afterwards because I thought the code would still be clear without it (i.e. only check emul_read_data when we actually need _that_ to be non-NULL) and because it seemed a bit strange, being possible to have _arch_vm_event allocated_ but having the monitor vm-events subsystem _uninitialized_ (because of .write_data being treated specially). Since the write_data field is also part of the monitor subsystem, conceptually one could say the monitor subsystem is at least _partially_ initialized when that's non-NULL, not uninitialized at all. I'll think of a resolution around this, but in the meantime I there's something more important to settle:

I only now notice, it seems to me that the ASSERT(v->arch.vm_event) @ hvm_set_cr0 & such (the one just before calling hvm_monitor_crX) might fail. That's because from what I see in the code the following can happen:
- v.arch.vm_event = NULL (vm-event _not_ initialized)
- toolstack user calls e.g. xc_moLRnitor_write_ctrlreg(xch, domid, VM_EVENT_X86_CR0, true, true, false) -> do_domctl(XEN_DOMCTL_monitor_op) -> monitor_domctl(XEN_DOMCTL_MONITOR_OP_ENABLE) -> arch_monitor_domctl_event(XEN_DOMCTL_MONITOR_EVENT_WRITE_CTREG) which in turn sets the CR0 bit of d.arch.monitor.write_ctrlreg_enabled _without ever checking if v.arch.vm_event is non-NULL_ - afterwards a CR0 write happens on one of the domain vCPUs and we arrive at that assert without having v.arch.vm_event allocated!

I can't test this @ the moment, am I missing something here? I remember taking a look on this code path at some point and idk how I didn't notice something like this, it seems obviously wrong..has something changed recently? I think we need to take a second look over this code-path, I'm also seeing that the proper check _is done_ @ arch_monitor_domctl_op(XEN_DOMCTL_MONITOR_OP_EMULATE_EACH_REP) (though I think a more proper error return value there would be ENODEV instead of EINVAL).

Corneliu.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.