Re: [Xen-devel] [PATCH v5 11/28] xsplice: Implement support for applying/reverting/replacing patches.
>>> On 24.03.16 at 21:00, <konrad.wilk@xxxxxxxxxx> wrote:
> From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
>
> Implement support for the apply, revert and replace actions.
>
> To perform an action on a payload, the hypercall sets up a data
> structure to schedule the work. A hook is added in all the
> return-to-guest paths to check for work to do and execute it if needed.
Looking at the diffstat alone I cannot see this being the case.
Perhaps it's just the description here which needs to become
more precise.
> In this way, patches can be applied with all CPUs idle and without
> stacks. The first CPU to do_xsplice() becomes the master and triggers a
> reschedule softirq to trigger all the other CPUs to enter do_xsplice()
> with no stack. Once all CPUs have rendezvoused, all CPUs disable IRQs
> and NMIs are ignored. The system is then quiescent and the master
> performs the action. After this, all CPUs enable IRQs and NMIs are
> re-enabled.
>
> Note that it is unsafe to patch do_nmi and the xSplice internal functions.
> Patching functions on NMI/MCE path is liable to end in disaster.
So what measures are (planned to be) taken to make sure this
doesn't happen by accident?
> The action to perform is one of:
> - APPLY: For each function in the module, store the first 5 bytes of the
> old function and replace it with a jump to the new function.
> - REVERT: Copy the previously stored bytes into the first 5 bytes of the
> old function.
> - REPLACE: Revert each applied module and then apply the new module.
>
> To prevent a deadlock with any other barrier in the system, the master
> will wait for up to 30ms before timing out.
> Measurements found that patch application takes about 100 μs on a
> 72-CPU system, whether idle or fully loaded.
That's for an individual patch I suppose? What if REPLACE has to
revert dozens or hundreds of patches?
> We also add a BUILD_BUG_ON to make sure that the size of the structure
> of the payload is not inadvertently changed.
>
> Lastly we unroll the 'vmap_to_page' on x86 as inside the macro there
> is a possibility of a NULL pointer. Hence we unroll it with extra
> ASSERTS. Note that asserts on non-debug builds are compiled out hence
> the extra checks that will just return (and leak memory).
I'm afraid I can't really make sense of this: What does "unroll" mean
here? There's no loop involved. And where's that potential for NULL?
> --- a/docs/misc/xsplice.markdown
> +++ b/docs/misc/xsplice.markdown
> @@ -841,7 +841,8 @@ The implementation must also have a mechanism for:
> * Be able to lookup in the Xen hypervisor the symbol names of functions
> from the ELF payload.
> * Be able to patch .rodata, .bss, and .data sections.
> * Further safety checks (blacklist of which functions cannot be patched, check
> - the stack, make sure the payload is built with same compiler as hypervisor).
> + the stack, make sure the payload is built with same compiler as hypervisor,
> + and NMI/MCE handlers and do_nmi for right now - until an safe solution is found).
The whole thing doesn't parse anymore for me with the addition,
so I can't really conclude what you mean to say here (and hence
whether that addresses the earlier question).
> @@ -120,6 +121,7 @@ static void idle_loop(void)
> (*pm_idle)();
> do_tasklet();
> do_softirq();
> + check_for_xsplice_work(); /* Must be last. */
> }
> }
>
> @@ -136,6 +138,7 @@ void startup_cpu_idle_loop(void)
>
> static void noreturn continue_idle_domain(struct vcpu *v)
> {
> + check_for_xsplice_work();
> reset_stack_and_jump(idle_loop);
> }
The placement here kind of contradicts the comment right above.
And anyway - why in both places? And why here and not in
common code, e.g. in do_softirq(), or - considering the further
addition below - inside reset_stack_and_jump() _after_ having
reset the stack (after all by doing it up front you do not really
meet your goal of doing the action "without stacks")?
> +void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
> +{
> + uint32_t val;
The way it's being used below, it clearly should be int32_t.
> + uint8_t *old_ptr;
> +
> + BUILD_BUG_ON(PATCH_INSN_SIZE > sizeof(func->undo));
> + BUILD_BUG_ON(PATCH_INSN_SIZE != (1 + sizeof val));
> +
> + old_ptr = (uint8_t *)func->old_addr;
(Considering this cast, the "old_addr" member should be
unsigned long (or void *), not uint64_t: at the latest on ARM32
this would otherwise cause problems.)
Also - where is the verification that
func->old_size >= PATCH_INSN_SIZE?
> +void arch_xsplice_revert_jmp(struct xsplice_patch_func *func)
const
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -3,6 +3,7 @@
> *
> */
>
> +#include <xen/cpu.h>
> #include <xen/err.h>
> #include <xen/guest_access.h>
> #include <xen/keyhandler.h>
> @@ -11,17 +12,29 @@
> #include <xen/mm.h>
> #include <xen/sched.h>
> #include <xen/smp.h>
> +#include <xen/softirq.h>
> #include <xen/spinlock.h>
> #include <xen/vmap.h>
> +#include <xen/wait.h>
> #include <xen/xsplice_elf.h>
> #include <xen/xsplice.h>
>
> #include <asm/event.h>
> +#include <asm/nmi.h>
Urgh?
> #include <public/sysctl.h>
>
> +/*
> + * Protects against payload_list operations and also allows only one
> + * caller in schedule_work.
> + */
> static DEFINE_SPINLOCK(payload_lock);
> static LIST_HEAD(payload_list);
>
> +/*
> + * Patches which have been applied.
> + */
Comment style.
> +struct xsplice_work
> +{
> + atomic_t semaphore; /* Used for rendezvous. First to grab it will
> + do the patching. */
"First to grab it" doesn't seem to make sense for an atomic_t variable.
> + atomic_t irq_semaphore; /* Used to signal all IRQs disabled. */
> + uint32_t timeout; /* Timeout to do the operation. */
> + struct payload *data; /* The payload on which to act. */
> + volatile bool_t do_work; /* Signals work to do. */
> + volatile bool_t ready; /* Signals all CPUs synchronized. */
> + uint32_t cmd; /* Action request: XSPLICE_ACTION_* */
> +};
Please can you do without fixed size types when the fixed size
isn't really a requirement?
> +/* There can be only one outstanding patching action. */
> +static struct xsplice_work xsplice_work;
> +
> +/* Indicate whether the CPU needs to consult xsplice_work structure. */
> +static DEFINE_PER_CPU(bool_t, work_to_do);
Peeking ahead to the uses of this, I can't see why this is needed
alongside xsplice_work.do_work.
> @@ -296,6 +331,77 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
> return rc;
> }
>
> +static int check_special_sections(struct payload *payload,
> + struct xsplice_elf *elf)
const (twice, but the first parameter appears to be unused anyway)
> +{
> + unsigned int i;
> + static const char *const names[] = { ".xsplice.funcs" };
> +
> + for ( i = 0; i < ARRAY_SIZE(names); i++ )
> + {
> + struct xsplice_elf_sec *sec;
const
> + sec = xsplice_elf_sec_by_name(elf, names[i]);
> + if ( !sec )
> + {
> + printk(XENLOG_ERR "%s%s: %s is missing!\n",
> + XSPLICE, elf->name, names[i]);
> + return -EINVAL;
> + }
> +
> + if ( !sec->sec->sh_size )
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
So you check for there being one such section. Is having multiple
of them okay / meaningful?
> +static int prepare_payload(struct payload *payload,
> + struct xsplice_elf *elf)
> +{
> + struct xsplice_elf_sec *sec;
> + unsigned int i;
> + struct xsplice_patch_func *f;
const (multiple times, and I'm not going to further make this remark:
Please deal with this throughout the series)
> + sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
> + if ( sec )
Assuming check_special_sections() got invoked before you get here,
why the conditional?
> + {
> + if ( sec->sec->sh_size % sizeof *payload->funcs )
> + {
> + dprintk(XENLOG_DEBUG, "%s%s: Wrong size of .xsplice.funcs!\n",
> + XSPLICE, elf->name);
> + return -EINVAL;
> + }
> +
> + payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
> + payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
Since this repeats (so far I thought this was more like a mistake),
and despite being a matter of taste to some degree - may I ask
that the canonical sizeof() be used, instead of (sizeof ...) or
whatever other variant?
> + }
> +
> + for ( i = 0; i < payload->nfuncs; i++ )
> + {
> + unsigned int j;
> +
> + f = &(payload->funcs[i]);
> +
> + if ( !f->new_addr || !f->old_addr || !f->old_size || !f->new_size )
Isn't new_size == 0 a particularly easy to deal with case?
> @@ -499,6 +614,321 @@ static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> return rc ? : idx;
> }
>
> +/*
> + * The following functions get the CPUs into an appropriate state and
> + * apply (or revert) each of the payload's functions. This is needed
> + * for XEN_SYSCTL_XSPLICE_ACTION operation (see xsplice_action).
> + */
> +
> +static int apply_payload(struct payload *data)
> +{
> + unsigned int i;
> +
> + dprintk(XENLOG_DEBUG, "%s%s: Applying %u functions.\n", XSPLICE,
> + data->name, data->nfuncs);
> +
> + arch_xsplice_patching_enter();
> +
> + for ( i = 0; i < data->nfuncs; i++ )
> + arch_xsplice_apply_jmp(&data->funcs[i]);
> +
> + arch_xsplice_patching_leave();
> +
> + list_add_tail(&data->applied_list, &applied_list);
> +
> + return 0;
> +}
> +
> +/*
> + * This function is executed having all other CPUs with no stack (we may
> + * have cpu_idle on it) and IRQs disabled.
> + */
> +static int revert_payload(struct payload *data)
Wouldn't the same comment apply to apply_payload()? I.e. is it
useful to have here but not there (i.e. isn't the comment ahead
of xsplice_do_action() sufficient)?
> +static void xsplice_do_action(void)
> +{
> + int rc;
> + struct payload *data, *other, *tmp;
> +
> + data = xsplice_work.data;
> + /* Now this function should be the only one on any stack.
> + * No need to lock the payload list or applied list. */
Comment style.
> + switch ( xsplice_work.cmd )
> + {
> + case XSPLICE_ACTION_APPLY:
> + rc = apply_payload(data);
> + if ( rc == 0 )
> + data->state = XSPLICE_STATE_APPLIED;
> + break;
> +
> + case XSPLICE_ACTION_REVERT:
> + rc = revert_payload(data);
> + if ( rc == 0 )
> + data->state = XSPLICE_STATE_CHECKED;
> + break;
> +
> + case XSPLICE_ACTION_REPLACE:
> + rc = 0;
> + /* N.B: Use 'applied_list' member, not 'list'. */
> + list_for_each_entry_safe_reverse ( other, tmp, &applied_list, applied_list )
> + {
> + other->rc = revert_payload(other);
Why does this set ->rc, but the two earlier ones only set the local
variable?
> + if ( other->rc == 0 )
> + other->state = XSPLICE_STATE_CHECKED;
> + else
> + {
> + rc = -EINVAL;
> + break;
> + }
> + }
> +
> + if ( rc != -EINVAL )
Perhaps better "rc == 0"?
> + {
> + rc = apply_payload(data);
> + if ( rc == 0 )
> + data->state = XSPLICE_STATE_APPLIED;
> + }
> + break;
> +
> + default:
> + rc = -EINVAL;
> + break;
> + }
> +
> + data->rc = rc;
Oh, here it is. But why would an xsplice_work.cmd (which probably
shouldn't make it here anyway) mark the payload having an error?
It didn't change state or anything after all.
> +/*
> + * MUST be holding the payload_lock.
> + */
comment style (another comment I'm not going to repeat any further)
> +static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout)
> +{
> + unsigned int cpu;
> +
> + ASSERT(spin_is_locked(&payload_lock));
Also the comment above is clearly redundant with this ASSERT().
> + /* Fail if an operation is already scheduled. */
> + if ( xsplice_work.do_work )
> + return -EBUSY;
> +
> + if ( !get_cpu_maps() )
> + {
> + printk(XENLOG_ERR "%s%s: unable to get cpu_maps lock!\n",
> + XSPLICE, data->name);
> + return -EBUSY;
> + }
> +
> + xsplice_work.cmd = cmd;
> + xsplice_work.data = data;
> + xsplice_work.timeout = timeout ?: MILLISECS(30);
> +
> + dprintk(XENLOG_DEBUG, "%s%s: timeout is %"PRI_stime"ms\n",
> + XSPLICE, data->name, xsplice_work.timeout / MILLISECS(1));
> +
> + /*
> + * Once the patching has been completed, the semaphore value will
> + * be num_online_cpus()-1.
> + */
Is this comment really useful? I.e. is that final value meaningful
for anything?
> + atomic_set(&xsplice_work.semaphore, -1);
> + atomic_set(&xsplice_work.irq_semaphore, -1);
> +
> + xsplice_work.ready = 0;
> + smp_wmb();
> + xsplice_work.do_work = 1;
> + smp_wmb();
> + /*
> + * Above smp_wmb() gives us an compiler barrier, as we MUST do this
> + * after setting the global structure.
> + */
"a compiler barrier"
> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> +{
> + /* TODO: Handle missing NMI/MCE.*/
> + return 1;
> +}
Aren't we in common code here?
> +void check_for_xsplice_work(void)
> +{
> + unsigned int cpu = smp_processor_id();
> + nmi_callback_t saved_nmi_callback;
This again looks very x86-ish.
> + s_time_t timeout;
> + unsigned long flags;
> +
> + /* Fast path: no work to do. */
> + if ( !per_cpu(work_to_do, cpu ) )
> + return;
> +
> + /* In case we aborted, other CPUs can skip right away. */
> + if ( (!xsplice_work.do_work) )
Stray parentheses.
> + {
> + per_cpu(work_to_do, cpu) = 0;
> + return;
> + }
> +
> + ASSERT(local_irq_is_enabled());
> +
> + /* Set at -1, so will go up to num_online_cpus - 1. */
> + if ( atomic_inc_and_test(&xsplice_work.semaphore) )
> + {
> + struct payload *p;
> + unsigned int total_cpus;
> +
> + p = xsplice_work.data;
> + if ( !get_cpu_maps() )
> + {
> + printk(XENLOG_ERR "%s%s: CPU%u - unable to get cpu_maps lock!\n",
> + XSPLICE, p->name, cpu);
> + per_cpu(work_to_do, cpu) = 0;
> + xsplice_work.data->rc = -EBUSY;
> + xsplice_work.do_work = 0;
On x86 such possibly simultaneous accesses may be okay, but is
that universally the case? Wouldn't it be better if only the monarch
updated shared data?
> + /*
> + * Do NOT decrement semaphore down - as that may cause the other
> + * CPU (which may be at this exact moment checking the ASSERT)
Which ASSERT()? (I guess the one a few lines up - but does that
matter?)
> + * to assume the role of master and then needlessly time out
> + * (as do_work is zero).
> + */
> + return;
> + }
> +
> + barrier(); /* MUST do it after get_cpu_maps. */
> + total_cpus = num_online_cpus() - 1;
Considering this the variable (and earlier function parameter) is
misnamed.
> + if ( total_cpus )
> + {
> + dprintk(XENLOG_DEBUG, "%s%s: CPU%u - IPIing the %u CPUs\n",
Not being a native speaker, I've previously observed too many
articles - the "the" here again looks quite odd, or there is an
"other" missing.
> + XSPLICE, p->name, cpu, total_cpus);
> + smp_call_function(reschedule_fn, NULL, 0);
> + }
> +
> + timeout = xsplice_work.timeout + NOW();
> + if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
> + "Timed out on CPU semaphore") )
> + goto abort;
> +
> + /* "Mask" NMIs. */
> + saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
So what if an NMI got raised on a remote CPU right before you set
this? I.e. doesn't this need to move earlier?
> + /* All CPUs are waiting, now signal to disable IRQs. */
> + xsplice_work.ready = 1;
> + smp_wmb();
> +
> + atomic_inc(&xsplice_work.irq_semaphore);
> + if ( !xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
> + "Timed out on IRQ semaphore") )
I'd prefer if the common parts of that message moved into
xsplice_do_wait() - no reason to have more string literal space
occupied than really needed. Also, looking back, the respective
function parameter could do with a more descriptive name.
And then - does it make sense to wait the perhaps full 30ms
on this second step? Rendezvoused CPUs should react rather
quickly to the signal to disable interrupts.
> + {
> + local_irq_save(flags);
> + /* Do the patching. */
> + xsplice_do_action();
> + /* To flush out pipeline. */
> + arch_xsplice_post_action();
The comment needs to become more generic, and maybe the
function name more specific.
> + local_irq_restore(flags);
> + }
> + set_nmi_callback(saved_nmi_callback);
> +
> + abort:
> + per_cpu(work_to_do, cpu) = 0;
> + xsplice_work.do_work = 0;
> +
> + smp_wmb(); /* Synchronize with waiting CPUs. */
What "synchronization" does this barrier do?
> + ASSERT(local_irq_is_enabled());
Is there really anything between the earlier identical ASSERT() and
this one which could leave interrupts off?
> + put_cpu_maps();
> +
> + printk(XENLOG_INFO "%s%s finished with rc=%d\n", XSPLICE,
> + p->name, p->rc);
And no record left of what was done with that payload?
> + }
> + else
> + {
> + /* Wait for all CPUs to rendezvous. */
> + while ( xsplice_work.do_work && !xsplice_work.ready )
> + {
> + cpu_relax();
> + smp_rmb();
What is the barrier doing inside this and the other loop below?
> @@ -635,15 +1063,30 @@ static void xsplice_printall(unsigned char key)
> }
>
> list_for_each_entry ( data, &payload_list, list )
> + {
> printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %zu pages.\n",
> data->name, state2str(data->state), data->state, data->text_addr,
> data->rw_addr, data->ro_addr, data->payload_pages);
>
> + for ( i = 0; i < data->nfuncs; i++ )
> + {
> + struct xsplice_patch_func *f = &(data->funcs[i]);
> + printk(" %s patch 0x%"PRIx64"(%u) with 0x%"PRIx64"(%u)\n",
%# again please.
> + f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
> + if ( !(i % 100) )
Using a power of 2 in situations like this is going to produce both
smaller and faster code (the faster aspect may not matter here,
but anyway). Also I don't think you want to do this on the first
iteration.
> + process_pending_softirqs();
With a lock held?
> static int __init xsplice_init(void)
> {
> + BUILD_BUG_ON( sizeof(struct xsplice_patch_func) != 64 );
> + BUILD_BUG_ON( offsetof(struct xsplice_patch_func, new_addr) != 8 );
> + BUILD_BUG_ON( offsetof(struct xsplice_patch_func, new_size) != 24 );
What assumptions get broken if these sizes change?
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -11,12 +11,30 @@ struct xsplice_elf_sec;
> struct xsplice_elf_sym;
> struct xen_sysctl_xsplice_op;
>
> +#include <xen/elfstructs.h>
> +/*
> + * The structure which defines the patching. This is what the hypervisor
> + * expects in the '.xsplice.func' section of the ELF file.
> + *
> + * This MUST be in sync with what the tools generate.
In which case this needs to be part of the public interface, I would
say.
> + */
> +struct xsplice_patch_func {
> + const char *name;
> + uint64_t new_addr;
> + uint64_t old_addr;
> + uint32_t new_size;
> + uint32_t old_size;
> + uint8_t undo[8];
Shouldn't the size of this array be a per-arch constant?
> + uint8_t pad[24];
What is this padding field good for?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel