Re: [Xen-devel] [PATCH v5 11/28] xsplice: Implement support for applying/reverting/replacing patches.
>>> On 24.03.16 at 21:00, <konrad.wilk@xxxxxxxxxx> wrote:
> From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
>
> Implement support for the apply, revert and replace actions.
>
> To perform an action on a payload, the hypercall sets up a data
> structure to schedule the work. A hook is added in all the
> return-to-guest paths to check for work to do and execute it if needed.
Looking at the diffstat alone I cannot see this being the case.
Perhaps it's just the description here which needs to become
more precise.
> In this way, patches can be applied with all CPUs idle and without
> stacks. The first CPU to do_xsplice() becomes the master and triggers a
> reschedule softirq to trigger all the other CPUs to enter do_xsplice()
> with no stack. Once all CPUs have rendezvoused, all CPUs disable IRQs
> and NMIs are ignored. The system is then quiescent and the master
> performs the action. After this, all CPUs enable IRQs and NMIs are
> re-enabled.
>
> Note that it is unsafe to patch do_nmi and the xSplice internal functions.
> Patching functions on NMI/MCE path is liable to end in disaster.
So what measures are (planned to be) taken to make sure this
doesn't happen by accident?
> The action to perform is one of:
> - APPLY: For each function in the module, store the first 5 bytes of the
> old function and replace it with a jump to the new function.
> - REVERT: Copy the previously stored bytes into the first 5 bytes of the
> old function.
> - REPLACE: Revert each applied module and then apply the new module.
>
> To prevent a deadlock with any other barrier in the system, the master
> will wait for up to 30ms before timing out.
> Measurements found that patch application takes about 100 μs on a
> 72-CPU system, whether idle or fully loaded.
That's for an individual patch I suppose? What if REPLACE has to
revert dozens or hundreds of patches?
> We also add a BUILD_BUG_ON to make sure that the size of the structure
> of the payload is not inadvertently changed.
>
> Lastly we unroll the 'vmap_to_page' on x86 as inside the macro there
> is a possibility of a NULL pointer. Hence we unroll it with extra
> ASSERTS. Note that asserts on non-debug builds are compiled out hence
> the extra checks that will just return (and leak memory).
I'm afraid I can't really make sense of this: What does "unroll" mean
here? There's no loop involved. And where's that potential for NULL?
> --- a/docs/misc/xsplice.markdown
> +++ b/docs/misc/xsplice.markdown
> @@ -841,7 +841,8 @@ The implementation must also have a mechanism for:
> * Be able to lookup in the Xen hypervisor the symbol names of functions
> from the ELF payload.
> * Be able to patch .rodata, .bss, and .data sections.
> * Further safety checks (blacklist of which functions cannot be patched, check
> - the stack, make sure the payload is built with same compiler as hypervisor).
> + the stack, make sure the payload is built with same compiler as hypervisor,
> + and NMI/MCE handlers and do_nmi for right now - until an safe solution is found).
The whole thing doesn't parse anymore for me with the addition,
so I can't really conclude what you mean to say here (and hence
whether that addresses the earlier question).
> @@ -120,6 +121,7 @@ static void idle_loop(void)
> (*pm_idle)();
> do_tasklet();
> do_softirq();
> + check_for_xsplice_work(); /* Must be last. */
> }
> }
>
> @@ -136,6 +138,7 @@ void startup_cpu_idle_loop(void)
>
> static void noreturn continue_idle_domain(struct vcpu *v)
> {
> + check_for_xsplice_work();
> reset_stack_and_jump(idle_loop);
> }
The placement here kind of contradicts the comment right above.
And anyway - why in both places? And why here and not in
common code, e.g. in do_softirq(), or - considering the further
addition below - inside reset_stack_and_jump() _after_ having
reset the stack (after all by doing it up front you do not really
meet your goal of doing the action "without stacks")?
> +void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
> +{
> + uint32_t val;
The way it's being used below, it clearly should be int32_t.
> + uint8_t *old_ptr;
> +
> + BUILD_BUG_ON(PATCH_INSN_SIZE > sizeof(func->undo));
> + BUILD_BUG_ON(PATCH_INSN_SIZE != (1 + sizeof val));
> +
> + old_ptr = (uint8_t *)func->old_addr;
(Considering this cast, the "old_addr" member should be
unsigned long (or void *), not uint64_t: at the latest on ARM32
this would otherwise cause problems.)
Also - where is the verification that
func->old_size >= PATCH_INSN_SIZE?
> +void arch_xsplice_revert_jmp(struct xsplice_patch_func *func)
const
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -3,6 +3,7 @@
> *
> */
>
> +#include <xen/cpu.h>
> #include <xen/err.h>
> #include <xen/guest_access.h>
> #include <xen/keyhandler.h>
> @@ -11,17 +12,29 @@
> #include <xen/mm.h>
> #include <xen/sched.h>
> #include <xen/smp.h>
> +#include <xen/softirq.h>
> #include <xen/spinlock.h>
> #include <xen/vmap.h>
> +#include <xen/wait.h>
> #include <xen/xsplice_elf.h>
> #include <xen/xsplice.h>
>
> #include <asm/event.h>
> +#include <asm/nmi.h>
Urgh?
> #include <public/sysctl.h>
>
> +/*
> + * Protects against payload_list operations and also allows only one
> + * caller in schedule_work.
> + */
> static DEFINE_SPINLOCK(payload_lock);
> static LIST_HEAD(payload_list);
>
> +/*
> + * Patches which have been applied.
> + */
Comment style.
> +struct xsplice_work
> +{
> + atomic_t semaphore; /* Used for rendezvous. First to grab it will
> + do the patching. */
"First to grab it" doesn't seem to make sense for an atomic_t variable.
> + atomic_t irq_semaphore; /* Used to signal all IRQs disabled. */
> + uint32_t timeout; /* Timeout to do the operation. */
> + struct payload *data; /* The payload on which to act. */
> + volatile bool_t do_work; /* Signals work to do. */
> + volatile bool_t ready; /* Signals all CPUs synchronized. */
> + uint32_t cmd; /* Action request: XSPLICE_ACTION_* */
> +};
Please can you do without fixed size types when the fixed size
isn't really a requirement?
> +/* There can be only one outstanding patching action. */
> +static struct xsplice_work xsplice_work;
> +
> +/* Indicate whether the CPU needs to consult xsplice_work structure. */
> +static DEFINE_PER_CPU(bool_t, work_to_do);
Peeking ahead to the uses of this, I can't see why this is needed
alongside xsplice_work.do_work.
> @@ -296,6 +331,77 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
> return rc;
> }
>
> +static int check_special_sections(struct payload *payload,
> + struct xsplice_elf *elf)
const (twice, but the first parameter appears to be unused anyway)
> +{
> + unsigned int i;
> + static const char *const names[] = { ".xsplice.funcs" };
> +
> + for ( i = 0; i < ARRAY_SIZE(names); i++ )
> + {
> + struct xsplice_elf_sec *sec;
const
> + sec = xsplice_elf_sec_by_name(elf, names[i]);
> + if ( !sec )
> + {
> + printk(XENLOG_ERR "%s%s: %s is missing!\n",
> + XSPLICE, elf->name, names[i]);
> + return -EINVAL;
> + }
> +
> + if ( !sec->sec->sh_size )
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
So you check for there being one such section. Is having multiple
of them okay / meaningful?
> +static int prepare_payload(struct payload *payload,
> + struct xsplice_elf *elf)
> +{
> + struct xsplice_elf_sec *sec;
> + unsigned int i;
> + struct xsplice_patch_func *f;
const (multiple times, and I'm not going to further make this remark:
Please deal with this throughout the series)
> + sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
> + if ( sec )
Assuming check_special_sections() got invoked before you get here,
why the conditional?
> + {
> + if ( sec->sec->sh_size % sizeof *payload->funcs )
> + {
> + dprintk(XENLOG_DEBUG, "%s%s: Wrong size of .xsplice.funcs!\n",
> + XSPLICE, elf->name);
> + return -EINVAL;
> + }
> +
> + payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
> + payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
Since this repeats (so far I thought this was more like a mistake),
and despite being a matter of taste to some degree - may I ask
that the canonical sizeof() be used, instead of (sizeof ...) or
whatever other variant?
> + }
> +
> + for ( i = 0; i < payload->nfuncs; i++ )
> + {
> + unsigned int j;
> +
> + f = &(payload->funcs[i]);
> +
> + if ( !f->new_addr || !f->old_addr || !f->old_size || !f->new_size )
Isn't new_size == 0 a particularly easy to deal with case?
> @@ -499,6 +614,321 @@ static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> return rc ? : idx;
> }
>
> +/*
> + * The following functions get the CPUs into an appropriate state and
> + * apply (or revert) each of the payload's functions. This is needed
> + * for XEN_SYSCTL_XSPLICE_ACTION operation (see xsplice_action).
> + */
> +
> +static int apply_payload(struct payload *data)
> +{
> + unsigned int i;
> +
> + dprintk(XENLOG_DEBUG, "%s%s: Applying %u functions.\n", XSPLICE,
> + data->name, data->nfuncs);
> +
> + arch_xsplice_patching_enter();
> +
> + for ( i = 0; i < data->nfuncs; i++ )
> + arch_xsplice_apply_jmp(&data->funcs[i]);
> +
> + arch_xsplice_patching_leave();
> +
> + list_add_tail(&data->applied_list, &applied_list);
> +
> + return 0;
> +}
> +
> +/*
> + * This function is executed having all other CPUs with no stack (we may
> + * have cpu_idle on it) and IRQs disabled.
> + */
> +static int revert_payload(struct payload *data)
Wouldn't the same comment apply to apply_payload()? I.e. is it
useful to have here but not there (i.e. isn't the comment ahead
of xsplice_do_action() sufficient)?
> +static void xsplice_do_action(void)
> +{
> + int rc;
> + struct payload *data, *other, *tmp;
> +
> + data = xsplice_work.data;
> + /* Now this function should be the only one on any stack.
> + * No need to lock the payload list or applied list. */
Comment style.
> + switch ( xsplice_work.cmd )
> + {
> + case XSPLICE_ACTION_APPLY:
> + rc = apply_payload(data);
> + if ( rc == 0 )
> + data->state = XSPLICE_STATE_APPLIED;
> + break;
> +
> + case XSPLICE_ACTION_REVERT:
> + rc = revert_payload(data);
> + if ( rc == 0 )
> + data->state = XSPLICE_STATE_CHECKED;
> + break;
> +
> + case XSPLICE_ACTION_REPLACE:
> + rc = 0;
> + /* N.B: Use 'applied_list' member, not 'list'. */
> + list_for_each_entry_safe_reverse ( other, tmp, &applied_list, applied_list )
> + {
> + other->rc = revert_payload(other);
Why does this set ->rc, but the two earlier ones only set the local
variable?
> + if ( other->rc == 0 )
> + other->state = XSPLICE_STATE_CHECKED;
> + else
> + {
> + rc = -EINVAL;
> + break;
> + }
> + }
> +
> + if ( rc != -EINVAL )
Perhaps better "rc == 0"?
> + {
> + rc = apply_payload(data);
> + if ( rc == 0 )
> + data->state = XSPLICE_STATE_APPLIED;
> + }
> + break;
> +
> + default:
> + rc = -EINVAL;
> + break;
> + }
> +
> + data->rc = rc;
Oh, here it is. But why would an xsplice_work.cmd (which probably
shouldn't make it here anyway) mark the payload having an error?
It didn't change state or anything after all.
> +/*
> + * MUST be holding the payload_lock.
> + */
comment style (another comment I'm not going to repeat any further)
> +static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout)
> +{
> + unsigned int cpu;
> +
> + ASSERT(spin_is_locked(&payload_lock));
Also the comment above is clearly redundant with this ASSERT().
> + /* Fail if an operation is already scheduled. */
> + if ( xsplice_work.do_work )
> + return -EBUSY;
> +
> + if ( !get_cpu_maps() )
> + {
> + printk(XENLOG_ERR "%s%s: unable to get cpu_maps lock!\n",
> + XSPLICE, data->name);
> + return -EBUSY;
> + }
> +
> + xsplice_work.cmd = cmd;
> + xsplice_work.data = data;
> + xsplice_work.timeout = timeout ?: MILLISECS(30);
> +
> + dprintk(XENLOG_DEBUG, "%s%s: timeout is %"PRI_stime"ms\n",
> + XSPLICE, data->name, xsplice_work.timeout / MILLISECS(1));
> +
> + /*
> + * Once the patching has been completed, the semaphore value will
> + * be num_online_cpus()-1.
> + */
Is this comment really useful? I.e. is that final value meaningful
for anything?
> + atomic_set(&xsplice_work.semaphore, -1);
> + atomic_set(&xsplice_work.irq_semaphore, -1);
> +
> + xsplice_work.ready = 0;
> + smp_wmb();
> + xsplice_work.do_work = 1;
> + smp_wmb();
> + /*
> + * Above smp_wmb() gives us an compiler barrier, as we MUST do this
> + * after setting the global structure.
> + */
"a compiler barrier"
> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> +{
> + /* TODO: Handle missing NMI/MCE.*/
> + return 1;
> +}
Aren't we in common code here?
> +void check_for_xsplice_work(void)
> +{
> + unsigned int cpu = smp_processor_id();
> + nmi_callback_t saved_nmi_callback;
This again looks very x86-ish.
> + s_time_t timeout;
> + unsigned long flags;
> +
> + /* Fast path: no work to do. */
> + if ( !per_cpu(work_to_do, cpu ) )
> + return;
> +
> + /* In case we aborted, other CPUs can skip right away. */
> + if ( (!xsplice_work.do_work) )
Stray parentheses.
> + {
> + per_cpu(work_to_do, cpu) = 0;
> + return;
> + }
> +
> + ASSERT(local_irq_is_enabled());
> +
> + /* Set at -1, so will go up to num_online_cpus - 1. */
> + if ( atomic_inc_and_test(&xsplice_work.semaphore) )
> + {
> + struct payload *p;
> + unsigned int total_cpus;
> +
> + p = xsplice_work.data;
> + if ( !get_cpu_maps() )
> + {
> + printk(XENLOG_ERR "%s%s: CPU%u - unable to get cpu_maps lock!\n",
> + XSPLICE, p->name, cpu);
> + per_cpu(work_to_do, cpu) = 0;
> + xsplice_work.data->rc = -EBUSY;
> + xsplice_work.do_work = 0;
On x86 such possibly simultaneous accesses may be okay, but is
that universally the case? Wouldn't it be better if only the monarch
updated shared data?
> + /*
> + * Do NOT decrement semaphore down - as that may cause the other
> + * CPU (which may be at this exact moment checking the ASSERT)
Which ASSERT()? (I guess the one a few lines up - but does that
matter?)
> + * to assume the role of master and then needlessly time out
> + * (as do_work is zero).
> + */
> + return;
> + }
> +
> + barrier(); /* MUST do it after get_cpu_maps. */
> + total_cpus = num_online_cpus() - 1;
Considering this the variable (and earlier function parameter) is
misnamed.
> + if ( total_cpus )
> + {
> + dprintk(XENLOG_DEBUG, "%s%s: CPU%u - IPIing the %u CPUs\n",
Not being a native speaker, I've previously observed too many
articles - the "the" here again looks quite odd, or there is an
"other" missing.
> + XSPLICE, p->name, cpu, total_cpus);
> + smp_call_function(reschedule_fn, NULL, 0);
> + }
> +
> + timeout = xsplice_work.timeout + NOW();
> + if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
> + "Timed out on CPU semaphore") )
> + goto abort;
> +
> + /* "Mask" NMIs. */
> + saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
So what if an NMI got raised on a remote CPU right before you set
this? I.e. doesn't this need to move earlier?
> + /* All CPUs are waiting, now signal to disable IRQs. */
> + xsplice_work.ready = 1;
> + smp_wmb();
> +
> + atomic_inc(&xsplice_work.irq_semaphore);
> + if ( !xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
> + "Timed out on IRQ semaphore") )
I'd prefer if the common parts of that message moved into
xsplice_do_wait() - no reason to have more string literal space
occupied than really needed. Also, looking back, the respective
function parameter could do with a more descriptive name.
And then - does it make sense to wait the perhaps full 30ms
on this second step? Rendezvoused CPUs should react rather
quickly to the signal to disable interrupts.
> + {
> + local_irq_save(flags);
> + /* Do the patching. */
> + xsplice_do_action();
> + /* To flush out pipeline. */
> + arch_xsplice_post_action();
The comment needs to become more generic, and maybe the
function name more specific.
> + local_irq_restore(flags);
> + }
> + set_nmi_callback(saved_nmi_callback);
> +
> + abort:
> + per_cpu(work_to_do, cpu) = 0;
> + xsplice_work.do_work = 0;
> +
> + smp_wmb(); /* Synchronize with waiting CPUs. */
What "synchronization" does this barrier do?
> + ASSERT(local_irq_is_enabled());
Is there really anything between the earlier identical ASSERT() and
this one which could leave interrupts off?
> + put_cpu_maps();
> +
> + printk(XENLOG_INFO "%s%s finished with rc=%d\n", XSPLICE,
> + p->name, p->rc);
And no record left of what was done with that payload?
> + }
> + else
> + {
> + /* Wait for all CPUs to rendezvous. */
> + while ( xsplice_work.do_work && !xsplice_work.ready )
> + {
> + cpu_relax();
> + smp_rmb();
What is the barrier doing inside this and the other loop below?
> @@ -635,15 +1063,30 @@ static void xsplice_printall(unsigned char key)
> }
>
> list_for_each_entry ( data, &payload_list, list )
> + {
> printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %zu pages.\n",
> data->name, state2str(data->state), data->state, data->text_addr,
> data->rw_addr, data->ro_addr, data->payload_pages);
>
> + for ( i = 0; i < data->nfuncs; i++ )
> + {
> + struct xsplice_patch_func *f = &(data->funcs[i]);
> + printk(" %s patch 0x%"PRIx64"(%u) with 0x%"PRIx64"(%u)\n",
%# again please.
> + f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
> + if ( !(i % 100) )
Using a power of 2 in situations like this is going to produce both
smaller and faster code (the faster aspect may not matter here,
but anyway). Also I don't think you want to do this on the first
iteration.
> + process_pending_softirqs();
With a lock held?
> static int __init xsplice_init(void)
> {
> + BUILD_BUG_ON( sizeof(struct xsplice_patch_func) != 64 );
> + BUILD_BUG_ON( offsetof(struct xsplice_patch_func, new_addr) != 8 );
> + BUILD_BUG_ON( offsetof(struct xsplice_patch_func, new_size) != 24 );
What assumptions get broken if these sizes change?
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -11,12 +11,30 @@ struct xsplice_elf_sec;
> struct xsplice_elf_sym;
> struct xen_sysctl_xsplice_op;
>
> +#include <xen/elfstructs.h>
> +/*
> + * The structure which defines the patching. This is what the hypervisor
> + * expects in the '.xsplice.func' section of the ELF file.
> + *
> + * This MUST be in sync with what the tools generate.
In which case this needs to be part of the public interface, I would
say.
> + */
> +struct xsplice_patch_func {
> + const char *name;
> + uint64_t new_addr;
> + uint64_t old_addr;
> + uint32_t new_size;
> + uint32_t old_size;
> + uint8_t undo[8];
Shouldn't the size of this array be a per-arch constant?
> + uint8_t pad[24];
What is this padding field good for?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel