[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC V9 4/5] xen, libxc: Request page fault injection via libxc

On 09/01/2014 03:05 PM, Jan Beulich wrote:
>>>> On 01.09.14 at 13:54, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>> On 09/01/2014 12:08 PM, Jan Beulich wrote:
>>>>>> On 01.09.14 at 09:36, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>> On 08/29/2014 12:27 PM, Jan Beulich wrote:
>>>>>>>> On 29.08.14 at 09:44, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>>>> I do understand the preference for a VCPU-based mechanism from a
>>>>>> concurrency point of view, but that would simply potentially fail for
>>>>>> us, hence defeating the purpose of the patch. I'm also not sure how that
>>>>>> would be useful in the general case either, since the same problem that
>>>>>> applies to us would seem to apply to the general case as well.
>>>>> Yeah, the whole thing probably needs a bit more thinking so that the
>>>>> interface doesn't end up being a BitDefender-special. Indeed together
>>>>> with the address space qualification, the interface might not be very
>>>>> useful when made vCPU-bound. And taking it a little further into the
>>>>> "generic" direction, allowing this to only inject #PF doesn't make a
>>>>> very nice interface either. Plus we already have HVMOP_inject_trap,
>>>>> i.e. your first line of thinking (and eventual explaining as the
>>>>> motivation for a patch) should be why that can't be used.
>>>> I'd say that it's memory-introspection specific rather than 3rd-party
>>>> vendor specific. Without this this patch, memory-introspection support
>>>> in general is impacted / less flexible, since there's no other way to
>>>> bring swapped out pages back in.
>>>> For all the reasons you've explained (at least as far as I understand
>>>> it) there's not much room to go more generic - so maybe just renaming
>>>> the libxc wrapper to something more specific (
>>>> xc_domain_request_usermode_pagefault?) is the solution here?
>>> Maybe, but only after you explained why the existing interface can
>>> neither be used nor suitably extended.
>> As far as I understand the HVMOP_inject_trap interface, it is simply (in
>> this case) a way to trigger the equivalent of hvm_inject_page_fault()
>> from userspace (via libxc).
>> We need two additional checks:
>> 1. That CR3 matches, because the way the application works, we need to
>> inject a page fault related to the address space of whatever process is
>> interesting at the time, and
>> 2. That we're in usermode (the CPL check), because we know that it's
>> safe to inject a page fault when we're in user mode, but it's not always
>> safe to do so in kernel mode.
>> The current interface seems to be a low-level, basic wrapper around
>> hvm_inject_trap().
>> What we're trying to do is ask for a page fault when we're both A) in
>> usermode, and B) when a target process matches - and are only interested
>> in page faults, no other trap vector.
>> Technically, the checks could, probably, be moved into (and new
>> parameters added to) hvm_inject_trap() & friends, but it seems unlikely
>> that users other than memory introspection applications would be
>> interested in them, while they would possibly add to the complexity of
>> the interface. The rest of the clients would have to carry dummy
>> parameters around to use it.
> I'm not sure: Injecting faults with certain restrictions (like in your
> case CPL and CR3) does seem to make quite some sense in
> general when the fault isn't intended to be terminal (of a process
> or the entire VM). It's just that so far we used fault injection only
> to indicate error conditions.
> But as always - let's see if others are of differing opinion before
> settling on either route.

While waiting for other opinions, I've started looking at the
HVMOP_inject_trap code, and it uses a per-VCPU data-saving model similar
to the one we've been discussing (and agreed that does not work for the
explained use-cases).

The first giveaway is the libxc function:

 * Injects a hardware/software CPU trap, to take effect the next time
the HVM
 * resumes.
int xc_hvm_inject_trap(
    xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
    uint32_t type, uint32_t error_code, uint32_t insn_len,
    uint64_t cr2);

which takes an "int vcpu" parameter. Then, this happens in

    case HVMOP_inject_trap:
        xen_hvm_inject_trap_t tr;
        struct domain *d;
        struct vcpu *v;

        if ( copy_from_guest(&tr, arg, 1 ) )
            return -EFAULT;

        rc = rcu_lock_remote_domain_by_id(tr.domid, &d);
        if ( rc != 0 )
            return rc;

        rc = -EINVAL;
        if ( !is_hvm_domain(d) )
            goto param_fail8;

        rc = xsm_hvm_control(XSM_DM_PRIV, d, op);
        if ( rc )
            goto param_fail8;

        rc = -ENOENT;
        if ( tr.vcpuid >= d->max_vcpus || (v = d->vcpu[tr.vcpuid]) == NULL )
            goto param_fail8;

        if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
            rc = -EBUSY;
            v->arch.hvm_vcpu.inject_trap.vector = tr.vector;
            v->arch.hvm_vcpu.inject_trap.type = tr.type;
            v->arch.hvm_vcpu.inject_trap.error_code = tr.error_code;
            v->arch.hvm_vcpu.inject_trap.insn_len = tr.insn_len;
            v->arch.hvm_vcpu.inject_trap.cr2 = tr.cr2;
            rc = 0;


The "v->arch.hvm_vcpu.inject_trap.<member> = initializer;" code again
points to a per-VCPU implementation.

And finally, hvm_do_resume() is also VCPU-dependent:

void hvm_do_resume(struct vcpu *v)
    struct domain *d = v->domain;
    struct hvm_ioreq_server *s;


    if ( is_hvm_vcpu(v) )

    list_for_each_entry ( s,
                          list_entry )
        struct hvm_ioreq_vcpu *sv;

        list_for_each_entry ( sv,
                              list_entry )
            if ( sv->vcpu == v )
                if ( !hvm_wait_for_io(sv, get_ioreq(s, v)) )


    /* Inject pending hw/sw trap */
    if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
        v->arch.hvm_vcpu.inject_trap.vector = -1;

While we need to set the data per-domain and have whatever VCPU inject
the page fault - _but_only_if_ it's in usermode and its CR3 points to
something interesting.

Hope I've been following the code correctly.

Razvan Cojocaru

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.