
Re: [Xen-devel] [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags



>>> On 12.08.14 at 17:12, <boris.ostrovsky@xxxxxxxxxx> wrote:
> On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>>> On 08.08.14 at 18:55, <boris.ostrovsky@xxxxxxxxxx> wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>   
>>>       if ( is_hvm_vcpu(prev) )
>>>       {
>>> -        if (prev != next)
>>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>               vpmu_save(prev);
>>>   
>>>           if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>                              !is_hardware_domain(next->domain));
>>>       }
>>>   
>>> -    if (is_hvm_vcpu(next) && (prev != next) )
>>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>           /* Must be done with interrupts enabled */
>>>           vpmu_load(next);
>> Wouldn't such vPMU internals be better hidden in the functions
>> themselves? I realize you can save the calls this way, but if the
>> condition changes again later, we'll again have to adjust this
>> core function rather than just the vPMU code. It's bad enough that
>> the vpmu_mode variable is visible to non-vPMU code.
> 
> How about an inline function?

Yeah, that would perhaps do too. Albeit I'd still prefer all vPMU
logic to be handled in the called vPMU functions.

> I would prefer the whole series to get in in one go (with patch 2 
> possibly being the only exception). All other patches (including patch 
> 1) are not particularly useful on their own.

Okay. In that case in patch 1, please consider swapping struct
xenpf_symdata's address and name fields (I had put a respective
note on the patch for when committing it), shrinking the structure
for compat mode guests.

>>> +        goto cont_wait;
>>> +
>>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>>> +                   cpumask_of(smp_processor_id()));
>>> +
>>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>>> +    if ( !sync_task )
>>> +    {
>>> +        printk("vpmu_force_context_switch: out of memory\n");
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    for ( i = 0; i < allbutself_num; i++ )
>>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>>> +
>>> +    atomic_set(&vpmu_sched_counter, 0);
>>> +
>>> +    j = 0;
>>> +    for_each_cpu ( i, &allbutself )
>> This looks to be the only use for the (on stack) allbutself variable,
>> but you could easily avoid this by using for_each_online_cpu() and
>> skipping the local one. I'd also recommend that you count
>> allbutself_num here rather than up front, since that will much more
>> obviously prove that you wait for exactly as many CPUs as you
>> scheduled. The array allocation above is bogus anyway, as on a
>> huge system this can easily be more than a page in size.
> 
> I don't understand the last sentence. What does page-sized allocation 
> have to do with this?

We dislike greater-than-page-size allocations at runtime, as they
require contiguous memory and can fail on a fragmented system.

>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>> +{
>>> +    int ret = -EINVAL;
>>> +    xen_pmu_params_t pmu_params;
>>> +
>>> +    switch ( op )
>>> +    {
>>> +    case XENPMU_mode_set:
>>> +    {
>>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>>> +        uint32_t current_mode;
>>> +
>>> +        if ( !is_control_domain(current->domain) )
>>> +            return -EPERM;
>>> +
>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>> +            return -EFAULT;
>>> +
>>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>>> +            return -EINVAL;
>>> +
>>> +        /*
>>> +         * Return an error if someone else is in the middle of changing mode ---
>>> +         * this is most likely indication of two system administrators
>>> +         * working against each other
>>> +         */
>>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>>> +            return -EAGAIN;
>>> +
>>> +        current_mode = vpmu_mode;
>>> +        vpmu_mode = pmu_params.val;
>>> +
>>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>>> +        {
>>> +            /*
>>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>>> +             * can be achieved by having all physical processors go through
>>> +             * context_switch().
>>> +             */
>>> +            ret = vpmu_force_context_switch(arg);
>>> +            if ( ret )
>>> +                vpmu_mode = current_mode;
>>> +        }
>>> +        else
>>> +            ret = 0;
>>> +
>>> +        spin_unlock(&xenpmu_mode_lock);
>>> +        break;
>> This still isn't safe: There's nothing preventing another vCPU to issue
>> another XENPMU_mode_set operation while the one turning the vPMU
>> off is still in the process of waiting, but having exited the lock protected
>> region in order to allow other processing to occur.
> 
> I think that's OK: one of these two competing requests will time out in 
> the while loop of vpmu_force_context_switch(). It will, in fact, likely 
> be the first caller, because the second one will get right into the while 
> loop without setting up the waiting data. This is a little 
> counter-intuitive, but trying to set the mode simultaneously from two 
> different places is asking for trouble anyway.

Sure it is, but we nevertheless want the hypervisor to be safe. So I
consider what you currently have acceptable only if you can prove
that nothing bad can happen no matter in what order multiple
requests get issued - from looking over your code I wasn't able to
convince myself of this.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel