[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 4/6] microcode: don't call apply_microcode() in cpu_request_microcode()



On Thu, Nov 29, 2018 at 10:46:05AM +0100, Roger Pau Monné wrote:
>On Thu, Nov 29, 2018 at 12:28:46PM +0800, Chao Gao wrote:
>> On Wed, Nov 28, 2018 at 04:02:25PM +0100, Roger Pau Monné wrote:
>> >On Wed, Nov 28, 2018 at 01:34:14PM +0800, Chao Gao wrote:
>> >> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> >> index 8350d22..cca7b2c 100644
>> >> --- a/xen/arch/x86/microcode.c
>> >> +++ b/xen/arch/x86/microcode.c
>> >> @@ -233,20 +233,12 @@ int microcode_resume_cpu(unsigned int cpu)
>> >>      return err;
>> >>  }
>> >>  
>> >> -static int microcode_update_cpu(const void *buf, size_t size)
>> >> +static int microcode_update_cpu(void)
>> >>  {
>> >>      int err;
>> >> -    unsigned int cpu = smp_processor_id();
>> >> -    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>> >>  
>> >>      spin_lock(&microcode_mutex);
>> >> -
>> >> -    err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
>> >> -    if ( likely(!err) )
>> >> -        err = microcode_ops->cpu_request_microcode(cpu, buf, size);
>> >> -    else
>> >> -        __microcode_fini_cpu(cpu);
>> >> -
>> >> +    err = microcode_ops->apply_microcode(smp_processor_id());
>> >>      spin_unlock(&microcode_mutex);
>> >>  
>> >>      return err;
>> >> @@ -259,7 +251,7 @@ static long do_microcode_update(void *_info)
>> >>  
>> >>      BUG_ON(info->cpu != smp_processor_id());
>> >>  
>> >> -    error = microcode_update_cpu(info->buffer, info->buffer_size);
>> >> +    error = microcode_update_cpu();
>> >
>> >Why don't you just set info->error = microcode_update_cpu()?
>> >
>> >AFAICT this is done to attempt to update the remaining CPUs if one
>> >update failed?
>> 
>> Yes. But this patch doesn't change the logic here. Actually, if HT is
>> enabled and microcode is shared between the logical threads of the same
>> core, so if one thread updates microcode successfully, its sibling would
>> always fail in current logic. I am trying to explain why we cannot abort
>> the update even though an error is met in current logic. It definitely
>> can be solved by tweaking the logic slightly. 
>> 
>> >
>> >Is there anyway to rollback to the previous state so all CPUs have the
>> >same microcode?
>> 
>> Seems it is not allowed to load a microcode with numeratically smaller
>> revision according to 9.11.7.2.
>> 
>> With patch 6, a panic() would be triggered if some cpus failed to do the
>> update. I didn't try to change the logic here.
>> 
>> >I assume nothing good will come out of running a
>> >system with CPUs using different microcode versions, but maybe that's
>> >not so bad?
>> 
>> It is better that all CPUs have the same microcode revision. 
>> 
>> Linux kernel rejects late microcode update if finding some CPUs
>> offlined. I may port this patch to Xen too in a separate patch.
>
>What happens with hotplug CPUs?
>
>Even if you disable late loading when there are offlined CPUs you
>still need to handle hotplug CPUs, which IMO should share the same
>path with offlined CPUs: the microcode update should be done ASAP
>after bringing the CPU up.
>

In linux, CPU's being offline is just a logical offline. It may participate
actions like MCE. It would lead to a situation that some cpus are using old
microcode while some are using the new one, which introduces instability.
CPU hotplug doesn't have such issue.

>> >
>> >>      if ( error )
>> >>          info->error = error;
>> >>  
>> >> @@ -276,6 +268,8 @@ int 
>> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long 
>> >> len)
>> >>  {
>> >>      int ret;
>> >>      struct microcode_info *info;
>> >> +    unsigned int cpu = smp_processor_id();
>> >> +    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>> >>  
>> >>      if ( len != (uint32_t)len )
>> >>          return -E2BIG;
>> >> @@ -294,10 +288,6 @@ int 
>> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long 
>> >> len)
>> >>          return ret;
>> >>      }
>> >>  
>> >> -    info->buffer_size = len;
>> >> -    info->error = 0;
>> >> -    info->cpu = cpumask_first(&cpu_online_map);
>> >> -
>> >>      if ( microcode_ops->start_update )
>> >>      {
>> >>          ret = microcode_ops->start_update();
>> >> @@ -308,6 +298,26 @@ int 
>> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long 
>> >> len)
>> >>          }
>> >>      }
>> >>  
>> >> +    spin_lock(&microcode_mutex);
>> >> +
>> >> +    ret = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
>> >> +    if ( likely(!ret) )
>> >> +        ret = microcode_ops->cpu_request_microcode(cpu, info->buffer, 
>> >> len);
>> >> +    else
>> >> +        __microcode_fini_cpu(cpu);
>> >> +
>> >> +    spin_unlock(&microcode_mutex);
>> >
>> >Why do you need to hold the lock here?
>> >
>> >microcode_update is already serialized and should only be executed on
>> >a CPU at a time due to the usage of chained
>> >continue_hypercall_on_cpu.
>> 
>> microcode_resume_cpu() also uses the 'uci' and the global microcode cache.
>> This lock is to prevent them happening simultaneously (someone is
>> adding/replacing entries to a list and another is reading the list).
>> All existing call sites of collec_cpu_info() and cpu_request_microcode()
>> are protected with this lock.
>
>I mean, there should be some kind of protection to prevent the list
>from changing at all while there's an update in progress, or else you
>risk using different versions for different CPUs if there's a list
>addition while a microcode update is in progress?

Both adding to and using the microcode cache are done with
microcode_mutex held. So I don't think it can happen.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.