|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 4/6] microcode: don't call apply_microcode() in cpu_request_microcode()
On Thu, Nov 29, 2018 at 10:46:05AM +0100, Roger Pau Monné wrote:
>On Thu, Nov 29, 2018 at 12:28:46PM +0800, Chao Gao wrote:
>> On Wed, Nov 28, 2018 at 04:02:25PM +0100, Roger Pau Monné wrote:
>> >On Wed, Nov 28, 2018 at 01:34:14PM +0800, Chao Gao wrote:
>> >> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> >> index 8350d22..cca7b2c 100644
>> >> --- a/xen/arch/x86/microcode.c
>> >> +++ b/xen/arch/x86/microcode.c
>> >> @@ -233,20 +233,12 @@ int microcode_resume_cpu(unsigned int cpu)
>> >> return err;
>> >> }
>> >>
>> >> -static int microcode_update_cpu(const void *buf, size_t size)
>> >> +static int microcode_update_cpu(void)
>> >> {
>> >> int err;
>> >> - unsigned int cpu = smp_processor_id();
>> >> - struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>> >>
>> >> spin_lock(µcode_mutex);
>> >> -
>> >> - err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
>> >> - if ( likely(!err) )
>> >> - err = microcode_ops->cpu_request_microcode(cpu, buf, size);
>> >> - else
>> >> - __microcode_fini_cpu(cpu);
>> >> -
>> >> + err = microcode_ops->apply_microcode(smp_processor_id());
>> >> spin_unlock(µcode_mutex);
>> >>
>> >> return err;
>> >> @@ -259,7 +251,7 @@ static long do_microcode_update(void *_info)
>> >>
>> >> BUG_ON(info->cpu != smp_processor_id());
>> >>
>> >> - error = microcode_update_cpu(info->buffer, info->buffer_size);
>> >> + error = microcode_update_cpu();
>> >
>> >Why don't you just set info->error = microcode_update_cpu()?
>> >
>> >AFAICT this is done to attempt to update the remaining CPUs if one
>> >update failed?
>>
>> Yes. But this patch doesn't change the logic here. Actually, if HT is
>> enabled and microcode is shared between the logical threads of the same
>> core, so if one thread updates microcode successfully, its sibling would
>> always fail in current logic. I am trying to explain why we cannot abort
>> the update even though an error is met in current logic. It definitely
>> can be solved by tweaking the logic slightly.
>>
>> >
>> >Is there anyway to rollback to the previous state so all CPUs have the
>> >same microcode?
>>
>> Seems it is not allowed to load a microcode with numeratically smaller
>> revision according to 9.11.7.2.
>>
>> With patch 6, a panic() would be triggered if some cpus failed to do the
>> update. I didn't try to change the logic here.
>>
>> >I assume nothing good will come out of running a
>> >system with CPUs using different microcode versions, but maybe that's
>> >not so bad?
>>
>> It is better that all CPUs have the same microcode revision.
>>
>> Linux kernel rejects late microcode update if finding some CPUs
>> offlined. I may port this patch to Xen too in a separate patch.
>
>What happens with hotplug CPUs?
>
>Even if you disable late loading when there are offlined CPUs you
>still need to handle hotplug CPUs, which IMO should share the same
>path with offlined CPUs: the microcode update should be done ASAP
>after bringing the CPU up.
>
In linux, CPU's being offline is just a logical offline. It may participate
actions like MCE. It would lead to a situation that some cpus are using old
microcode while some are using the new one, which introduces instability.
CPU hotplug doesn't have such issue.
>> >
>> >> if ( error )
>> >> info->error = error;
>> >>
>> >> @@ -276,6 +268,8 @@ int
>> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long
>> >> len)
>> >> {
>> >> int ret;
>> >> struct microcode_info *info;
>> >> + unsigned int cpu = smp_processor_id();
>> >> + struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
>> >>
>> >> if ( len != (uint32_t)len )
>> >> return -E2BIG;
>> >> @@ -294,10 +288,6 @@ int
>> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long
>> >> len)
>> >> return ret;
>> >> }
>> >>
>> >> - info->buffer_size = len;
>> >> - info->error = 0;
>> >> - info->cpu = cpumask_first(&cpu_online_map);
>> >> -
>> >> if ( microcode_ops->start_update )
>> >> {
>> >> ret = microcode_ops->start_update();
>> >> @@ -308,6 +298,26 @@ int
>> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long
>> >> len)
>> >> }
>> >> }
>> >>
>> >> + spin_lock(µcode_mutex);
>> >> +
>> >> + ret = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
>> >> + if ( likely(!ret) )
>> >> + ret = microcode_ops->cpu_request_microcode(cpu, info->buffer,
>> >> len);
>> >> + else
>> >> + __microcode_fini_cpu(cpu);
>> >> +
>> >> + spin_unlock(µcode_mutex);
>> >
>> >Why do you need to hold the lock here?
>> >
>> >microcode_update is already serialized and should only be executed on
>> >a CPU at a time due to the usage of chained
>> >continue_hypercall_on_cpu.
>>
>> microcode_resume_cpu() also uses the 'uci' and the global microcode cache.
>> This lock is to prevent them happening simultaneously (someone is
>> adding/replacing entries to a list and another is reading the list).
>> All existing call sites of collec_cpu_info() and cpu_request_microcode()
>> are protected with this lock.
>
>I mean, there should be some kind of protection to prevent the list
>from changing at all while there's an update in progress, or else you
>risk using different versions for different CPUs if there's a list
>addition while a microcode update is in progress?
Both adding to and using the microcode cache are done with
microcode_mutex held. So I don't think it can happen.
Thanks
Chao
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |