[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH V2] x86, amd_ucode: Safeguard against #GP
On 5/30/2014 11:21 AM, Andrew Cooper wrote: On 30/05/2014 17:01, Aravind Gopalakrishnan wrote:On 5/28/2014 12:56 PM, boris ostrovsky wrote:On 5/28/2014 11:16 AM, Aravind Gopalakrishnan wrote:On 5/27/2014 6:47 PM, Andrew Cooper wrote:On 27/05/2014 19:24, Aravind Gopalakrishnan wrote:I thought we had identified that the hangs were to do with your use ofWhen HW tries to load a corrupted patch, it generates #GP and hangs the system. Use wrmsr_safe instead so that we fail to load microcode gracefully. Also, massage error handling around apply_microcode to keep in tune with error handling style of other parts of the code. Example on a Fam15h system- (XEN) microcode: CPU0 collect_cpu_info: patch_id=0x6000626 (XEN) microcode: CPU0 size 7870, block size 2586 offset 76 equivID 0x6012 rev 0x6000637 (XEN) microcode: CPU0 found a matching microcode update with version 0x6000637 (current=0x6000626) (XEN) traps.c:3073: GPF (0000): ffff82d08016f682 -> ffff82d08022d9f8(XEN) microcode: CPU0 update from revision 0x6000637 to 0x6000626 failed^^^^^^^^^^^^^^^^^^^^^^As shown, the log message above has the two revisions reversed. Fix thisChanges in V2: - Do not ignore return value from wrmsr_safe - Flip revision numbers as shown aboveSigned-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@xxxxxxx>Reviewed-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>'noreboot' on the Xen command line.Hmm. Yeah.. I figured using wrmsr_safe allows user to just boot into dom0 withouthaving to run through reboot loops. (lazy alternative I guess)Nevermind then. Thanks for the comments (Jan and Andrew). Will keep in mind for the future.I don't understand --- the fact that you had noreboot option meant that your system wouldn't reboot (duh!) when a patch is corrupted (aka "it will hang"). But I'd think we don't want a reboot neither --- we want to safely skip the patch (and possibly backlist it).So, allowing the reboot to happen in turn allows entry to grub. Where we can simply remove 'ucode=' option and this is same as 'skipping' the patch using wrmsr_safe right? Except this is now explicitly done by the sysadmin while wrmsr_safe just works without anyone doing some extra work; and printing a log message informs user that update went wrong for whatever reason. My understanding from earlier comments is that Andrew (and Jan) would rather not see a change from wrmsrl to wrmsr_safe if it is needless because there is already a way someone can circumvent the corrupt patch: just don't provide it on the grub menu. Thanks, -Aravind.Sorry - I think my comment confused the issue. Let me retry.Originally, the bug was described approximately as "A corrupt ucode will cause a GP fault, causing the server to hang".The unhandled #GP fault certainly should be wrapped with wrmsr_safe(), and an error/warning presented to the user. In the case that a bad ucode is discovered, it should be discarded and the server allowed to boot. It is substantially more useful for the server to come up and say "I couldn't load that bit of microcode you wanted me to", than to sit in a reboot loop because you made a typo in the bootloader config, and have to get someone in the datacenter to poke the physical server.My objection was to the wording of the comment alone. Unhandled #GP faults do not "hang" Xen unless you ask for them to behave in that way, given "noreboot" on the command line. Ah. Okay, let me look into this again then and fix issues with V2. -Aravind. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |