[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v1] psr: fix bug which may cause crash
On 19-11-27 11:06:49, Jan Beulich wrote: > On 27.11.2019 07:24, Yi Sun wrote: > > During test, we found a crash on Xen with below trace. > > (XEN) Xen call trace: > > (XEN) [<ffff82d0802a065a>] R psr.c#l3_cdp_write_msr+0x1e/0x22 > > (XEN) [<ffff82d0802a0858>] F psr.c#do_write_psr_msrs+0x6d/0x109 > > (XEN) [<ffff82d08023e000>] F smp_call_function_interrupt+0x5a/0xac > > (XEN) [<ffff82d0802a2b89>] F call_function_interrupt+0x20/0x34 > > (XEN) [<ffff82d080282c64>] F do_IRQ+0x175/0x6ae > > (XEN) [<ffff82d08038b8ba>] F common_interrupt+0x10a/0x120 > > (XEN) [<ffff82d0802ec616>] F cpu_idle.c#acpi_idle_do_entry+0x9d/0xb1 > > (XEN) [<ffff82d0802ecc01>] F cpu_idle.c#acpi_processor_idle+0x41d/0x626 > > (XEN) [<ffff82d08027353b>] F domain.c#idle_loop+0xa5/0xa7 > > (XEN) > > (XEN) > > (XEN) **************************************** > > (XEN) Panic on CPU 20: > > (XEN) GENERAL PROTECTION FAULT > > (XEN) [error_code=0000] > > (XEN) **************************************** > > > > Root cause is that the cache of COS registers are not initialized > > for CAT/CDP which have non-zero default value. That causes invalid > > write to MSR when COS id has exceeded the max number.. So fix it by > > initializing the cache. > > I'm struggling with this description, first and foremost because > there's no (recognizable to me) connection between the supposed > root cause and the crash. Exceeding the maximum number is a bug in > some loop's bounds I would say, not an omission of cached value > initialization. In particular I see in do_write_psr_msrs() > > for ( j = 0; j < cos_num; j++, index++ ) > { > if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) > { > feat->cos_reg_val[cos * cos_num + j] = info->val[index]; > props->write_msr(cos, info->val[index], props->type[j]); > } > } > Sorry, my description is not clear. The bug happens when CDP and MBA co-exist and MBA COS_MAX is bigger than CDP COS_MAX. E.g. MBA has 8 COS registers but CDP only have 6. When setting MBA throttling value for the 7th guest, the value array would be: +------------------+------------------+--------------+ | Data default val | Code default val | MBA throttle | +------------------+------------------+--------------+ We should avoid writting CDP data/code valules to MSR here. This should be prevented by: if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) But the whole cos_reg_val[] is not initialized to default value so that the check cannot prevent default value setting. > Afaict the makes clear that values found in ->cos_reg_val[] would > never get written out (which fits it being just a cache). If > anything, a "random" match of the cache value and the two be > written value would _prevent_ an MSR write despite potentially > the MSR in fact currently holding a different value. > > Nevertheless a few remarks on the patch itself, just in case > it's just the description that has misguided me. > > > --- a/xen/arch/x86/psr.c > > +++ b/xen/arch/x86/psr.c > > @@ -316,6 +316,7 @@ static bool cat_init_feature(const struct cpuid_leaf > > *regs, > > [FEAT_TYPE_L3_CDP] = "L3 CDP", > > [FEAT_TYPE_L2_CAT] = "L2 CAT", > > }; > > + unsigned int i = 0; > > Unnecessary initializer and too wide a scope. > Ok, u8 is enough. > > @@ -332,7 +333,8 @@ static bool cat_init_feature(const struct cpuid_leaf > > *regs, > > return false; > > > > /* We reserve cos=0 as default cbm (all bits within cbm_len are > > 1). */ > > - feat->cos_reg_val[0] = cat_default_val(feat->cat.cbm_len); > > + for(i = 0; i < MAX_COS_REG_CNT; i++) > > There are number of blanks missing here (and even more ones in > the other instance below). It also seems to me that the comment > ends up misplaced now. If ... > Sorry, the comment should be modified. > > + feat->cos_reg_val[i] = cat_default_val(feat->cat.cbm_len); > > > > wrmsrl((type == FEAT_TYPE_L3_CAT ? > > MSR_IA32_PSR_L3_MASK(0) : > > ... this indeed is to remain a single write, it may want to move > here. But as per above keeping cached and actual values in sync > may make it necessary to move this write into the loop as well. > You are right, I missed to loop this sentence. Another idea: I remembered that the original purpose to only write COS[0] here is to improve performance by not writing too many MSRs. So I am thinking to change the fix to below line in do_write_psr_msrs(). if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] && cos <= feat->cos_max ) What is your opinion? Thanks! > Jan > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxxx > https://lists.xenproject.org/mailman/listinfo/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |