RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?
Hi Fred --

I understand your pain. I too wasted time building and testing
bits with the code turned on. However:

1) It is not uncommon in the open source community for the needs
   of publicly-available machines to take precedence over the
   needs of unreleased future machines.
2) It is not uncommon in the open source community for an
   unreleased future machine to require different config files
   than the defaults.
3) This specific code that is failing is not even needed on
   publicly-available machines. It is not uncommon in the open
   source community to refuse patches that are only needed for
   unreleased future machines.

That said, I understand that your team as well as other Xen
developers are primarily using these future machines for
development and testing, so let me suggest a compromise:

Is there a way to dynamically test early in boot to determine
if this machine has split I-D caches? If so, you could provide
a patch that sets a global or cpu variable appropriately and
changes the compile-time ifdef to a run-time if test.

Dan

P.S. Thinking about this makes me realize... the pal cache flush
code may be inadequate anyway when we get to SMP-guest support
because the stale mapping may be on another processor.

> -----Original Message-----
> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
> Sent: Thursday, December 22, 2005 9:15 AM
> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
> Console problem on domU on tip?
>
> Dan,
>
> We spent a long time tracking down Cset#8383 yesterday, and the
> issue identified so far is that the I/Dcache patch was not turned
> on in the default build! Hope other community members won't
> hit this problem again.
>
> From the discussion, it is definitely an issue on the
> specific HP box when accessing the PAL call. To take the correct
> approach, we should definitely track it down to find out the
> potential implementation or platform issue.
>
> Hope you can track this down ASAP to remove this hurdle.
>
> -Fred
>
> Magenheimer, Dan (HP Labs Fort Collins) wrote:
> > With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
> > the problem on a shipping machine and the symptom is that
> > the machine immediately crashes when a domU is launched.
> >
> > With CONFIG_IA64_SPLIT_CACHE off, a developer may encounter
> > a different problem on an unreleased machine.
> >
> > I know that you are focused primarily on the unreleased machine,
> > but in this case, I think we should be cautious for the new user
> > as the developer knows to change the option when running
> > on the unreleased machine.
> >
> > I will spend some more time on this when I have a chance.
> > I think it is a real bug (probably PAL accessing some address
> > which isn't pinned) that occurs only on some boxes due
> > to some factor like memory configuration.
> >
> > Thanks,
> > Dan
> >
> > P.S. The debug output just before the crash was:
> > ia64_fault: General Exception: IA-64 Reserved Register/Field fault
> > (data access): reflecting
> >
> >> -----Original Message-----
> >> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
> >> Sent: Wednesday, December 21, 2005 10:34 PM
> >> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
> >> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> Subject: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
> >> Console problem on domU on tip?
> >>
> >> Dan,
> >>
> >> Can we suggest always turning on #CONFIG_IA64_SPLIT_CACHE as
> >> the default build configuration? People may not be aware of
> >> this build flag and miss it on each new build.
> >>
> >> All the newer generation ia64 processors will come with
> >> split I/Dcache as discussed in the previous mail thread,
> >> and the Itanium architecture documents a possible
> >> split cache for future implementations.
> >> With the default
> >> turned off, it is a potential bug for all Tiger4 systems
> >> used for daily development and for future platforms to come.
> >>
> >> Your mail also indicates that only the HP rx2620
> >> system has the issue and not the other HP boxes. Can you track
> >> down this issue, rather than put in a kludge for the rx2620 box?
> >>
> >> Thanks,
> >>
> >> -Fred
> >>
> >>
> >> Magenheimer, Dan (HP Labs Fort Collins) wrote:
> >>> Committed (but without removal of ifdefs until we
> >>> track down this problem).
> >>>
> >>>> -----Original Message-----
> >>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >>>> Sent: Monday, December 19, 2005 7:15 PM
> >>>> To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin;
> >>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>
> >>>> I guess maybe the firmware on your machine doesn't implement
> >>>> this pal call because there was no split I/D cache at that
> >>>> time, so when you call this pal call, it will return
> >>>> PAL_STATUS_UNIMPLEMENTED. Could you please turn on
> >>>> CONFIG_IA64_SPLIT_CACHE and try this new patch to see
> >>>> whether your machine can boot domain0?
> >>>> If this patch works, could you please remove all
> >>>> CONFIG_IA64_SPLIT_CACHE macros?
> >>>>
> >>>> Thanks
> >>>> -Anthony
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >>>>> Sent: December 19, 2005 23:48
> >>>>> To: Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>>
> >>>>> I have been distracted tracking another bug...
> >>>>>
> >>>>> Here's where I got:
> >>>>>
> >>>>> The machine is a new (April 2005) HP rx2620 so it is
> >>>>> not old firmware. I can't reproduce it on a machine
> >>>>> with an ITP (which does have older firmware).
> >>>>>
> >>>>> This PAL call is never used in Linux, though there is a
> >>>>> routine coded for it. It is the only
> >>>>> PAL call coded in Linux that occurs with psr.ic off.
> >>>>>
> >>>>> The crash I am seeing occurs either during the PAL call or
> >>>>> immediately upon return.
> >>>>>
> >>>>> Is it OK to
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >>>>>> Sent: Monday, December 19, 2005 2:02 AM
> >>>>>> To: Tian, Kevin; Magenheimer, Dan (HP Labs Fort Collins);
> >>>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>>>
> >>>>>> Dan,
> >>>>>> Have you got time to verify the discussion below?
> >>>>>>
> >>>>>> Thanks
> >>>>>> -Anthony
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Tian, Kevin
> >>>>>>> Sent: December 16, 2005 10:16
> >>>>>>> To: Xu, Anthony; 'Magenheimer, Dan (HP Labs Fort Collins)';
> >>>>>>> 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
> >>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>>>>
> >>>>>>>> From: Xu, Anthony
> >>>>>>>> Sent: December 16, 2005 9:54
> >>>>>>>>
> >>>>>>>>> Also, why panic if it fails?
> >>>>>>>>>
> >>>>>>>
> >>>>>>> Panic is not required here, and we could just print out a
> >>>>>>> warning message. Previously the panic was kept there to help
> >>>>>>> our debugging at an early stage.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Does the problem happen only on VTI? Or both VTI and non-VTI
> >>>>>>>>> on split-cache machines?
> >>>>>>>>
> >>>>>>>> Sometimes, it makes domain0 crash at the very beginning of the
> >>>>>>>> domain0 boot process, especially on an MP machine.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> -Anthony
> >>>>>>>
> >>>>>>> One addition: that problem definitely exists on new
> >>>>>>> split-cache processors, for dom0/domU. For a VTI domain, we have
> >>>>>>> logic within the device model to ensure consistency.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Kevin
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >>>>>>>>> Sent: December 16, 2005 1:39
> >>>>>>>>> To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>>>>> Cc: Xu, Anthony
> >>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>>>>>>
> >>>>>>>>>>> Is this code fragment necessary for VTI to boot domU
> >>>>>>>>>>> or is it OK to remove?
> >>>>>>>>>>
> >>>>>>>>>> The comment is inaccurate and it should be domU. That I/D
> >>>>>>>>>> cache sync step is mandatory to boot domU on a new IA64
> >>>>>>>>>> processor which has a split L2 I/D cache. Without such an I/D
> >>>>>>>>>> cache sync, the control panel loads domU's kernel image, which
> >>>>>>>>>> only affects the D-side cache. If there are stale entries in
> >>>>>>>>>> the I-side cache within the same range of the dom0 image,
> >>>>>>>>>> people will see the machine going weird.
> >>>>>>>>>
> >>>>>>>>> I don't understand... how can there be stale entries in the
> >>>>>>>>> I-cache? The instructions have just been written to memory
> >>>>>>>>> (through D-cache) and no instructions in this domain have yet
> >>>>>>>>> been executed. I do see that the D-cache needs to be flushed
> >>>>>>>>> so that memory is coherent but are there better ways to do
> >>>>>>>>> that without a pal call?
> >>>>>>>>>
> >>>>>>>>>> Normally I/D cache sync shouldn't cause any problem.
> >>>>>>>>>> Possibly there's some problem with the pal calling code,
> >>>>>>>>>> like incorrect ITLB mapping for pal or similar issue...
> >>>>>>>>>
> >>>>>>>>> Although the ia64_pal_cache_flush routine is defined in
> >>>>>>>>> linux's pal.h, it doesn't appear to be used anywhere in Linux
> >>>>>>>>> so there is no use model to copy. I suspect there is some
> >>>>>>>>> use model for the call that we don't understand, for example
> >>>>>>>>> maybe it should only be called with physical &progress?
> >>>>>>>>> It definitely fails every time on one of my (newer) machines and
> >>>>>>>>> disabling the pal call makes the problem go away.
> >>>>>>>>>
> >>>>>>>>>> Though it's intermittent, please
> >>>>>>>>>> keep this code
> >>>>>>>>>> there for correctness.
> >>>>>>>>>
> >>>>>>>>> Since the call is definitely failing under some circumstances
> >>>>>>>>> that we don't understand, I'm inclined to at least put the
> >>>>>>>>> code in an #ifdef CONFIG_SPLIT_CACHE
> >>>>>>>>>
> >>>>>>>>> Does the problem happen only on VTI? Or both VTI and non-VTI
> >>>>>>>>> on split-cache machines?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Dan
> >>>>>>>>>
> >>>>>>>>> P.S. I tried Anthony's patch (which moves the PAL call after
> >>>>>>>>> new_thread()) but it still crashes.
> >

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
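
The compromise proposed at the top of this message (probe once early in boot, set a global, and turn the compile-time ifdef into a run-time if test) might look roughly like the sketch below. This is only an illustration under stated assumptions, not code from the Xen tree: every name in it (cache_init, sync_icache_after_load, split_icache_dcache, the enum, and the stub) is hypothetical, the hardware probe itself is left out because the thread does not say how to detect a split cache, and the stub merely counts calls where the real code would issue the PAL cache flush. Treating a failed probe as "not split" is also an assumption, chosen so that shipping machines never reach the call that crashes on them.

```c
#include <stdbool.h>

/* Hypothetical result of a boot-time hardware probe (probe not shown). */
enum cache_probe_result { CACHE_UNIFIED, CACHE_SPLIT, CACHE_PROBE_FAILED };

/* Run-time replacement for the compile-time CONFIG_IA64_SPLIT_CACHE test. */
bool split_icache_dcache = false;

/* Counts flush requests so the sketch can be exercised without hardware. */
int flush_count = 0;

/* Stand-in for the real PAL cache flush; here it only counts calls. */
void flush_icache_stub(void)
{
    flush_count++;
}

/* Called once early in boot with whatever the probe reported. */
void cache_init(enum cache_probe_result probe)
{
    /* Assumption: a failed probe is treated like a unified cache, so
       machines that cannot answer never reach the flush call. */
    split_icache_dcache = (probe == CACHE_SPLIT);
}

/* Called after the domU kernel image has been written to memory. */
void sync_icache_after_load(void)
{
    /* Formerly: #ifdef CONFIG_IA64_SPLIT_CACHE ... #endif */
    if (split_icache_dcache)
        flush_icache_stub();
}
```

With this shape a single binary would serve both kinds of machine, and the default config file would no longer decide which of the two failure modes a user hits. As the P.S. above notes, a per-cpu variant may be needed once SMP guests are supported.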