RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?
Dan,

Please keep drilling on this issue to root-cause the bug behind it.
Although Linux doesn't use PAL_CACHE_FLUSH, another major OS does use
it at runtime, as far as I recall. As Anthony pointed out, please make
sure the PAL ITR is in place before making the call!

-Fred

Xu, Anthony wrote:
> Dan,
> Could you double-check that the ITR mapping the PAL code is in place
> just before invoking ia64_pal_call_static?
>
> Thanks
> -Anthony
>
>> -----Original Message-----
>> From: Magenheimer, Dan (HP Labs Fort Collins)
>> [mailto:dan.magenheimer@xxxxxx]
>> Sent: December 24, 2005 2:22
>> To: Xu, Anthony; Yang, Fred; Tian, Kevin;
>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console
>> problem on domU on tip?
>>
>> I got up early and spent several hours trying to debug
>> this further. By adding timing loops and other debug code
>> and moving all the relevant PAL macros around, I proved
>> conclusively that the ia64_pal_call_static assembly routine
>> is not returning. Next I added an infinite loop to the ivt
>> nested TLB handler (which isn't used by Xen except by some
>> fast paths that are currently off). With this loop, the
>> error message goes away and Xen "freezes". I think this
>> proves that the PAL call is inappropriately accessing some
>> (unpinned) data location with psr.ic off.
>>
>> You should note that this is the only PAL call that requires
>> psr.ic to be off. I suspect that OSes need to be prepared
>> for the possibility that a fault occurs. Linux is not,
>> so it never calls the routine. Xen is not prepared either.
>>
>> Happy holidays!
>>
>> Dan
>>
>>> -----Original Message-----
>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>>> Sent: Thursday, December 22, 2005 7:29 PM
>>> To: Magenheimer, Dan (HP Labs Fort Collins); Yang, Fred;
>>> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>> Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>>> Console problem on domU on tip?
>>>
>>>> With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
>>>> the problem on a shipping machine and the symptom is that
>>>> the machine immediately crashes when a domU is launched.
>>>
>>> Dan,
>>> That means dom0 can boot with CONFIG_IA64_SPLIT_CACHE on, and
>>> PAL_CACHE_FLUSH has been invoked successfully during dom0
>>> boot. So this is not a PAL_CACHE_FLUSH issue; there
>>> must be some other issue. Could you provide more information
>>> about the crash, since we can't reproduce this issue?
>>>
>>> Thanks.
>>>
>>> -Anthony
>>>
>>>
>>>> -----Original Message-----
>>>> From: Magenheimer, Dan (HP Labs Fort Collins)
>>>> [mailto:dan.magenheimer@xxxxxx]
>>>> Sent: December 22, 2005 21:26
>>>> To: Yang, Fred; Xu, Anthony; Tian, Kevin;
>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>> Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console
>>>> problem on domU on tip?
>>>>
>>>> With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
>>>> the problem on a shipping machine and the symptom is that
>>>> the machine immediately crashes when a domU is launched.
>>>>
>>>> With CONFIG_IA64_SPLIT_CACHE off, a developer may encounter
>>>> a different problem on an unreleased machine.
>>>>
>>>> I know that you are focused primarily on the unreleased machine,
>>>> but in this case, I think we should be cautious for the new user,
>>>> as the developer knows to change the option when running
>>>> on the unreleased machine.
>>>>
>>>> I will spend some more time on this when I have a chance.
>>>> I think it is a real bug (probably PAL accessing some address
>>>> which isn't pinned) that occurs only on some boxes due
>>>> to some factor like memory configuration.
>>>>
>>>> Thanks,
>>>> Dan
>>>>
>>>> P.S. The debug output just before the crash was:
>>>> ia64_fault: General Exception: IA-64 Reserved Register/Field fault
>>>> (data access): reflecting
>>>>
>>>>> -----Original Message-----
>>>>> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
>>>>> Sent: Wednesday, December 21, 2005 10:34 PM
>>>>> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
>>>>> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>> Subject: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>>>>> Console problem on domU on tip?
>>>>>
>>>>> Dan,
>>>>>
>>>>> Can we suggest always turning on #CONFIG_IA64_SPLIT_CACHE as
>>>>> the default build configuration? People may not be aware of
>>>>> this build flag and may miss it on each new build.
>>>>>
>>>>> All the newer-generation ia64 processors will come with
>>>>> split I/D caches, as discussed in the previous mail thread,
>>>>> and the Itanium architecture documents the possibility of
>>>>> split caches for future implementations. With the option
>>>>> off by default, it is a potential bug for all Tiger4 systems
>>>>> used for daily development and for future platforms to come.
>>>>>
>>>>> Your mail also indicates that only the HP rx2620 system
>>>>> has the issue, and not the other HP boxes. Can you track
>>>>> down this issue, rather than put in a kludge for the rx2620 box?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Fred
>>>>>
>>>>>
>>>>> Magenheimer, Dan (HP Labs Fort Collins) wrote:
>>>>>> Committed (but without removal of ifdefs until we
>>>>>> track down this problem).
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>>>>>>> Sent: Monday, December 19, 2005 7:15 PM
>>>>>>> To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin;
>>>>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>>
>>>>>>> I guess maybe the firmware on your machine doesn't implement
>>>>>>> this PAL call, since there was no split I/D cache at that
>>>>>>> time, so when you make this PAL call it will return
>>>>>>> PAL_STATUS_UNIMPLEMENTED. Could you please turn on
>>>>>>> CONFIG_IA64_SPLIT_CACHE and try this new patch to see
>>>>>>> whether your machine can boot domain0?
>>>>>>> If this patch works, could you please remove all the
>>>>>>> CONFIG_IA64_SPLIT_CACHE macros?
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Anthony
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins)
>>>>>>>> [mailto:dan.magenheimer@xxxxxx]
>>>>>>>> Sent: December 19, 2005 23:48
>>>>>>>> To: Xu, Anthony; Tian, Kevin;
>>>>>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>>>
>>>>>>>> I have been distracted tracking another bug...
>>>>>>>>
>>>>>>>> Here's where I got:
>>>>>>>>
>>>>>>>> The machine is a new (April 2005) HP rx2620 so it is
>>>>>>>> not old firmware. I can't reproduce it on a machine
>>>>>>>> with an ITP (which does have older firmware).
>>>>>>>>
>>>>>>>> This PAL call is never used in Linux, though there is a
>>>>>>>> routine coded for it. It is the only
>>>>>>>> PAL call coded in Linux that occurs with psr.ic off.
>>>>>>>>
>>>>>>>> The crash I am seeing occurs either during the PAL call or
>>>>>>>> immediately upon return.
>>>>>>>>
>>>>>>>> Is it OK to
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>>>>>>>>> Sent: Monday, December 19, 2005 2:02 AM
>>>>>>>>> To: Tian, Kevin; Magenheimer, Dan (HP Labs Fort Collins);
>>>>>>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>>>>
>>>>>>>>> Dan,
>>>>>>>>> Have you had time to verify the discussion below?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> -Anthony
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Tian, Kevin
>>>>>>>>>> Sent: December 16, 2005 10:16
>>>>>>>>>> To: Xu, Anthony; 'Magenheimer, Dan (HP Labs Fort Collins)';
>>>>>>>>>> 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
>>>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>>>>>
>>>>>>>>>>> From: Xu, Anthony
>>>>>>>>>>> Sent: December 16, 2005 9:54
>>>>>>>>>>>
>>>>>>>>>>>> Also, why panic if it fails?
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Panic is not required here, and we could just print out a
>>>>>>>>>> warning message. The panic was kept there to help our
>>>>>>>>>> debugging in the early stage.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Does the problem happen only on VTI? Or both VTI and
>>>>>>>>>>>> non-VTI on split-cache machines?
>>>>>>>>>>>
>>>>>>>>>>> Sometimes, it makes domain0 crash at the very beginning of
>>>>>>>>>>> the domain0 boot process, especially on MP machines.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> -Anthony
>>>>>>>>>>
>>>>>>>>>> One addition: that problem definitely exists on new
>>>>>>>>>> split-cache processors, for dom0/domU. For a VTI domain, we
>>>>>>>>>> have logic within the device model to ensure consistency.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kevin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins)
>>>>>>>>>>>> [mailto:dan.magenheimer@xxxxxx]
>>>>>>>>>>>> Sent: December 16, 2005 1:39
>>>>>>>>>>>> To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>> Cc: Xu, Anthony
>>>>>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on
>>>>>>>>>>>> tip?
>>>>>>>>>>>>
>>>>>>>>>>>>>> Is this code fragment necessary for VTI to boot domU
>>>>>>>>>>>>>> or is it OK to remove?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The comment is inaccurate and it should be domU.
>>>>>>>>>>>>> That I/D cache sync step is mandatory to boot domU on new
>>>>>>>>>>>>> IA64 processors which have a split L2 I/D cache. Without
>>>>>>>>>>>>> such an I/D cache sync, the control panel loads domU's
>>>>>>>>>>>>> kernel image, which only affects the D-side cache. If
>>>>>>>>>>>>> there are stale entries in the I-side cache within the
>>>>>>>>>>>>> same range of the dom0 image, people will see the machine
>>>>>>>>>>>>> going weird.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't understand... how can there be stale entries in the
>>>>>>>>>>>> I-cache? The instructions have just been written to memory
>>>>>>>>>>>> (through D-cache) and no instructions in this domain have
>>>>>>>>>>>> yet been executed. I do see that the D-cache needs to be
>>>>>>>>>>>> flushed so that memory is coherent, but are there better
>>>>>>>>>>>> ways to do that without a PAL call?
>>>>>>>>>>>>
>>>>>>>>>>>>> Normally an I/D cache sync shouldn't cause any problem.
>>>>>>>>>>>>> Possibly there's some problem with the PAL calling code,
>>>>>>>>>>>>> like an incorrect ITLB mapping for PAL or a similar issue...
>>>>>>>>>>>>
>>>>>>>>>>>> Although the ia64_pal_cache_flush routine is defined in
>>>>>>>>>>>> Linux's pal.h, it doesn't appear to be used anywhere in
>>>>>>>>>>>> Linux so there is no use model to copy. I suspect there
>>>>>>>>>>>> is some use model for the call that we don't understand,
>>>>>>>>>>>> for example maybe it should only be called with physical
>>>>>>>>>>>> &progress? It definitely fails every time on one of my
>>>>>>>>>>>> (newer) machines and disabling the PAL call makes the
>>>>>>>>>>>> problem go away.
>>>>>>>>>>>>
>>>>>>>>>>>>> Though it's intermittent, please
>>>>>>>>>>>>> keep this code
>>>>>>>>>>>>> there for correctness.
>>>>>>>>>>>>
>>>>>>>>>>>> Since the call is definitely failing under some
>>>>>>>>>>>> circumstances that we don't understand, I'm inclined to at
>>>>>>>>>>>> least put the code in an #ifdef CONFIG_SPLIT_CACHE
>>>>>>>>>>>>
>>>>>>>>>>>> Does the problem happen only on VTI? Or both VTI and
>>>>>>>>>>>> non-VTI on split-cache machines?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>> P.S. I tried Anthony's patch (which moves the PAL call
>>>>>>>>>>>> after new_thread()) but it still crashes.

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel