
RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?


  • To: "Yang, Fred" <fred.yang@xxxxxxxxx>, "Xu, Anthony" <anthony.xu@xxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
  • Date: Thu, 22 Dec 2005 09:40:16 -0800
  • Delivery-date: Thu, 22 Dec 2005 17:43:25 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcYBELBGMu7ZHSeYRUaEav49mCDvgwAAVFMwAADLrDAAIbQEoAARHDxgAAFwgsAApRrdUAAOEX2AABW4J/AAUJyDYAAareHQABB8dxAABifEIAAC14UQ
  • Thread-topic: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?

Hi Fred --

I understand your pain.  I too wasted time building and
testing bits with the code turned on.  However:

1) It is not uncommon in the open source community for the
   needs of publicly-available machines to take precedence
   over the needs of unreleased future machines.
2) It is not uncommon in the open source community for an
   unreleased future machine to require different config
   files than the defaults.
3) This specific code that is failing is not even needed
   on publicly-available machines.  It is not uncommon in
   the open source community to refuse patches that are
   only needed for unreleased future machines.

That said, I understand that your team as well as other
Xen developers are primarily using these future machines
for development and testing, so let me suggest a compromise:

Is there a way to dynamically test early in boot to determine
if this machine has split I-D caches?  If so, you could provide
a patch that sets a global or cpu variable appropriately and
changes the compile-time ifdef to a run-time if test.
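
A minimal sketch of that run-time test follows. It is an illustration
under stated assumptions, not the actual Xen/ia64 code: every name in it
(cache_level_info, detect_split_cache, split_cache_present,
sync_icache_with_dcache, after_domain_image_load) is hypothetical, and
the table of cache levels stands in for whatever the firmware would
really report early in boot.

```c
#include <stdbool.h>

/* Hypothetical summary of one cache level, as a boot-time probe
 * might fill it in.  Not a real PAL or Xen structure. */
struct cache_level_info {
    int  level;   /* 1 = L1, 2 = L2, ... */
    bool split;   /* true if this level has separate I and D caches
                   * that software must keep in sync */
};

/* Global flag set once early in boot.  It takes the place of the
 * compile-time CONFIG_IA64_SPLIT_CACHE ifdef. */
bool split_cache_present = false;

/* Counter used only so the sketch has an observable effect. */
unsigned long sync_calls = 0;

/* Stand-in for the real I/D cache sync (the PAL cache flush in the
 * thread).  Here it only counts how often it was requested. */
void sync_icache_with_dcache(void)
{
    sync_calls++;
}

/* Early-boot probe: remember whether any reported level is split. */
void detect_split_cache(const struct cache_level_info *levels, int nlevels)
{
    split_cache_present = false;
    for (int i = 0; i < nlevels; i++) {
        if (levels[i].split) {
            split_cache_present = true;
            return;
        }
    }
}

/* Call site that used to be wrapped in #ifdef: now a run-time test. */
void after_domain_image_load(void)
{
    if (split_cache_present)
        sync_icache_with_dcache();
}
```

Early boot code would call detect_split_cache() once, and the domain
builder would call after_domain_image_load() unconditionally, so
machines without split caches would skip the cache sync (and the PAL
call behind it) entirely.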

Dan

P.S. Thinking about this makes me realize... the PAL cache flush
code may be inadequate anyway when we get to SMP-guest support
because the stale mapping may be on another processor.

> -----Original Message-----
> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx] 
> Sent: Thursday, December 22, 2005 9:15 AM
> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony; 
> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] 
> Console problem on domU on tip?
> 
> Dan,
> 
> We spent a long time tracking down Cset#8383 yesterday, and the
> issue we have now identified is that the I/D cache patch was not
> turned on in the default build!  I hope other community members
> won't hit this problem again.
> 
> From the discussion, this is definitely an issue with the PAL
> call on that specific HP box.  The correct approach is to track
> it down and find the underlying implementation or platform
> issue.
> 
> Hope you can track this down ASAP to remove this hurdle.
> 
> -Fred
> 
> Magenheimer, Dan (HP Labs Fort Collins) wrote:
> > With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
> > the problem on a shipping machine and the symptom is that
> > the machine immediately crashes when a domU is launched.
> > 
> > With CONFIG_IA64_SPLIT_CACHE off, a developer may encounter
> > a different problem on an unreleased machine.
> > 
> > I know that you are focused primarily on the unreleased machine,
> > but in this case, I think we should be cautious for the new user
> > as the developer knows to change the option when running
> > on the unreleased machine.
> > 
> > I will spend some more time on this when I have a chance.
> > I think it is a real bug (probably PAL accessing some address
> > which isn't pinned) that occurs only on some boxes due
> > to some factor like memory configuration.
> > 
> > Thanks,
> > Dan
> > 
> > P.S. The debug output just before the crash was:
> > ia64_fault: General Exception: IA-64 Reserved Register/Field fault
> > (data access): reflecting 
> > 
> >> -----Original Message-----
> >> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
> >> Sent: Wednesday, December 21, 2005 10:34 PM
> >> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
> >> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> Subject: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
> >> Console problem on domU on tip?
> >> 
> >> Dan,
> >> 
> >> Can we suggest always turning on #CONFIG_IA64_SPLIT_CACHE in
> >> the default build configuration?  People may not be aware of
> >> this build flag and miss it on each new build.
> >> 
> >> All the newer-generation ia64 processors will come with a
> >> split I/D cache, as discussed in the previous mail thread,
> >> and the Itanium architecture documents a possible split
> >> cache for future implementations.  With the option off by
> >> default, it is a potential bug for all the Tiger4 systems
> >> used for daily development and for future platforms to come.
> >> 
> >> Your mail also indicates that only the HP rx2620 system has
> >> the issue and not the other HP boxes.  Can you track down
> >> this issue rather than put in a kludge for the rx2620 box?
> >> 
> >> Thanks,
> >> 
> >> -Fred
> >> 
> >> 
> >> Magenheimer, Dan (HP Labs Fort Collins) wrote:
> >>> Committed (but without removal of ifdefs until we
> >>> track down this problem).
> >>> 
> >>>> -----Original Message-----
> >>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >>>> Sent: Monday, December 19, 2005 7:15 PM
> >>>> To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin;
> >>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>> 
> >>>> I guess maybe the firmware on your machine doesn't implement
> >>>> this PAL call because there was no split I/D cache at that
> >>>> time, so when you make this PAL call it returns
> >>>> PAL_STATUS_UNIMPLEMENTED.  Could you please turn on
> >>>> CONFIG_IA64_SPLIT_CACHE and try this new patch to see
> >>>> whether your machine can boot domain0?
> >>>> If this patch works, could you please remove all the
> >>>> CONFIG_IA64_SPLIT_CACHE macros?
> >>>> 
> >>>> Thanks
> >>>> -Anthony
> >>>> 
> >>>>> -----Original Message-----
> >>>>> From: Magenheimer, Dan (HP Labs Fort Collins)
> >>>> [mailto:dan.magenheimer@xxxxxx]
> >>>>> Sent: 2005-12-19 23:48
> >>>>> To: Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>> 
> >>>>> I have been distracted tracking another bug...
> >>>>> 
> >>>>> Here's where I got:
> >>>>> 
> >>>>> The machine is a new (April 2005) HP rx2620 so it is
> >>>>> not old firmware.   I can't reproduce it on a machine
> >>>>> with an ITP (which does have older firmware).
> >>>>> 
> >>>>> This PAL call is never used in Linux, though there is a
> >>>>> routine coded for it.  It is the only
> >>>>> PAL call coded in Linux that occurs with psr.ic off.
> >>>>> 
> >>>>> The crash I am seeing occurs either during the PAL call or
> >>>>> immediately upon return. 
> >>>>> 
> >>>>> Is it OK to
> >>>>> 
> >>>>> 
> >>>>>> -----Original Message-----
> >>>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >>>>>> Sent: Monday, December 19, 2005 2:02 AM
> >>>>>> To: Tian, Kevin; Magenheimer, Dan (HP Labs Fort Collins);
> >>>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>>> 
> >>>>>> Dan,
> >>>>>> Have you had time to verify the discussion below?
> >>>>>> 
> >>>>>> Thanks
> >>>>>> -Anthony
> >>>>>> 
> >>>>>>> -----Original Message-----
> >>>>>>> From: Tian, Kevin
> >>>>>>> Sent: 2005-12-16 10:16
> >>>>>>> To: Xu, Anthony; 'Magenheimer, Dan (HP Labs Fort Collins)';
> >>>>>>> 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
> >>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>>>> 
> >>>>>>>> From: Xu, Anthony
> >>>>>>>> Sent: 2005-12-16 9:54
> >>>>>>>> 
> >>>>>>>>> Also, why panic if it fails?
> >>>>>>>>> 
> >>>>>>> 
> >>>>>>> A panic is not required here, and we could just print out a
> >>>>>>> warning message. The panic was previously kept there to help
> >>>>>>> our debugging at an early stage.
> >>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> Does the problem happen only on VTI?  Or both VTI and non-VTI
> >>>>>>>>> on split-cache machines?
> >>>>>>>> 
> >>>>>>>> Sometimes, it makes domain0 crash at the very beginning of the
> >>>>>>>> domain0 boot process, especially on MP machines.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Thanks
> >>>>>>>> -Anthony
> >>>>>>> 
> >>>>>>> One addition: that problem definitely exists on new
> >>>>>>> split-cache processors, for dom0/domU. For VTI domains, we have
> >>>>>>> logic within the device model to ensure consistency.
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>> Kevin
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins)
> >>>>>>>> [mailto:dan.magenheimer@xxxxxx]
> >>>>>>>>> Sent: 2005-12-16 1:39
> >>>>>>>>> To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >>>>>>>>> Cc: Xu, Anthony
> >>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
> >>>>>>>>> 
> >>>>>>>>>>> Is this code fragment necessary for VTI to boot domU
> >>>>>>>>>>> or is it OK to remove?
> >>>>>>>>>> 
> >>>>>>>>>>    The comment is inaccurate and it should be domU. That I/D
> >>>>>>>>>> cache sync step is mandatory to boot domU on new IA64
> >>>>>>>>>> processors which have a split L2 I/D cache. Without such an
> >>>>>>>>>> I/D cache sync, the control panel loads domU's kernel image,
> >>>>>>>>>> which only affects the D-side cache. If there are stale
> >>>>>>>>>> entries in the I-side cache within the same range of the
> >>>>>>>>>> dom0 image, people will see the machine going weird.
> >>>>>>>>> 
> >>>>>>>>> I don't understand... how can there be stale entries in the
> >>>>>>>>> I-cache? The instructions have just been written to memory
> >>>>>>>>> (through D-cache) and no instructions in this domain have yet
> >>>>>>>>> been executed. I do see that the D-cache needs to be flushed
> >>>>>>>>> so that memory is coherent but are there better ways to do
> >>>>>>>>> that without a PAL call?
> >>>>>>>>> 
> >>>>>>>>>>    Normally an I/D cache sync shouldn't cause any problem.
> >>>>>>>>>> Possibly there's some problem with the PAL calling code,
> >>>>>>>>>> like an incorrect ITLB mapping for PAL or a similar issue...
> >>>>>>>>> 
> >>>>>>>>> Although the ia64_pal_cache_flush routine is defined in
> >>>>>>>>> linux's pal.h, it doesn't appear to be used anywhere in Linux
> >>>>>>>>> so there is no use model to copy.  I suspect there is some
> >>>>>>>>> use model for the call that we don't understand, for example
> >>>>>>>>> maybe it should only be called with physical &progress?  It
> >>>>>>>>> definitely fails every time on one of my (newer) machines and
> >>>>>>>>> disabling the pal call makes the problem go away.
> >>>>>>>>> 
> >>>>>>>>>> Though it's intermittent, please keep this code
> >>>>>>>>>> there for correctness.
> >>>>>>>>> 
> >>>>>>>>> Since the call is definitely failing under some circumstances
> >>>>>>>>> that we don't understand, I'm inclined to at least put the
> >>>>>>>>> code in an #ifdef CONFIG_SPLIT_CACHE
> >>>>>>>>> 
> >>>>>>>>> Does the problem happen only on VTI?  Or both VTI and non-VTI
> >>>>>>>>> on split-cache machines?
> >>>>>>>>> 
> >>>>>>>>> Thanks,
> >>>>>>>>> Dan
> >>>>>>>>> 
> >>>>>>>>> P.S. I tried Anthony's patch (which moves the PAL call after
> >>>>>>>>> new_thread()) but it still crashes.
> 
> 
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
