
RE: [Xen-ia64-devel] Console problem on domU on tip?


  • To: "Magenheimer, Dan \(HP Labs Fort Collins\)" <dan.magenheimer@xxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
  • Date: Fri, 16 Dec 2005 09:53:40 +0800
  • Delivery-date: Fri, 16 Dec 2005 01:55:31 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcYBELBGMu7ZHSeYRUaEav49mCDvgwAAVFMwAADLrDAAIbQEoAARHDxg
  • Thread-topic: [Xen-ia64-devel] Console problem on domU on tip?

>I don't understand... how can there be stale entries in the I-cache?
>The instructions have just been written to memory (through D-cache)
>and no instructions in this domain have yet been executed.
>I do see that the D-cache needs to be flushed so that memory is
>coherent but are there better ways to do that without a pal call?

We agree that the D-cache needs to be flushed. As for the I-cache: if we run 
multiple domains, destroying and creating domains frees and reallocates memory, 
so the memory allocated to a new domain may have been used before. Because the 
I-cache is indexed by physical address, it may still hold stale entries for 
that memory. 
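To make the reuse scenario concrete, here is a toy model in C. It is only a sketch under simplifying assumptions: one small block of "physical memory", an instruction-side cache tagged by physical address, and stores that update memory without touching the I-cache. None of the names come from Xen; they are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: all names are hypothetical, not Xen code. */
#define MEM_WORDS 8

struct toy_machine {
    uint32_t mem[MEM_WORDS];          /* physical memory */
    uint32_t icache[MEM_WORDS];       /* cached instruction words */
    int      icache_valid[MEM_WORDS]; /* one valid bit per word */
};

/* Data-side store: updates memory but leaves the I-cache alone. */
static void toy_store(struct toy_machine *m, unsigned pa, uint32_t val)
{
    assert(pa < MEM_WORDS);
    m->mem[pa] = val;
}

/* Instruction fetch: served from the I-cache when the entry is valid,
 * otherwise filled from memory first. */
static uint32_t toy_fetch(struct toy_machine *m, unsigned pa)
{
    assert(pa < MEM_WORDS);
    if (!m->icache_valid[pa]) {
        m->icache[pa] = m->mem[pa];
        m->icache_valid[pa] = 1;
    }
    return m->icache[pa];
}

/* Invalidate every I-cache entry. */
static void toy_flush_icache(struct toy_machine *m)
{
    memset(m->icache_valid, 0, sizeof(m->icache_valid));
}

/* Returns 1 if the new domain fetches a stale instruction word from
 * a physical page that a destroyed domain executed from earlier. */
int toy_stale_after_reuse(int flush_before_run)
{
    struct toy_machine m;

    memset(&m, 0, sizeof(m));
    toy_store(&m, 3, 0xAAAA);   /* old domain's code */
    (void)toy_fetch(&m, 3);     /* old domain executes it */
    toy_store(&m, 3, 0xBBBB);   /* page reused, new image loaded */
    if (flush_before_run)
        toy_flush_icache(&m);
    return toy_fetch(&m, 3) != 0xBBBB;
}
```

In this model the new domain never executed anything itself, yet its first fetch can still hit the entry left behind by the previous owner of the page unless the I-cache is invalidated first.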

>Although the ia64_pal_cache_flush routine is defined in linux's pal.h,
>it doesn't appear to be used anywhere in Linux so there is no use
>model to copy.  I suspect there is some use model for the call that
>we don't understand, for example maybe it should only be called with
>physical &progress?  It definitely fails every time on one of
>my (newer) machines and disabling the pal call makes the problem
>go away.

        /* Sync d/i cache conservatively */
        if (!running_on_sim) {
            ret = ia64_pal_cache_flush(4, 0, &progress, NULL);
            if (ret != PAL_STATUS_SUCCESS)
                panic("PAL CACHE FLUSH failed for dom0.\n");
            printk("Sync i/d cache for dom0 image SUCC\n");
        }

Maybe the firmware on your machine is old and doesn't support this PAL 
call. Could you change the patch as below and try it to see whether it 
works?

        /* Sync d/i cache conservatively */
        if (!running_on_sim) {
            ret = ia64_pal_cache_flush(4, 0, &progress, NULL);
            if ((ret != PAL_STATUS_SUCCESS) && (ret != PAL_STATUS_UNIMPLEMENTED))
                panic("PAL CACHE FLUSH failed for dom0.\n");
            printk("Sync i/d cache for dom0 image SUCC\n");
        }
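One side effect of the change above is that the success message is also printed when the call is unimplemented. A possible way to keep the three cases apart is sketched below. This is only an illustration, not a tested patch: the two PAL_STATUS_* values are local stand-ins (0 and -1, which I believe match Linux's pal.h), and the enum and both helper functions are hypothetical names.

```c
#include <assert.h>

/* Local stand-ins so the sketch builds outside the Xen tree.
 * Assumed to match the values in Linux's pal.h. */
#define PAL_STATUS_SUCCESS        0
#define PAL_STATUS_UNIMPLEMENTED (-1)

/* Hypothetical classification of the flush result. */
enum flush_outcome {
    FLUSH_DONE,     /* firmware flushed the caches */
    FLUSH_SKIPPED,  /* firmware does not implement the call */
    FLUSH_FAILED    /* any other status: treat as fatal */
};

/* Map a raw PAL status to one of the three outcomes. */
enum flush_outcome classify_flush_status(long ret)
{
    if (ret == PAL_STATUS_SUCCESS)
        return FLUSH_DONE;
    if (ret == PAL_STATUS_UNIMPLEMENTED)
        return FLUSH_SKIPPED;
    return FLUSH_FAILED;
}

/* Message the caller could pass to printk() or panic(). */
const char *flush_outcome_message(enum flush_outcome outcome)
{
    switch (outcome) {
    case FLUSH_DONE:
        return "Sync i/d cache SUCC\n";
    case FLUSH_SKIPPED:
        return "PAL CACHE FLUSH unimplemented, skipping i/d cache sync\n";
    case FLUSH_FAILED:
        return "PAL CACHE FLUSH failed\n";
    }
    assert(!"unknown flush_outcome");
    return "";
}
```

The caller would panic only on FLUSH_FAILED and print the other two messages, so the log shows whether the flush actually ran on a given machine.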

>Does the problem happen only on VTI?  Or both VTI and non-VTI on
>split-cache machines?

It sometimes makes domain0 crash at the very beginning of its boot 
process, especially on MP machines. 


Thanks
-Anthony


>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: December 16, 2005 1:39
>To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Cc: Xu, Anthony
>Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>
>> >Is this code fragment necessary for VTI to boot domU
>> >or is it OK to remove?
>>
>>      The comment is inaccurate and it should be domU. That I/D cache
>> sync step is mandatory to boot domU on new IA64 processor which has
>> split L2 I/D cache. If without such I/D cache sync, control
>> panel loads
>> domU's kernel image which only affects D side cache. If there're some
>> stale entry on I-side cache within same range of dom0 image,
>> people will
>> see machine going weird.
>
>I don't understand... how can there be stale entries in the I-cache?
>The instructions have just been written to memory (through D-cache)
>and no instructions in this domain have yet been executed.
>I do see that the D-cache needs to be flushed so that memory is
>coherent but are there better ways to do that without a pal call?
>
>>      Normally I/D cache sync shouldn't force any problem. Possibly
>> there's some problem with the pal calling code, like incorrect ITLB
>> mapping for pal or similar issue...
>
>Although the ia64_pal_cache_flush routine is defined in linux's pal.h,
>it doesn't appear to be used anywhere in Linux so there is no use
>model to copy.  I suspect there is some use model for the call that
>we don't understand, for example maybe it should only be called with
>physical &progress?  It definitely fails every time on one of
>my (newer) machines and disabling the pal call makes the problem
>go away.
>
>Also, why panic if it fails?
>
>> Though it's intermittent, please
>> keep this code
>> there for correctness.
>
>Since the call is definitely failing under some circumstances
>that we don't understand, I'm inclined to at least put the code
>in an #ifdef CONFIG_SPLIT_CACHE
>
>Does the problem happen only on VTI?  Or both VTI and non-VTI on
>split-cache machines?
>
>Thanks,
>Dan
>
>P.S. I tried Anthony's patch (which moves the PAL call after
>new_thread()) but it still crashes.

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

