
Re: [Xen-devel] Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.



On 09/24/2010 08:50 PM, Jeremy Fitzhardinge wrote:
>  On 09/24/2010 12:14 AM, Andrew Jones wrote:
>> On 09/23/2010 08:36 PM, Jeremy Fitzhardinge wrote:
>>>  On 09/23/2010 09:38 AM, Paolo Bonzini wrote:
>>>> On 09/23/2010 06:23 PM, Jeremy Fitzhardinge wrote:
>>>>>> Any developments with this? I've got a report of the exact same
>>>>>> warnings on a RHEL6 guest. See
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=632802
>>>>>>
>>>>>> RHEL6 doesn't have the 'Move blkif_interrupt into a tasklet' patch, so
>>>>>> that can be ruled out. Unfortunately I don't have this reproducing on a
>>>>>> test machine, so it's difficult to debug.  The report I have showed
>>>>>> that in at least one case it occurred on boot up, right after
>>>>>> initializing the block device. I'm trying to get confirmation if
>>>>>> that's always the case.
>>>>>>
>>>>>> Thanks in advance for any pointers you might have.
>>>>> Yes, I see it even after reverting that change as well.  However I only
>>>>> see it on my domain with an XFS filesystem, but I haven't dug any deeper
>>>>> to see if that's relevant.
>>>>>
>>>>> Do you know when this appeared?  Is it recent?  What changes are in the
>>>>> rhel6 kernel in question?
>>>> It's got pretty much everything in stable-2.6.32.x, up to the 16 patch
>>>> blkfront series you posted last July.  There are some RHEL-specific
>>>> workarounds for PV-on-HVM, but for PV domains everything matches
>>>> upstream.
>>> Have you tried bisecting to see when this particular problem appeared? 
>>> It looks to me like something is accidentally re-enabling interrupts -
>>> perhaps a stack overrun is corrupting the "flags" argument between a
>>> spin_lock_irqsave()/restore pair. 
>>>
>> Unfortunately I don't have a test machine where I can do a bisection
>> (yet). I'm looking for one. I only have this one report so far, and it's
>> on a production machine.
> 
> The report says that it's repeatedly killing the machine though?  In my
> testing, it seems to hit the warning once at boot, but is OK after that
> (not that I'm doing anything very stressful on the domain).
> 

It looks like the crash comes from failing to read swap because of a bad
page map. That may well be a separate issue, but I wanted to try to clean
this one up first and see what happens.

>>> Is it only on 32-bit kernels?
>>>
>> This one report I have is a 32-bit guest on a 64-bit host.
> 
> Is it using XFS by any chance?  So far I've traced the re-enable to
> xfs_buf_bio_end_io().  However, my suspicion is that it might be related
> to the barrier changes we did.
> 

I'll check whether the guest is using XFS and let you know.
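
For reference, the failure mode suggested earlier in the thread (the
"flags" value being corrupted between spin_lock_irqsave() and the matching
restore) can be modeled in user space. This is only a sketch under stated
assumptions: ModelCPU, nested_section and run are hypothetical names, not
kernel APIs, and a single boolean stands in for the interrupt-enable bit.
It is not a trace of what blkfront or XFS actually does.

```python
# User-space model only: none of these names are kernel APIs.
class ModelCPU:
    """Tracks a single 'interrupts enabled' bit."""

    def __init__(self):
        self.irqs_enabled = True

    def irq_save(self):
        """Model of an irqsave: return the old state, then disable."""
        flags = self.irqs_enabled
        self.irqs_enabled = False
        return flags

    def irq_restore(self, flags):
        """Model of an irqrestore: blindly reinstate the saved state."""
        self.irqs_enabled = flags


def nested_section(cpu, corrupt_flags=False):
    """Inner save/restore pair, run while the caller already has IRQs off.

    Returns whether interrupts are enabled after the inner restore.
    """
    flags = cpu.irq_save()   # caller disabled IRQs, so flags is False here
    if corrupt_flags:
        flags = True         # stand-in for a stack overrun clobbering 'flags'
    cpu.irq_restore(flags)
    return cpu.irqs_enabled


def run(corrupt_flags):
    """Outer critical section wrapping the nested one."""
    cpu = ModelCPU()
    outer = cpu.irq_save()
    leaked = nested_section(cpu, corrupt_flags)
    cpu.irq_restore(outer)
    return leaked
```

With intact flags the inner restore leaves interrupts off, as the outer
section expects. With the clobbered value, the inner restore turns them
back on while the outer section still believes they are disabled, which is
the kind of state an "interrupts unexpectedly enabled" warning would catch.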

>     J
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel




 

