Re: [Xen-devel] Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.
On 09/27/2010 11:46 AM, Daniel Stodden wrote:
> On Mon, 2010-09-27 at 03:41 -0400, Andrew Jones wrote:
>> On 09/24/2010 08:50 PM, Jeremy Fitzhardinge wrote:
>>> On 09/24/2010 12:14 AM, Andrew Jones wrote:
>>>> On 09/23/2010 08:36 PM, Jeremy Fitzhardinge wrote:
>>>>> On 09/23/2010 09:38 AM, Paolo Bonzini wrote:
>>>>>> On 09/23/2010 06:23 PM, Jeremy Fitzhardinge wrote:
>>>>>>>> Any developments with this? I've got a report of the exact same
>>>>>>>> warnings on RHEL6 guest. See
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=632802
>>>>>>>>
>>>>>>>> RHEL6 doesn't have the 'Move blkif_interrupt into a tasklet' patch, so
>>>>>>>> that can be ruled out. Unfortunately I don't have this reproducing on a
>>>>>>>> test machine, so it's difficult to debug. The report I have showed that
>>>>>>>> in at least one case it occurred on boot up, right after initting the
>>>>>>>> block device. I'm trying to get confirmation if that's always the case.
>>>>>>>>
>>>>>>>> Thanks in advance for any pointers you might have.
>>>>>>> Yes, I see it even after reverting that change as well. However I only
>>>>>>> see it on my domain with an XFS filesystem, but I haven't dug any deeper
>>>>>>> to see if that's relevant.
>>>>>>>
>>>>>>> Do you know when this appeared? Is it recent? What changes are in the
>>>>>>> rhel6 kernel in question?
>>>>>> It's got pretty much everything in stable-2.6.32.x, up to the 16 patch
>>>>>> blkfront series you posted last July. There are some RHEL-specific
>>>>>> workarounds for PV-on-HVM, but for PV domains everything matches
>>>>>> upstream.
>>>>> Have you tried bisecting to see when this particular problem appeared?
>>>>> It looks to me like something is accidentally re-enabling interrupts -
>>>>> perhaps a stack overrun is corrupting the "flags" argument between a
>>>>> spin_lock_irqsave()/restore pair.
>>>>>
>>>> Unfortunately I don't have a test machine where I can do a bisection
>>>> (yet). I'm looking for one. I only have this one report so far, and it's
>>>> on a production machine.
>>>
>>> The report says that it's repeatedly killing the machine though? In my
>>> testing, it seems to hit the warning once at boot, but is OK after that
>>> (not that I'm doing anything very stressful on the domain).
>>>
>>
>> It looks like the crash is from failing to read swap due to a bad page
>> map. It's possibly another issue, but I wanted to try and clean this
>> issue up first to see what happens.
>
> Uh oh. Sure this was a frontend crash? If you see it again, a stack
> trace to look at would be great.
>

Hi Daniel,

You can take a look at this bug

https://bugzilla.redhat.com/show_bug.cgi?id=632802

there are stacks for the swap issue in the comments and also this attached
dmesg

https://bugzilla.redhat.com/attachment.cgi?id=447789

Thanks,
Drew

> Thanks,
> Daniel
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel