
Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219 sge_count (-12) is out of range


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Jürgen Groß <jgross@xxxxxxxx>
  • From: Harshvardhan Jha <harshvardhan.j.jha@xxxxxxxxxx>
  • Date: Thu, 30 Jan 2025 10:57:17 +0530
  • Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>, Konrad Wilk <konrad.wilk@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, Harshit Mogalapalli <harshit.m.mogalapalli@xxxxxxxxxx>, stable@xxxxxxxxxxxxxxx
  • Delivery-date: Thu, 30 Jan 2025 05:27:45 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 30/01/25 3:31 AM, Stefano Stabellini wrote:
> On Wed, 29 Jan 2025, Jürgen Groß wrote:
>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> +stable
>>>>>>>>>>>
>>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>>> have
>>>>>>>>>>> attached it as a file.
>>>>>>>>>> Confused, what are you wanting us to do here in the stable
>>>>>>>>>> tree?
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> greg k-h
>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>>> culprit
>>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>>> present in
>>>>>>>>> 5.4.y stable.
>>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>>
>>>>>>>> Remember, top-posting is evil...
>>>>>>> My apologies,
>>>>>>>
>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>> prompt in an infinite loop:
>>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>>> 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>>
>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>
>>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb:
>>>>>>> add
>>>>>>> alignment check for dma buffers
>>>>>>>
>>>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>
>>>>>> Ok, can you submit this revert with the information about why it
>>>>>> should
>>>>>> not be included in the 5.4.y tree and cc: everyone involved and then
>>>>>> we
>>>>>> will be glad to queue it up.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>> This might be reproducible on other stable trees and mainline as well so
>>>>> we will get it fixed there and I will submit the necessary fix to stable
>>>>> when everything is sorted out on mainline.
>>>> Right. Just reverting my patch will trade one error with another one (the
>>>> one which triggered me to write the patch).
>>>>
>>>> There are two possible ways to fix the issue:
>>>>
>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>>> supported
>>>>    size, the megaraid_sas driver seems to effectively request 4MB)
>>> This seems relatively simpler to implement but I'm not sure whether it's
>>> the most optimal approach
>> Just making the static array larger used to hold the frame numbers for the
>> buffer seems to be a waste of memory for most configurations.
>>
>> I'm thinking of an allocated array using the max needed size (replace a
>> former buffer with a larger one if needed).
> You are referring to discontig_frames and MAX_CONTIG_ORDER in
> arch/x86/xen/mmu_pv.c, right? I am not super familiar with that code but
> it looks like a good way to go.

The rejected patch below changes MAX_CONTIG_ORDER so that the buffer
size is doubled unconditionally, which is undesirable in most situations:

https://lore.kernel.org/lkml/28947d4f-ab32-4a57-8dbb-e37fa4183a69@xxxxxxxx/t/

Instead, the buffer size should be doubled only when a request
actually needs it.
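
To illustrate what I mean by growing the buffer only when needed, here is
a rough userspace sketch. It is not the kernel code and not a proposed
patch: frames, frames_order, frames_reserve() and order_for_size() are
hypothetical names, calloc()/free() stand in for whatever allocator the
real code would use, and 4 KiB pages are assumed. With that page size an
order-9 array covers the 2MB limit mentioned above and a 4MB request
needs order 10.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the static frame-number array. */
static unsigned long *frames;
/* Largest order the current array can hold (valid once frames != NULL). */
static unsigned int frames_order;

/* Smallest order such that (1 << order) pages cover `size` bytes. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	while (((size_t)1 << (order + PAGE_SHIFT)) < size)
		order++;
	return order;
}

/*
 * Make sure the array can hold 1 << order frame numbers.
 * The array is replaced by a larger one only when the request exceeds
 * the current capacity. Old contents are not copied because the array
 * is treated as scratch space here (an assumption of this sketch).
 * Returns 0 on success, -1 if the allocation fails.
 */
static int frames_reserve(unsigned int order)
{
	unsigned long *bigger;

	if (frames && order <= frames_order)
		return 0; /* already large enough, nothing to do */

	bigger = calloc((size_t)1 << order, sizeof(*bigger));
	if (!bigger)
		return -1; /* keep the old array usable on failure */

	free(frames);
	frames = bigger;
	frames_order = order;
	return 0;
}
```

A caller would do something like frames_reserve(order_for_size(size))
before filling in the frame numbers, so configurations that never ask for
more than 2MB would never pay for the larger array.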


Harshvardhan




 

