
Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch



On 01/07/2025 23:53, Abinash wrote:
> Hi,
>
> Thanks for pointing that out.
>
> I haven’t measured the performance impact yet — my main focus was on
> getting rid of the stack usage warning triggered by LLVM due to
> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
> path, and calling kmalloc() there could definitely slow things down,
> especially under memory pressure.
>
> I’ll run some benchmarks to compare the current approach with the
> dynamic allocation, and also look into alternatives — maybe
> pre-allocating the struct or limiting inlining instead. If you have
> any ideas or suggestions on how best to approach this, I’d be happy to
> hear them.
>
> Do you have any suggestions on how to test the performance?
>
> Best,
> Abinash
>
>

Preallocating may work, but I'd be wary of the synchronization needed if
the preallocated struct is shared.
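
For illustration only (the field and lock names here are made up, and this
assumes one batch per open file handle is acceptable), a per-gntdev_priv
preallocation could look roughly like:

    struct gntdev_priv {
            /* ... existing members ... */
            struct gntdev_copy_batch batch;   /* preallocated, ~1 KiB */
            struct mutex batch_lock;          /* serializes use of @batch */
    };

    static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
    {
            struct gntdev_copy_batch *batch = &priv->batch;
            long ret;

            /* ... copy_from_user() as before ... */

            mutex_lock(&priv->batch_lock);
            batch->nr_ops = 0;
            batch->nr_pages = 0;
            /* ... segment loop and final gntdev_copy() as before ... */
            mutex_unlock(&priv->batch_lock);
            return ret;
    }

The mutex keeps two threads on the same fd from trampling each other's
batch state, but it also serializes copies on that fd, which is exactly
the kind of hidden cost worth benchmarking.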

I'd look at optimizing status[], which should save quite a few bytes.
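
To sketch the status[] idea (I'm going from memory of the struct layout,
so double-check gntdev.c): the batch currently keeps one full user pointer
per op,

    struct gntdev_copy_batch {
            struct gnttab_copy ops[GNTDEV_COPY_BATCH];
            struct page *pages[GNTDEV_COPY_BATCH];
            s16 __user *status[GNTDEV_COPY_BATCH];  /* 16 * 8 bytes on 64-bit */
            /* ... */
    };

but each of those pointers points into the single user-supplied segments
array from the ioctl, so a one-byte segment index per op plus the base
pointer you already have should recover most of those 128 bytes, e.g.
(hypothetical field):

    u8 seg_idx[GNTDEV_COPY_BATCH];  /* index into copy.segments[] */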

Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also
impact performance.
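
On the "limiting inlining" idea from your list above: noinline_for_stack
exists for exactly this, e.g.

    /* Keeps the big frame out of gntdev_ioctl; the frame itself remains. */
    static noinline_for_stack long
    gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)

but note that per your .su numbers below (1048 bytes for
gntdev_ioctl_grant_copy.isra), the un-inlined function would most likely
still trip the 1024-byte limit on its own.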

As for benchmarks, I think you can use iperf and fio with varying packet
sizes/block sizes.
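
For example (device names and addresses here are placeholders; adjust to
your setup):

    # network path (netfront/netback):
    iperf3 -s                               # in the guest
    iperf3 -c <guest-ip> -t 60 -l 16K       # from the peer; vary -l

    # block path (blkfront/blkback), against a throwaway disk:
    fio --name=randrw --filename=/dev/xvdb --direct=1 --rw=randrw \
        --bs=4k --iodepth=32 --ioengine=libaio --runtime=60 --time_based

Run each with and without the patch, ideally also under memory pressure,
and compare throughput and latency.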

> On Mon, 30 Jun 2025 at 16:05, Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> On 30/06/2025 06:54, Abinash Singh wrote:
>>> While building the kernel with LLVM, a warning was reported due to
>>> excessive stack usage in `gntdev_ioctl`:
>>>
>>>        drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl'
>>>
>>> Further analysis revealed that the large stack frame was caused by
>>> `struct gntdev_copy_batch`, which was declared on the stack inside
>>> `gntdev_ioctl_grant_copy()`. Since this function was inlined into
>>> `gntdev_ioctl`, the stack usage was attributed to the latter.
>>>
>>> This patch fixes the issue by dynamically allocating `gntdev_copy_batch`
>>> using `kmalloc()`, which significantly reduces the stack footprint and
>>> eliminates the warning.
>>>
>>> This approach is consistent with similar fixes upstream, such as:
>>>
>>> commit fa26198d30f3 ("iommu/io-pgtable-arm: dynamically allocate selftest device struct")
>>>
>>> Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy")
>>> Signed-off-by: Abinash Singh <abinashsinghlalotra@xxxxxxxxx>
>>> ---
>>> The stack usage was confirmed using the `-fstack-usage` flag and manual analysis, which showed:
>>>
>>>     drivers/xen/gntdev.c:953: gntdev_ioctl_grant_copy.isra   1048 bytes
>>>     drivers/xen/gntdev.c:826: gntdev_copy                     56 bytes
>>>
>>> Since `gntdev_ioctl` was calling the inlined `gntdev_ioctl_grant_copy`, the total frame size exceeded 1024 bytes, triggering the warning.
>>>
>>> This patch addresses the warning and keeps stack usage within acceptable limits.
>>> Is this patch fine, or could kmalloc affect performance?
>>> ---
>>
>> Have you measured the performance impact? gntdev_ioctl_grant_copy is
>> called quite often, especially by the backend. I'm afraid calling the
>> allocator here may cause even more slowdown than there already is,
>> particularly when memory is tight.
>>
>>>    drivers/xen/gntdev.c | 24 +++++++++++++++---------
>>>    1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 61faea1f0663..9811f3d7c87c 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>>> @@ -953,15 +953,19 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
>>>    static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>>    {
>>>        struct ioctl_gntdev_grant_copy copy;
>>> -     struct gntdev_copy_batch batch;
>>> +     struct gntdev_copy_batch *batch;
>>>        unsigned int i;
>>>        int ret = 0;
>>>
>>>        if (copy_from_user(&copy, u, sizeof(copy)))
>>>                return -EFAULT;
>>> -
>>> -     batch.nr_ops = 0;
>>> -     batch.nr_pages = 0;
>>> +
>>> +     batch = kmalloc(sizeof(*batch), GFP_KERNEL);
>>> +     if (!batch)
>>> +             return -ENOMEM;
>>> +
>>> +     batch->nr_ops = 0;
>>> +     batch->nr_pages = 0;
>>>
>>>        for (i = 0; i < copy.count; i++) {
>>>                struct gntdev_grant_copy_segment seg;
>>> @@ -971,18 +975,20 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>>                        goto out;
>>>                }
>>>
>>> -             ret = gntdev_grant_copy_seg(&batch, &seg, &copy.segments[i].status);
>>> +             ret = gntdev_grant_copy_seg(batch, &seg, &copy.segments[i].status);
>>>                if (ret < 0)
>>>                        goto out;
>>>
>>>                cond_resched();
>>>        }
>>> -     if (batch.nr_ops)
>>> -             ret = gntdev_copy(&batch);
>>> -     return ret;
>>> +     if (batch->nr_ops)
>>> +             ret = gntdev_copy(batch);
>>> +     goto free_batch;
>>>
>>>      out:
>>> -     gntdev_put_pages(&batch);
>>> +     gntdev_put_pages(batch);
>>> +  free_batch:
>>> +     kfree(batch);
>>>        return ret;
>>>    }
>>>



Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech





 

