Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
On 01/07/2025 23:53, Abinash wrote:
> Hi,
>
> Thanks for pointing that out.
>
> I haven't measured the performance impact yet — my main focus was on
> getting rid of the stack usage warning triggered by LLVM due to
> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
> path, and calling kmalloc() there could definitely slow things down,
> especially under memory pressure.
>
> I'll run some benchmarks to compare the current approach with the
> dynamic allocation, and also look into alternatives — maybe
> pre-allocating the struct or limiting inlining instead. If you have
> any ideas or suggestions on how best to approach this, I'd be happy to
> hear them.
>
> Do you have any suggestions on how to test the performance?
>
> Best,
> Abinash

Preallocating may work, but I'd be wary of synchronization if the
preallocated struct is shared. I'd look at optimizing status[], which
should save quite a few bytes. Reducing GNTDEV_COPY_BATCH could be a
last resort, but that may also impact performance.

As for benchmarks, I think you can use iperf and fio with varying
packet sizes/block sizes.
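To illustrate the synchronization concern, here's a rough sketch of
per-fd preallocation (untested; batch_lock and the
gntdev_grant_copy_segments() helper standing in for the existing loop
are both made up for the example):

struct gntdev_priv {
        /* ... existing fields ... */
        struct gntdev_copy_batch batch;  /* preallocated, one per open() */
        struct mutex batch_lock;         /* serializes users of batch */
};

static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
{
        struct gntdev_copy_batch *batch = &priv->batch;
        long ret;

        /*
         * Concurrent GNTDEV_IOCTL_GRANT_COPY calls on the same fd now
         * block each other here instead of each using a private batch;
         * that serialization is the cost to measure.
         */
        mutex_lock(&priv->batch_lock);
        batch->nr_ops = 0;
        batch->nr_pages = 0;
        ret = gntdev_grant_copy_segments(priv, batch, u); /* existing loop */
        mutex_unlock(&priv->batch_lock);

        return ret;
}

The mutex_init() would go in gntdev_open(), next to the existing
initialization there.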
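On status[]: if memory serves, the batch struct looks like the first
definition below, where the per-op user pointers cost 8 bytes each on
64-bit. The second definition is a hypothetical reshape (untested,
names invented) that stores the user segment array base once plus a
16-bit segment index per op, saving roughly 90 bytes at
GNTDEV_COPY_BATCH == 16:

/* Roughly the current layout in drivers/xen/gntdev.c: */
struct gntdev_copy_batch {
        struct gnttab_copy ops[GNTDEV_COPY_BATCH];
        struct page *pages[GNTDEV_COPY_BATCH];
        s16 __user *status[GNTDEV_COPY_BATCH];  /* 8 bytes per op on 64-bit */
        unsigned int nr_ops;
        unsigned int nr_pages;
};

/* Hypothetical alternative: every status pointer targets the same user
 * segment array, so remember the base once and an index per op. */
struct gntdev_copy_batch {
        struct gnttab_copy ops[GNTDEV_COPY_BATCH];
        struct page *pages[GNTDEV_COPY_BATCH];
        struct gntdev_grant_copy_segment __user *segs;  /* ioctl argument */
        u16 status_idx[GNTDEV_COPY_BATCH];              /* 2 bytes per op */
        unsigned int nr_ops;
        unsigned int nr_pages;
};

The status write-back would then be
put_user(status, &batch->segs[batch->status_idx[i]].status) instead of
going through batch->status[i].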
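As for limiting inlining, noinline_for_stack exists for exactly this
case. Note that your own -fstack-usage numbers show
gntdev_ioctl_grant_copy.isra at 1048 bytes, which is already over the
1024-byte limit, so by itself this likely just moves the warning; it
would need to be paired with shrinking the batch:

/* Keeps the big frame out of gntdev_ioctl; the ~1 KiB frame still
 * exists whenever the copy ioctl itself runs. */
static noinline_for_stack long gntdev_ioctl_grant_copy(struct gntdev_priv *priv,
                                                       void __user *u)
{
        /* body unchanged */
}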
> On Mon, 30 Jun 2025 at 16:05, Tu Dinh <ngoc-tu.dinh@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> On 30/06/2025 06:54, Abinash Singh wrote:
>>> While building the kernel with LLVM, a warning was reported due to
>>> excessive stack usage in `gntdev_ioctl`:
>>>
>>>   drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds
>>>   limit (1024) in function 'gntdev_ioctl'
>>>
>>> Further analysis revealed that the large stack frame was caused by
>>> `struct gntdev_copy_batch`, which was declared on the stack inside
>>> `gntdev_ioctl_grant_copy()`. Since this function was inlined into
>>> `gntdev_ioctl`, the stack usage was attributed to the latter.
>>>
>>> This patch fixes the issue by dynamically allocating `gntdev_copy_batch`
>>> using `kmalloc()`, which significantly reduces the stack footprint and
>>> eliminates the warning.
>>>
>>> This approach is consistent with similar fixes upstream, such as:
>>>
>>>   commit fa26198d30f3 ("iommu/io-pgtable-arm: dynamically allocate
>>>   selftest device struct")
>>>
>>> Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy")
>>> Signed-off-by: Abinash Singh <abinashsinghlalotra@xxxxxxxxx>
>>> ---
>>> The stack usage was confirmed using the `-fstack-usage` flag and manual
>>> analysis, which showed:
>>>
>>>   drivers/xen/gntdev.c:953: gntdev_ioctl_grant_copy.isra  1048 bytes
>>>   drivers/xen/gntdev.c:826: gntdev_copy                     56 bytes
>>>
>>> Since `gntdev_ioctl` was calling the inlined `gntdev_ioctl_grant_copy`,
>>> the total frame size exceeded 1024 bytes, triggering the warning.
>>>
>>> This patch addresses the warning and keeps stack usage within acceptable
>>> limits. Is this patch fine, or could kmalloc() affect performance?
>>> ---
>>
>> Have you measured the performance impact? gntdev_ioctl_grant_copy is
>> called quite often, especially by the backend. I'm afraid calling the
>> allocator here may cause even more slowdown than there already is,
>> especially when memory is tight.
>>
>>>  drivers/xen/gntdev.c | 24 +++++++++++++++---------
>>>  1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 61faea1f0663..9811f3d7c87c 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>>> @@ -953,15 +953,19 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
>>>  static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>>  {
>>>          struct ioctl_gntdev_grant_copy copy;
>>> -        struct gntdev_copy_batch batch;
>>> +        struct gntdev_copy_batch *batch;
>>>          unsigned int i;
>>>          int ret = 0;
>>>
>>>          if (copy_from_user(&copy, u, sizeof(copy)))
>>>                  return -EFAULT;
>>> -
>>> -        batch.nr_ops = 0;
>>> -        batch.nr_pages = 0;
>>> +
>>> +        batch = kmalloc(sizeof(*batch), GFP_KERNEL);
>>> +        if (!batch)
>>> +                return -ENOMEM;
>>> +
>>> +        batch->nr_ops = 0;
>>> +        batch->nr_pages = 0;
>>>
>>>          for (i = 0; i < copy.count; i++) {
>>>                  struct gntdev_grant_copy_segment seg;
>>> @@ -971,18 +975,20 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>>                          goto out;
>>>                  }
>>>
>>> -                ret = gntdev_grant_copy_seg(&batch, &seg, &copy.segments[i].status);
>>> +                ret = gntdev_grant_copy_seg(batch, &seg, &copy.segments[i].status);
>>>                  if (ret < 0)
>>>                          goto out;
>>>
>>>                  cond_resched();
>>>          }
>>> -        if (batch.nr_ops)
>>> -                ret = gntdev_copy(&batch);
>>> -        return ret;
>>> +        if (batch->nr_ops)
>>> +                ret = gntdev_copy(batch);
>>> +        goto free_batch;
>>>
>>>  out:
>>> -        gntdev_put_pages(&batch);
>>> +        gntdev_put_pages(batch);
>>> +free_batch:
>>> +        kfree(batch);
>>>          return ret;
>>>  }
>>>
>>
>> Ngoc Tu Dinh | Vates XCP-ng Developer
>>
>> XCP-ng & Xen Orchestra - Vates solutions
>>
>> web: https://vates.tech

Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech