[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[RFC PATCH 00/30] Code tagging framework and applications



===========================
Code tagging framework
===========================
Code tag is a structure identifying a specific location in the source code
which is generated at compile time and can be embedded in an application-
specific structure. Several applications of code tagging are included in
this RFC, such as memory allocation tracking, dynamic fault injection,
latency tracking and improved error code reporting.
Basically, it takes the old trick of "define a special elf section for
objects of a given type so that we can iterate over them at runtime" and
creates a proper library for it.

===========================
Memory allocation tracking
===========================
The goal for using codetags for memory allocation tracking is to minimize
performance and memory overhead. By recording only the call count and
allocation size, the required operations are kept at the minimum while
collecting statistics for every allocation in the codebase. With that
information, if users are interested in mode detailed context for a
specific allocation, they can enable more in-depth context tracking,
which includes capturing the pid, tgid, task name, allocation size,
timestamp and call stack for every allocation at the specified code
location.
Memory allocation tracking is implemented in two parts:

part1: instruments page and slab allocators to record call count and total
memory allocated at every allocation in the source code. Every time an
allocation is performed by an instrumented allocator, the codetag at that
location increments its call and size counters. Every time the memory is
freed these counters are decremented. To decrement the counters upon free,
allocated object needs a reference to its codetag. Page allocators use
page_ext to record this reference while slab allocators use memcg_data of
the slab page.
The data is exposed to the user space via a read-only debugfs file called
alloc_tags.

Usage example:

$ sort -hr /sys/kernel/debug/alloc_tags|head
  153MiB     8599 mm/slub.c:1826 module:slub func:alloc_slab_page
 6.08MiB      49 mm/slab_common.c:950 module:slab_common func:_kmalloc_order
 5.09MiB     6335 mm/memcontrol.c:2814 module:memcontrol 
func:alloc_slab_obj_exts
 4.54MiB      78 mm/page_alloc.c:5777 module:page_alloc func:alloc_pages_exact
 1.32MiB      338 include/asm-generic/pgalloc.h:63 module:pgtable 
func:__pte_alloc_one
 1.16MiB      603 fs/xfs/xfs_log_priv.h:700 module:xfs func:xlog_kvmalloc
 1.00MiB      256 mm/swap_cgroup.c:48 module:swap_cgroup 
func:swap_cgroup_prepare
  734KiB     5380 fs/xfs/kmem.c:20 module:xfs func:kmem_alloc
  640KiB      160 kernel/rcu/tree.c:3184 module:tree func:fill_page_cache_func
  640KiB      160 drivers/char/virtio_console.c:452 module:virtio_console 
func:alloc_buf

part2: adds support for the user to select a specific code location to capture
allocation context. A new debugfs file called alloc_tags.ctx is used to select
which code location should capture allocation context and to read captured
context information.

Usage example:

$ cd /sys/kernel/debug/
$ echo "file include/asm-generic/pgalloc.h line 63 enable" > alloc_tags.ctx
$ cat alloc_tags.ctx
  920KiB      230 include/asm-generic/pgalloc.h:63 module:pgtable 
func:__pte_alloc_one
    size: 4096
    pid: 1474
    tgid: 1474
    comm: bash
    ts: 175332940994
    call stack:
         pte_alloc_one+0xfe/0x130
         __pte_alloc+0x22/0xb0
         copy_page_range+0x842/0x1640
         dup_mm+0x42d/0x580
         copy_process+0xfb1/0x1ac0
         kernel_clone+0x92/0x3e0
         __do_sys_clone+0x66/0x90
         do_syscall_64+0x38/0x90
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
...

NOTE: slab allocation tracking is not yet stable and has a leak that
shows up in long-running tests. We are working on fixing it and posting
the RFC early to collect some feedback and to have a reference code in
public before presenting the idea at LPC2022.

===========================
Dynamic fault injection
===========================
Dynamic fault injection lets you do fault injection with a single call
to dynamic_fault(), with a debugfs interface similar to dynamic_debug.

Calls to dynamic_fault are listed in debugfs and can be enabled at
runtime (oneshot mode or a defined frequency are also available). This
patch also uses the memory allocation wrapper macros introduced by the
memory allocation tracking patches to add distinct fault injection
points for every memory allocation in the kernel.

Example fault injection points, after hooking memory allocation paths:

  fs/xfs/libxfs/xfs_iext_tree.c:606 module:xfs 
func:xfs_iext_realloc_rootclass:memory disabled "
  fs/xfs/libxfs/xfs_inode_fork.c:503 module:xfs 
func:xfs_idata_reallocclass:memory disabled "
  fs/xfs/libxfs/xfs_inode_fork.c:399 module:xfs 
func:xfs_iroot_reallocclass:memory disabled "
  fs/xfs/xfs_buf.c:373 module:xfs func:xfs_buf_alloc_pagesclass:memory disabled 
"
  fs/xfs/xfs_iops.c:497 module:xfs func:xfs_vn_get_linkclass:memory disabled "
  fs/xfs/xfs_mount.c:85 module:xfs func:xfs_uuid_mountclass:memory disabled "

===========================
Latency tracking
===========================
This lets you instrument code for measuring latency with just two calls
to code_tag_time_stats_start() and code_tag_time_stats_finish(), and
makes statistics available in debugfs on a per-callsite basis.

Recorded statistics include total count, frequency/rate, average
duration, max duration, and event duration quantiles.

Additionally, this patch instruments prepare_to_wait() and finish_wait().

Example output:

  fs/xfs/xfs_extent_busy.c:589 module:xfs func:xfs_extent_busy_flush
  count:          61
  rate:           0/sec
  frequency:    19 sec
  avg duration:   632 us
  max duration:   2 ms
  quantiles (us): 274 288 288 296 296 296 296 336 336 336 336 336 336 336 336

===========================
Improved error codes
===========================
Ever waste hours trying to figure out which line of code from some
obscure module is returning you -EINVAL and nothing else?

What if we had... more error codes?

This patch adds ERR(), which returns a unique error code that is related
to the error code that passed to it: the original error code can be
recovered with error_class(), and errname() (as well as %pE) returns an
error string that includes the file and line number of the ERR() call.

Example output:

  VFS: Cannot open root device "sda" or unknown-block(8,0): error -EINVAL at 
fs/ext4/super.c:4387

===========================
Dynamic debug conversion to code tagging
===========================
There are several open coded implementations of the "define a special elf
section for objects and iterate" technique that should be converted to
code tagging. This series just converts dynamic debug; there are others
(multiple in ftrace, in particular) that should also be converted.

===========================

The patchset applies cleanly over Linux 6.0-rc3
The tree for testing is published at:
https://github.com/surenbaghdasaryan/linux/tree/alloc_tags_rfc

The structure of the patchset is:
- code tagging framework (patches 1-6)
- page allocation tracking (patches 7-10)
- slab allocation tracking (patch 11-16)
- allocation context capture (patch 17-21)
- dynamic fault injection (patch 22)
- latency tracking (patch 23-27)
- improved error codes (patch 28)
- dynamic debug conversion to code tagging (patch 29)
- MAINTAINERS update (patch 30)

Next steps:
- track and fix slab allocator leak mentioned earlier;
- instrument more allocators: vmalloc, per-cpu allocations, others?


Kent Overstreet (14):
  lib/string_helpers: Drop space in string_get_size's output
  Lazy percpu counters
  scripts/kallysms: Always include __start and __stop symbols
  lib/string.c: strsep_no_empty()
  codetag: add codetag query helper functions
  Code tagging based fault injection
  timekeeping: Add a missing include
  wait: Clean up waitqueue_entry initialization
  lib/time_stats: New library for statistics on events
  bcache: Convert to lib/time_stats
  Code tagging based latency tracking
  Improved symbolic error names
  dyndbg: Convert to code tagging
  MAINTAINERS: Add entries for code tagging & related

Suren Baghdasaryan (16):
  kernel/module: move find_kallsyms_symbol_value declaration
  lib: code tagging framework
  lib: code tagging module support
  lib: add support for allocation tagging
  lib: introduce page allocation tagging
  change alloc_pages name in dma_map_ops to avoid name conflicts
  mm: enable page allocation tagging for __get_free_pages and
    alloc_pages
  mm: introduce slabobj_ext to support slab object extensions
  mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext
    creation
  mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
  mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache
    objects
  lib: introduce slab allocation tagging
  mm: enable slab allocation tagging for kmalloc and friends
  move stack capture functionality into a separate function for reuse
  lib: introduce support for storing code tag context
  lib: implement context capture support for page and slab allocators

 MAINTAINERS                         |  34 ++
 arch/x86/kernel/amd_gart_64.c       |   2 +-
 drivers/iommu/dma-iommu.c           |   2 +-
 drivers/md/bcache/Kconfig           |   1 +
 drivers/md/bcache/bcache.h          |   1 +
 drivers/md/bcache/bset.c            |   8 +-
 drivers/md/bcache/bset.h            |   1 +
 drivers/md/bcache/btree.c           |  12 +-
 drivers/md/bcache/super.c           |   3 +
 drivers/md/bcache/sysfs.c           |  43 ++-
 drivers/md/bcache/util.c            |  30 --
 drivers/md/bcache/util.h            |  57 ---
 drivers/xen/grant-dma-ops.c         |   2 +-
 drivers/xen/swiotlb-xen.c           |   2 +-
 include/asm-generic/codetag.lds.h   |  18 +
 include/asm-generic/vmlinux.lds.h   |   8 +-
 include/linux/alloc_tag.h           |  84 +++++
 include/linux/codetag.h             | 159 +++++++++
 include/linux/codetag_ctx.h         |  48 +++
 include/linux/codetag_time_stats.h  |  54 +++
 include/linux/dma-map-ops.h         |   2 +-
 include/linux/dynamic_debug.h       |  11 +-
 include/linux/dynamic_fault.h       |  79 +++++
 include/linux/err.h                 |   2 +-
 include/linux/errname.h             |  50 +++
 include/linux/gfp.h                 |  10 +-
 include/linux/gfp_types.h           |  12 +-
 include/linux/io_uring_types.h      |   2 +-
 include/linux/lazy-percpu-counter.h |  67 ++++
 include/linux/memcontrol.h          |  23 +-
 include/linux/module.h              |   1 +
 include/linux/page_ext.h            |   3 +-
 include/linux/pgalloc_tag.h         |  63 ++++
 include/linux/sbitmap.h             |   6 +-
 include/linux/sched.h               |   6 +-
 include/linux/slab.h                | 136 +++++---
 include/linux/slab_def.h            |   2 +-
 include/linux/slub_def.h            |   4 +-
 include/linux/stackdepot.h          |   3 +
 include/linux/string.h              |   1 +
 include/linux/time_stats.h          |  44 +++
 include/linux/timekeeping.h         |   1 +
 include/linux/wait.h                |  72 ++--
 include/linux/wait_bit.h            |   7 +-
 init/Kconfig                        |   5 +
 kernel/dma/mapping.c                |   4 +-
 kernel/module/internal.h            |   3 -
 kernel/module/main.c                |  27 +-
 kernel/sched/wait.c                 |  15 +-
 lib/Kconfig                         |   6 +
 lib/Kconfig.debug                   |  46 +++
 lib/Makefile                        |  10 +
 lib/alloc_tag.c                     | 391 +++++++++++++++++++++
 lib/codetag.c                       | 519 ++++++++++++++++++++++++++++
 lib/codetag_time_stats.c            | 143 ++++++++
 lib/dynamic_debug.c                 | 452 +++++++++---------------
 lib/dynamic_fault.c                 | 372 ++++++++++++++++++++
 lib/errname.c                       | 103 ++++++
 lib/lazy-percpu-counter.c           | 141 ++++++++
 lib/pgalloc_tag.c                   |  22 ++
 lib/stackdepot.c                    |  68 ++++
 lib/string.c                        |  19 +
 lib/string_helpers.c                |   3 +-
 lib/time_stats.c                    | 236 +++++++++++++
 mm/kfence/core.c                    |   2 +-
 mm/memcontrol.c                     |  62 ++--
 mm/mempolicy.c                      |   4 +-
 mm/page_alloc.c                     |  13 +-
 mm/page_ext.c                       |   6 +
 mm/page_owner.c                     |  54 +--
 mm/slab.c                           |   4 +-
 mm/slab.h                           | 125 ++++---
 mm/slab_common.c                    |  49 ++-
 mm/slob.c                           |   2 +
 mm/slub.c                           |   7 +-
 scripts/kallsyms.c                  |  13 +
 scripts/module.lds.S                |   7 +
 77 files changed, 3406 insertions(+), 703 deletions(-)
 create mode 100644 include/asm-generic/codetag.lds.h
 create mode 100644 include/linux/alloc_tag.h
 create mode 100644 include/linux/codetag.h
 create mode 100644 include/linux/codetag_ctx.h
 create mode 100644 include/linux/codetag_time_stats.h
 create mode 100644 include/linux/dynamic_fault.h
 create mode 100644 include/linux/lazy-percpu-counter.h
 create mode 100644 include/linux/pgalloc_tag.h
 create mode 100644 include/linux/time_stats.h
 create mode 100644 lib/alloc_tag.c
 create mode 100644 lib/codetag.c
 create mode 100644 lib/codetag_time_stats.c
 create mode 100644 lib/dynamic_fault.c
 create mode 100644 lib/lazy-percpu-counter.c
 create mode 100644 lib/pgalloc_tag.c
 create mode 100644 lib/time_stats.c

-- 
2.37.2.672.g94769d06f0-goog




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.