[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 0/4] enhance lock debugging



On 07.08.19 20:11, Andrew Cooper wrote:
On 07/08/2019 15:31, Juergen Gross wrote:
While hunting a locking problem in my core scheduling series I have
added some debugging aids to spinlock handling making it easier to
find the root cause for the problem.

Making use of the already lock profiling and enhancing it a little
bit produces some really valuable diagnostic data e.g. when a NMI
watchdog is triggering a crash.

So I'm not sure where best to report this in the series, so 0/$N will
have to do.  Here is a sample trace from panic():

   (XEN) Xen lock profile info SHOW  (now = 49695791886 total = 49695791886)
   (XEN) Global grant_lock: addr=ffff82d080818640, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global balloon_lock: addr=ffff82d080818630, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global osvw_lock: addr=ffff82d080818610, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global mtrr_mutex: addr=ffff82d0808185f0, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global set_atomicity_lock: addr=ffff82d0808185e0, lockval=80008, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global cmci_discover_lock: addr=ffff82d0808185c0, lockval=80008, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global mce_logout_lock: addr=ffff82d0808185a0, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global processing_lock: addr=ffff82d080818580, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global vpmu_lock: addr=ffff82d080818560, lockval=40004, cpu=4095
   (XEN)   lock:           4(00000000:0000007D), block:            
0(00000000:00000000)
   (XEN) Global pm_lock: addr=ffff82d080818230, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global rtc_lock: addr=ffff82d080818210, lockval=310031, cpu=4095
   (XEN)   lock:          48(00000000:0001892E), block:            
0(00000000:00000000)
   (XEN) Global pit_lock: addr=ffff82d080818200, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global platform_timer_lock: addr=ffff82d0808181f0, lockval=6c006c, 
cpu=4095
   (XEN)   lock:          92(00000000:00012CAB), block:            
0(00000000:00000000)
   (XEN) Global sync_lock: addr=ffff82d0808181d0, lockval=c188c188, cpu=4095
   (XEN)   lock:      115080(00000000:0074F130), block:            
115079(00000000:0871000E)
   (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
   (XEN)   lock:           1(00000000:01322165), block:            
0(00000000:00000000)
   (XEN) Global flush_lock: addr=ffff82d0808181c0, lockval=91139111, cpu=3
   (XEN)   lock:      168195(00000000:159C6F6E), block:            
1695(00000000:006B31FD)
   (XEN) Global xenpf_lock: addr=ffff82d0808181b0, lockval=2f002f, cpu=4095
   (XEN)   lock:          47(00000000:0001BC02), block:            
0(00000000:00000000)
   (XEN) Global pci_config_lock: addr=ffff82d0808181a0, lockval=374d374d, 
cpu=4095
   (XEN)   lock:        5196(00000000:004545AA), block:            
0(00000000:00000000)
   (XEN) Global lapic_nmi_owner_lock: addr=ffff82d080817d70, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global last_lock: addr=ffff82d080817d40, lockval=40034003, cpu=4095
   (XEN)   lock:       16387(00000000:00047132), block:            
0(00000000:00000000)
   (XEN) Global map_pgdir_lock: addr=ffff82d080817d50, lockval=2c002c, cpu=4095
   (XEN)   lock:          38(00000000:00000ACF), block:            
0(00000000:00000000)
   (XEN) Global microcode_mutex: addr=ffff82d080817d20, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global microcode_update_lock: addr=ffff82d080817d10, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global microcode_update_lock: addr=ffff82d080817d00, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global vector_lock: addr=ffff82d080817cf0, lockval=440044, cpu=4095
   (XEN)   lock:          27(00000000:00003B20), block:            
0(00000000:00000000)
   (XEN) Global irq_ratelimit_lock: addr=ffff82d080817ce0, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global msix_fixmap_lock: addr=ffff82d080817cb0, lockval=20002, cpu=4095
   (XEN)   lock:           2(00000000:000000CF), block:            
0(00000000:00000000)
   (XEN) Global ioapic_lock: addr=ffff82d080817c90, lockval=660066, cpu=4095
   (XEN)   lock:          20(00000000:00026E6C), block:            
0(00000000:00000000)
   (XEN) Global i8259A_lock: addr=ffff82d080817c80, lockval=2d002d, cpu=4095
   (XEN)   lock:           6(00000000:00001BA0), block:            
0(00000000:00000000)
   (XEN) Global apei_iomaps_lock: addr=ffff82d080817c50, lockval=110011, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global erst_lock: addr=ffff82d080817c40, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global shared_intremap_lock: addr=ffff82d080817bc0, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global iommu_pt_cleanup_lock: addr=ffff82d080817ba0, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global rx_lock: addr=ffff82d080817b70, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global tx_lock: addr=ffff82d080817b60, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global console_lock: addr=ffff82d080817830, lockval=770077, cpu=4095
   (XEN)   lock:        3447(00000000:07595894), block:            
0(00000000:00000000)
   (XEN) Global ratelimit_lock: addr=ffff82d080817820, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global pool_list_lock: addr=ffff82d0808177e0, lockval=10001, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global xenoprof_lock: addr=ffff82d0808177d0, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global pmu_owner_lock: addr=ffff82d0808177c0, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global vm_lock: addr=ffff82d0808177b0, lockval=220022, cpu=4095
   (XEN)   lock:          17(00000000:00001ED1), block:            
0(00000000:00000000)
   (XEN) Global virtual_region_lock: addr=ffff82d0808177a0, lockval=10001, 
cpu=4095
   (XEN)   lock:           1(00000000:00000054), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global wc_lock: addr=ffff82d080817770, lockval=c000c, cpu=4095
   (XEN)   lock:          11(00000000:00000995), block:            
0(00000000:00000000)
   (XEN) Global tasklet_lock: addr=ffff82d080817760, lockval=3f403f4, cpu=4095
   (XEN)   lock:         426(00000000:00007AFF), block:            
0(00000000:00000000)
   (XEN) Global sysctl_lock: addr=ffff82d080817750, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global symbols_mutex: addr=ffff82d080817740, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global stopmachine_lock: addr=ffff82d080817730, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global call_lock: addr=ffff82d080817710, lockval=4d004d, cpu=4095
   (XEN)   lock:          67(00000000:00176310), block:            
0(00000000:00000000)
   (XEN) Global heap_lock: addr=ffff82d080817570, lockval=6b286b28, cpu=4095
   (XEN)   lock:      753071(00000000:018E9B8B), block:            
33(00000000:0000916A)
   (XEN) Global payload_lock: addr=ffff82d080817550, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global crash_notes_lock: addr=ffff82d080817530, lockval=80008, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global global_virq_handlers_lock: addr=ffff82d080817500, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global domlist_update_lock: addr=ffff82d0808174e0, lockval=10001, 
cpu=4095
   (XEN)   lock:           1(00000000:0000005B), block:            
0(00000000:00000000)
   (XEN) Global domctl_lock: addr=ffff82d0808174d0, lockval=50005, cpu=4095
   (XEN)   lock:           5(00000000:000008DC), block:            
0(00000000:00000000)
   (XEN) Global cpupool_lock: addr=ffff82d0808174b0, lockval=a000a, cpu=4095
   (XEN)   lock:           1(00000000:00000176), block:            
0(00000000:00000000)
   (XEN) Global cpu_add_remove_lock: addr=ffff82d0808174a0, lockval=2d002d, 
cpu=4095
   (XEN)   lock:          24(00000000:0000175F), block:            
0(00000000:00000000)
   (XEN) Global efi_rs_lock: addr=ffff82d080817030, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Domain 0 event_lock: addr=ffff8308366c50e8, lockval=1d451d45, cpu=4095
   (XEN)   lock:        7493(00000000:00866052), block:            
0(00000000:00000000)
   (XEN) Domain 0 page_alloc_lock: addr=ffff8308366c5028, lockval=9aa59aa5, 
cpu=4095
   (XEN)   lock:       39589(00000000:0055339E), block:            
0(00000000:00000000)
   (XEN) Domain 0 domain_lock: addr=ffff8308366c5018, lockval=150015, cpu=4095
   (XEN)   lock:          21(00000000:00009D70), block:            
0(00000000:00000000)
   (XEN) Domain 32767 page_alloc_lock: addr=ffff83084ccc5028, lockval=0000, 
cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Domain 32767 domain_lock: addr=ffff83084ccc5018, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Domain 32753 page_alloc_lock: addr=ffff83084cccf028, lockval=3ee43ee4, 
cpu=4095
   (XEN)   lock:       16100(00000000:0005173F), block:            
0(00000000:00000000)
   (XEN) Domain 32753 domain_lock: addr=ffff83084cccf018, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Domain 32754 page_alloc_lock: addr=ffff83084ccd0028, lockval=64006400, 
cpu=4095
   (XEN)   lock:       25600(00000000:000814D9), block:            
0(00000000:00000000)
   (XEN) Domain 32754 domain_lock: addr=ffff83084ccd0018, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)

Its not exactly the easiest to dump to follow.

First of all - I don't see why the hold/block time are printed like that.  It
might be a holdover from the 32bit build, pre PRId64 support.  They should
probably use PRI_stime anyway.

Fine with me.

The domid rendering is unfortunate.  Ideally we'd use %pd but that would
involve rearranging the logic to get a struct domain* in hand.  Seeing as
you're the last person to modify this code, how hard would that be to do?

It would completely break the struct type agnostic design.

I have a debug patch adding credit2 run-queue lock. It requires to just
add 5 lines of code (and this includes moving the spinlock_init() out of
an irq-off section). It will produce:

(XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6
(XEN)   lock: 896287(00000000:00FAA6B2), block:  819(00000000:00009C80)

If/when the per-domain locks are rendered more clearly, the "Global " prefix
can be dropped.

I'm not sure we should do that.

An interesting quirk of ticketlocks is that lockval= is redundant with
lock_cnt.  Could we perhaps fold the two fields?

No. See my credit2 example. As soon as we get above 65535 lockings the
values are no longer redundant.

Distinguishing cpu=4095 as "not locked" would be useful.

Yes, will change.

We have several global locks called lock:

   (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095
   (XEN)   lock:           1(00000000:01322165), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)
   (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095
   (XEN)   lock:           0(00000000:00000000), block:            
0(00000000:00000000)

The second one in particular has corrupt data, seeing has it has been taken
and released several times without lock_cnt increasing.

The lock might have been taken/released before lock profiling has been
initialized.

For sanity sake, we should enforce unique naming of any lock registered for
profiling.

This would be every lock inited via DEFINE_SPINLOCK(). I can do a
followup patch for that purpose.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.