[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Wg-test-framework] osstest Massachusetts test lab resource usage investigation
Ian Jackson writes ("[Wg-test-framework] osstest Massachusetts test lab resource usage investigation"): > In parallel with the work I am reporting in this email, I am already > taking steps to investigate the performance of the various database > queries, and the situation with locking. > > I hope that this will produce actionable ideas for improvement. I have now completed the investigation into db queries. The patch for the code I used is below. The output (for data collected for the past few days, which I expect to be fairly representative) is here: http://xenbits.xen.org/people/iwj/2016/mg-db-activity.2016-08-30.txt Best to download it and view it in less -S. Again, interpreting these numbers will depend on reading the source code to see what the various categories mean. I didn't bother with charts because there are many fewer degrees of freedom here. The bulk of the delay is caused by connections which have done LOCK TABLE but are now doing some processing in the client. Originally, this was necessary because the version of Postgres we were using didn't support predicate locking; so we would simply lock all the relevant tables and not rely on database concurrency. But Postgres 9.x does support predicate locking. Over the past year or so I have done some work to make the code handle database-condurrency-induced transaction failure. This has been lightly tested. I think it is probably time now to take out the LOCK TABLE instructions (and deal with any resulting fallout). Other notable long-running transactions include: * The sg-check-tested invocation in cr-daily-branch near line 113 ($equivstatusq) which checks whether the baseline revision has been tested. This could probably be speeded up with an appropriate index, but it is not particularly critical because it's readonly and not on a hugely critical path: it runs during flight construction with no locks held other than the table locks (see above - these are removeable) and the per-`branch' lock. [1253] * The `equivalent hosts' query in ts-hosts-allocate-Executive, near line line 303. I think this query could be optimised. This is a fairly significant contributor to overall delays, and it runs in a critical path in the host allocator. It compares unfavourably to the query which extracts information about the very same host ($recentflights_q in Osstest::Executive::duration_estimator), which is responsible for a fraction of the query time despite being having more complex conditions. I think this query may be being misoptimised and some experimentation would be valuable. [880] * Next are the three duration estimator / host allocator queries $recentflights_q, $duration_anyref_q and $duration_duration_q, which are again on the critical path in the host allocator. These are necessarily a bit slow because they do quite a bit of archaeology. Rather than trying to optimise them, it would probably be worth thinking how to get them off the host allocator critical path. [196+187+90] Techniques that come to mind would be memoising the information (and perhaps updating it with a search of recently completed flights), or pre-calculating it so that it runs outside the host allocator job slot (which would, in effect, parallelise the different jobs' executions of instances of this query). Getting into the log tail here: * sg-report-job's query for build jobs (ie, looking for other jobs to recurse to to calculate the revisions used). [61] Not worth doing anything about. * cs-bisection-step's attempt to find a build job to reuse ($reusejob). Maybe some of the indices would help, but this one isn't critical. [44] Ian. From 68e36d6d265e25bae0d4c2c239d11870b856609e Mon Sep 17 00:00:00 2001 From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> Date: Tue, 30 Aug 2016 10:58:50 +0100 Subject: [OSSTEST PATCH] mg-db-activity-*: New admin debugging tools for DB queries mg-db-activity-logger can be used to sample the activity in the database, for consumption by mg-db-activity-redactor. This can be used to look for long-running db queries and to investigate lock contention. Signed-off-by: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> --- mg-db-activity-logger | 98 +++++++++++++++++++ mg-db-activity-redactor | 245 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 343 insertions(+) create mode 100755 mg-db-activity-logger create mode 100755 mg-db-activity-redactor diff --git a/mg-db-activity-logger b/mg-db-activity-logger new file mode 100755 index 0000000..c4077a7 --- /dev/null +++ b/mg-db-activity-logger @@ -0,0 +1,98 @@ +#!/usr/bin/perl -w + +# Needs to be able to connect as the db superuser. +# +# So probably has to run as user `postgres'. + +use strict; +use DBI; +use Data::Dumper; + +BEGIN { +unshift @INC, "$`" if $0 =~ m#/[^/]+$#; +} + +use Osstest; + +our $dbname = 'osstestdb'; +our $interval = 60; + +while (@ARGV && $ARGV[0] =~ m/^-/) { + $_ = shift @ARGV; + last if m/^--?$/; + while (m/^-./) { + if (s/^--db=(.+)$//) { + $dbname = $1; + } elsif (s/^-t=(\d+)$/-/) { + $interval = $1; + } else { + die "$_ ?"; + } + } +} + +die unless @ARGV == 1; +our ($outfn) = @ARGV; + +our $idbh = DBI->connect("dbi:Pg:dbname=$dbname",'','', + { RaiseError => 1, PrintError => 1 }); + +my @tsfields = qw(backend_start xact_start query_start); + +my $iq_tsfields = join ', ', map { + "extract(epoch from a.$_) as osstest_${_}"; +} @tsfields; + +my $iquery = $idbh->prepare(<<END); + SELECT *, $iq_tsfields + FROM pg_locks l + FULL JOIN pg_stat_activity a + ON l.pid = a.procpid +END + +my $odbh = DBI->connect("dbi:SQLite:dbname=$outfn",'','', + { RaiseError => 1, PrintError => 1 }); + + +my $oquery; + +our $mktable_done; + +sub mktable () { + my $cstmt = "CREATE TABLE IF NOT EXISTS datapoints (\n". + " osstest_timestamp INTEGER NOT NULL"; + my $ostmt = "INSERT INTO datapoints VALUES (?"; + my $fields = $iquery->{NAME}; + foreach my $f (@$fields) { + $cstmt .= sprintf ",\n %-30s TEXT", $f; + $ostmt .= ",?"; + } + $cstmt .= "\n)"; + $ostmt .= ");"; + $odbh->do($cstmt); + $oquery = $odbh->prepare($ostmt); +} + +for (;;) { + $odbh->begin_work; + + my $now = time; + + $iquery->execute(); + my $dp = $iquery->fetchall_arrayref(); + + if (!$mktable_done) { + mktable(); + $mktable_done = 1; + } + + foreach my $row (@$dp) { + $oquery->execute($now, @$row); + } + + $odbh->commit; + + printf "%s: %d rows\n", (show_abs_time $now), (scalar @$dp); + + sleep $interval; +} diff --git a/mg-db-activity-redactor b/mg-db-activity-redactor new file mode 100755 index 0000000..253a976 --- /dev/null +++ b/mg-db-activity-redactor @@ -0,0 +1,245 @@ +#!/usr/bin/perl -w + +use strict; +use DBI; +use Data::Dumper; +use File::Copy; + +BEGIN { +unshift @INC, "$`" if $0 =~ m#/[^/]+$#; +} + +use Osstest; + +our @tsforce; + +while (@ARGV && $ARGV[0] =~ m/^-/) { + $_ = shift @ARGV; + last if m/^--?$/; + while (m/^-./) { + if (s/^-T(\d+)$/-/) { + push @tsforce, $1; + } else { + die "$_ ?"; + } + } +} + +die unless @ARGV == 2; +our ($infn,$outfn) = @ARGV; + +copy($infn,$outfn); + +my $dbh = DBI->connect("dbi:SQLite:dbname=$outfn",'','', + { RaiseError => 1, PrintError => 1 }); + +$dbh->do(<<END); + CREATE INDEX + IF NOT EXISTS datapoints_ts + ON datapoints + ( + osstest_timestamp + ) +END + +$dbh->do(<<END); + ALTER TABLE datapoints + ADD COLUMN osstest_waiters + INTEGER + NOT NULL + DEFAULT 0 +END + +sub massage_query ($) { + local ($_) = @_; + $_ //= '<NULL>'; + s/\$\d+/\$X/g; + s/\(val \= \$X(?: OR val \= \$X)*\)/(val =...)/g; + s/\b(flight\s*(?:[<>]=?|=)\s*('?))\d+\b\2/$1FFFFF$2/g; + s/\b(job\s*=\s*('))[-0-9a-z]+\b\2/$1JJJJJ$2/g; + s/\b(stepno\s*=\s*('))\d+\b\2/$1SSSSS$2/g; + s/\b(finished\s*(?:[<>]=?|=)\s*('?))\d+\b\2/$1FINISHED$2/g; + return $_; +} + +our %total_charge; +# $total_charge{MANNER}{WHAT}{QUERY} = NUM.DATAPOINTS or SECONDS +# MANNER is +# direct cost of each task is accounted to the query it is +# performing +# blocker } cost of each task is accounted to the task which +# mystery } ultimately holds the lock which is blocking it, +# or maybe to the task whose blocker cannot be found +# ("mystery" ) +# +# WHAT is +# +# Tasks cost is 1 per task that exists at all +# Waiters cost is 1 per task that is waiting for a lock +# Delay cost is for each task the time the current query has +# been running +# +# Sampling means that Tasks and Waiters are linear: they are +# proportional to the system behaviour. Delay is quadratic, since a +# long-running query will be billed the duration(s) so far for at each +# time interval. So Delay is in units of time^2. + +sub compute_waiters () { + my $tsq = $dbh->prepare(<<END); + SELECT DISTINCT + osstest_timestamp + FROM datapoints + ORDER BY osstest_timestamp ASC +END + + my $eventsq = $dbh->prepare(<<END); + SELECT rowid, + locktype, + database, + relation, + virtualxid, + waiting, + granted, + procpid, + current_query, + xact_start, + osstest_query_start + FROM datapoints + WHERE osstest_timestamp = ? + ORDER BY procpid, + granted DESC, + locktype, + database, + relation, + virtualxid +END + + if (@tsforce) { + @tsforce = sort { $a <=> $b } @tsforce; + } else { + $tsq->execute(); + } + + our %lastquery; + + while (my ($ts) = (shift @tsforce) // $tsq->fetchrow_array()) { + print "#TS=$ts\n"; + $eventsq->execute($ts); + my %pid2info; + my %pid2waitlock; + my %lockholders; + my @waitingpids; + while (my $evrow = $eventsq->fetchrow_hashref()) { + my $lock = join "#", ($evrow->{locktype} // '', + $evrow->{database} // '', + $evrow->{relation} // '', + $evrow->{virtualxid} // ''); + my $client = join '#', ($evrow->{procpid} // '', + $evrow->{client_addr} // '', + $evrow->{client_hostname} // '', + $evrow->{client_port} // ''); + my $pid = $evrow->{procpid}; + my $cqd = $evrow->{current_query} // '<NONE>'; + $cqd =~ s/\s+/ /g; + $cqd =~ s/^\s+//; + $cqd =~ s/\s+$//; + #print "#TS=$ts pid=".($pid//'none')." LQ=| ". + # ($lastquery{$client} // '<UNDEF>')."\n"; + $lastquery{$client} //= ['<UNKNOWN>']; + my @cqd = @{ $lastquery{$client} }; + push @cqd, $cqd unless @cqd && $cqd[$#cqd] eq $cqd; + if ($cqd =~ m#^\<IDLE\>#) { + # + } elsif ($cqd =~ m#^LOCK TABLE\b#i) { + $lastquery{$client} = \@cqd; + } else { + @cqd = ($cqd[$#cqd]); + $lastquery{$client} = \@cqd; + } + $cqd = join ' | ', @cqd; + #print "#TS=$ts pid=".($pid//'none')." LQ=| $lastquery{$client}\n"; + $evrow->{QUERY} = $cqd; + print "#TS=$ts pid=".($pid//'none')." lock=$lock". + " granted=".!!$evrow->{granted}. + " waiting=".!!$evrow->{waiting}." | $cqd\n"; + $pid2info{$pid} = $evrow if defined $pid; + if ($evrow->{granted}) { + push @{ $lockholders{$lock} }, $pid; + next; + } elsif ($evrow->{waiting}) { + die unless defined $pid; + die if $pid2waitlock{$pid}; + $pid2waitlock{$pid} = $lock; + } + } + foreach my $pid (sort keys %pid2info) { + my $evrow = $pid2info{$pid}; + if (!defined $evrow->{xact_start}) { + print "#%S=$ts pid=$pid no charge\n"; + next; + } + my %tcost = ( + Tasks => 1, + Waiters => !!$evrow->{waiting}, + Delay => (defined $evrow->{osstest_query_start} ? + ($ts - $evrow->{osstest_query_start}) : 0), + ); + + my $query = massage_query $evrow->{QUERY}; + print "#TS=$ts pid=$pid costs=". + (join ' ', map { "$_=$tcost{$_}" } sort keys %tcost). + " QS=$evrow->{xact_start} QUERY=$query\n"; + + my $charge_someone = sub { + my ($chargepid, $factor, $manner) = @_; + print "#%S=$ts pid=$pid charge=$chargepid $factor $manner\n"; + foreach my $tcostk (keys %tcost) { + $total_charge{$manner}{$tcostk}{$query} += + $tcost{$tcostk} * $factor; + } + }; + + my $charge_blocker; + $charge_blocker = sub { + my ($chargepid, $factor) = @_; + my $lock = $pid2waitlock{$chargepid}; + + print "#%S=$ts pid=$pid charge_blocker". + " pid=$chargepid factor=$factor". + " lock=".($lock//'<undef>')."\n"; + if (!defined $lock) { + $charge_someone->($chargepid, $factor, 'blocker'); + return; + } + my $holders = $lockholders{$lock}; + if (!@$holders) { + $charge_someone->($chargepid, $factor, 'mystery'); + return; + } + foreach my $h (@$holders) { + $charge_blocker->($h, $factor * 1.0 / @$holders); + } + }; + + $charge_someone->($pid, 1.0, 'direct'); + $charge_blocker->($pid, 1.0); + } + } +} + +sub printout () { + foreach my $manner (sort keys %total_charge) { + my $imanner = $total_charge{$manner}; + foreach my $what (sort keys %$imanner) { + my $iwhat = $imanner->{$what}; + foreach my $query (sort { $iwhat->{$b} <=> $iwhat->{$a} } + keys %$iwhat) { + printf "%-7s %-7s | %10d %s\n", $manner, $what, + $iwhat->{$query}, $query; + } + } + } +} + +compute_waiters(); +printout(); -- 2.1.4 _______________________________________________ Wg-test-framework mailing list Wg-test-framework@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/cgi-bin/mailman/listinfo/wg-test-framework
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |