http://www.perlmonks.org?node_id=820156

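Subversion's command line tools don't give you certain types of summary information, but it can be readily computed by analyzing the information they do give. This script writes one line per revision, with fixed-width columns for the date, username, revision number, lines added, lines modified, churn (lines added plus lines modified), lines removed, and the number of files added, updated, and deleted.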

* Lines Modified is calculated by examining adjacent removed and added lines in the unified diff output. Unified diff shows modified lines as a run of removed lines immediately followed by a run of added lines. For each such contiguous block, the modified count can naively be taken as the lesser of lines removed and lines added, with the excess counted as actually added or removed. E.g.:
  • 5 removes then 5 adds = 5 modified
  • 10 removes then 3 adds = 7 removed, 3 modified
  • 4 removed and 6 added = 4 modified, 2 added.
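
The lesser-of rule can be sketched for a single contiguous block as follows. The helper name `classify_block` is made up for illustration; the script itself arrives at the same totals by pairing lines one at a time as it walks the diff.

```perl
use strict;
use warnings;

# Hypothetical helper: split one contiguous block of a unified diff
# ($removed '-' lines followed by $added '+' lines) into three counts.
sub classify_block {
    my ($removed, $added) = @_;
    my $modified = $removed < $added ? $removed : $added;    # lesser of the two
    return (
        modified => $modified,
        added    => $added - $modified,      # excess '+' lines
        removed  => $removed - $modified,    # excess '-' lines
    );
}

# The three examples from the list above.
for my $block ([5, 5], [10, 3], [4, 6]) {
    my %counts = classify_block(@$block);
    printf "%2d removes then %2d adds = %d modified, %d added, %d removed\n",
        @$block, @counts{qw(modified added removed)};
}
```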

The usual benefit of line metrics like these is to predict fault rates. In general, the more lines that are changing, the more bugs you can expect are being introduced. Having the data summarized in this format makes it easy to find spikes in the change rate, either by hand or by feeding the data into something else (like a chart program). The script also accepts a blacklist file (svnloc.blacklist, one revision number per line) for revisions which you know can be ignored, for example, accidental imports of large foreign code, not that I've done that or anything...
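
As one way of feeding the data into something else, here is a rough sketch that reads the fixed-width lines back in and flags revisions whose churn exceeds a threshold. The names `parse_line` and `find_spikes`, the threshold, and the sample rows are all invented for illustration. The unpack template assumes the column widths of the script's per-revision printf, and it will misparse any line whose username is longer than 18 characters.

```perl
use strict;
use warnings;

# Assumed layout: date 20, username 18, rev 6, then add/mod/churn/del at
# 7 each and the A/U/D file counts at 5 each.
my $TEMPLATE = 'A20 A18 A6 A7 A7 A7 A7 A5 A5 A5';
my @FIELDS   = qw(date user rev added modified churn removed A U D);

# Turn one fixed-width line into a hash reference.
sub parse_line {
    my ($line) = @_;
    my %row;
    @row{@FIELDS} = unpack($TEMPLATE, $line);
    s/^\s+// for values %row;    # numbers are right-aligned; drop the padding
    return \%row;
}

# Return the rows whose churn exceeds a caller-chosen threshold.
sub find_spikes {
    my ($threshold, @lines) = @_;
    return grep { $_->{churn} > $threshold } map { parse_line($_) } @lines;
}

# Two made-up revisions in the same format the script writes.
my $fmt    = "%20s%-18s%6d%7d%7d%7d%7d%5d%5d%5d";
my @sample = (
    sprintf($fmt, "2010-01-28 14:23:01 ", "alice", 41, 12,  3,   15,   4,  1, 2, 0),
    sprintf($fmt, "2010-01-29 09:02:47 ", "bob",   42, 900, 150, 1050, 20, 30, 5, 1),
);

printf "r%d by %s: churn %d\n", @{$_}{qw(rev user churn)}
    for find_spikes(500, @sample);
```

When reading a real output file, skip the first line, since it holds the column headers rather than data.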

The script uses svnlook to generate its data, because when running it for the first time it needs to look at the entire revision history for the path in question, and using the regular svn client command was very slow. Yes, I output progress bars for this.

An alternate run mode can be used to simply calculate the data for a single revision and append it to the output file, which would be suitable for a post-commit hook to keep the output file up-to-date.
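
A post-commit hook for this could look roughly like the sketch below. Subversion passes the repository path and the new revision number as the first two arguments to the hook. Both install paths, the PATH value, and the helper name `svnloc_command` are assumptions to adjust for your server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical install locations; adjust both for your server.
my $SVNLOC  = '/usr/local/bin/svnloc';
my $OUTFILE = '/var/lib/svnloc/svnloc.txt';

# Build the argument list for the single-revision run mode:
#   svnloc repo outfile revision
sub svnloc_command {
    my ($repos, $rev) = @_;
    return ($SVNLOC, $repos, $OUTFILE, $rev);
}

# Do nothing unless the repository path and revision were passed in.
if (@ARGV >= 2) {
    my ($repos, $rev) = @ARGV;

    # Hooks typically run with a very sparse environment, so make sure
    # svnlook can be found (assumed location).
    $ENV{PATH} = '/usr/bin:/bin';

    # List-form system() avoids passing the arguments through a shell.
    system(svnloc_command($repos, $rev)) == 0
        or warn "svnloc failed for r$rev (status $?)\n";
}
```

Saved as an executable file named post-commit in the repository's hooks directory, this appends one line to the output file after every commit.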

#!/usr/bin/perl
use strict;
use warnings;

my $BARSIZE  = 40;           # Size of the progress bar
my @statuses = qw(A U D);    # File statuses counted from "svnlook changed"

my $repo     = shift();
my $outfile  = shift() || "./svnloc.txt";
my $revision = shift();
my $latest_rev;

my %rev_users;
my %rev_dates;
my %rev_changes;
my %rev_diff;

if (not defined $repo or not -e $repo) {
    print <<END_USAGE;
Usage: svnloc repo [outfile [revision]]
  repo      the path to the svn repository
  outfile   the path for the output file, defaults to "./svnloc.txt"
  revision  if specified, will append data for that revision to the output
            if not specified, all data for all revisions is obtained and
            the file is generated from scratch, overwriting the old file
            if it exists.
END_USAGE
    exit(1);
}

my $bl_filename = "svnloc.blacklist";
my @blacklist;    # Don't count these revisions
if (-e $bl_filename) {
    open(BL, '<', $bl_filename) or die "Cannot read $bl_filename: $!\n";
    chomp( @blacklist = <BL> );
    close BL;
}

if (defined $revision) {
    # get info for our revision and append to output file
    get_info($revision);
    open(OUTPUT, '>>', $outfile) or die "Cannot append to $outfile: $!\n";
    output_line($revision);
    close OUTPUT;
}
else {
    # generate output file from scratch
    my $history = `svnlook history $repo`;
    ($latest_rev) = $history =~ /(\d+)/s;    # first number is the newest revision
    print "Latest revision: $latest_rev\n";
    rev_loop("Obtaining revision information...", \&get_info);
    open(OUTPUT, '>', $outfile) or die "Cannot write $outfile: $!\n";
    # Header widths match the data format used in output_line
    printf OUTPUT ("%-20s%-18s%6s%7s%7s%7s%7s%5s%5s%5s\n",
        "Date", "Username", "Rev", "Add", "Mod", "Chrn", "Del", @statuses);
    rev_loop("Generating outputfile ($outfile)...", \&output_line);
    close OUTPUT;
}
print "Finished.\n";

# Collect user, date, changed-file counts and line counts for one revision
sub get_info {
    my $rev = shift;

    # The first two lines of "svnlook info" are the author and the date
    my $info = `svnlook info -r $rev $repo`;
    my ($user, $date) = split(/\n/, $info);
    $rev_users{$rev} = $user;
    $rev_dates{$rev} = $date;

    # Count changed paths by the first status character (A, U, D)
    my $changed = `svnlook changed -r $rev $repo`;
    for my $s (split(/\n/, $changed)) {
        my ($status) = substr($s, 0, 1);
        $rev_changes{$rev}->{$status}++;
    }

    my $diff = `svnlook diff -r $rev $repo`;
    my ($added, $modified, $deleted, $temp_deleted) = (0) x 4;
    for my $line (split(/\n/, $diff)) {
        my $c2  = substr($line, 0, 2);
        my ($c) = substr($line, 0, 1);
        # ignore header lines (this also skips changed lines whose own
        # text begins with '-' or '+')
        next if ($c2 eq '--' || $c2 eq '++');
        if ($c eq '-') {
            $temp_deleted++;    # may turn out to be half of a modification
        }
        elsif ($c eq '+') {
            if ($temp_deleted) {
                # an add right after a pending remove counts as a modification
                $temp_deleted--;
                $modified++;
            }
            else {
                $added++;
            }
        }
        else {
            # any other line ends the block; leftover removes are real deletions
            $deleted += $temp_deleted;
            $temp_deleted = 0;
        }
    }
    $deleted += $temp_deleted;    # flush removes still pending at the end of the diff

    $rev_diff{$rev}->{added}    = $added;
    $rev_diff{$rev}->{modified} = $modified;
    $rev_diff{$rev}->{churn}    = $added + $modified;
    $rev_diff{$rev}->{removed}  = $deleted;
}

# Write the fixed-width line for one revision to OUTPUT
sub output_line {
    my $rev = shift;
    no warnings 'uninitialized';    # missing status counts print as 0
    printf OUTPUT ("%20s%-18s%6d%7d%7d%7d%7d%5d%5d%5d\n",
        substr($rev_dates{$rev}, 0, 20),
        $rev_users{$rev},
        $rev,
        $rev_diff{$rev}->{added},
        $rev_diff{$rev}->{modified},
        $rev_diff{$rev}->{churn},
        $rev_diff{$rev}->{removed},
        map { $rev_changes{$rev}->{$_} } @statuses);
}

# Run $code for every revision that is not blacklisted, with a progress bar
sub rev_loop {
    my ($msg, $code) = @_;
    my $progress;
    print "$msg\n";
    start_progress(\$progress);
    for (1 .. $latest_rev) {
        tick_progress(\$progress, $latest_rev);
        next if (is_in($_, @blacklist));
        $code->($_);
    }
    end_progress();
}

sub start_progress {
    my $progress = shift();
    $$progress = 0;
    print "[" . (" " x $BARSIZE) . "]\r";
}

sub tick_progress {
    my $progress = shift();
    my $max      = shift();
    my $ticks    = int(($$progress++ / $max) * $BARSIZE);
    my $spaces   = $BARSIZE - $ticks;
    # $_ is still the revision number from the loop in rev_loop
    printf "[" . ("=" x $ticks) . (" " x $spaces) . "] %-10s\r", $_;
}

sub end_progress {
    print "[" . ("=" x $BARSIZE) . "]\n\n";
}

# True if $item appears in the list
sub is_in {
    my $item = shift;
    my @list = @_;
    my %seen;
    @seen{@list} = (1) x scalar @list;
    return $seen{$item};
}