svn metric extractor

by bellaire (Hermit)
on Jan 28, 2010 at 14:21 UTC (#820156=CUFP)

Subversion's command-line tools don't give you certain kinds of summary information, but that information can readily be computed from the output they do give. For each revision, this script prints a single line with fixed-width columns for:

  • Date and Time
  • Username
  • Revision #
  • # Lines Added
  • # Lines Modified*
  • # Lines Churned (Added + Modified)
  • # Lines Deleted
  • # Files Added
  • # Files Updated
  • # Files Deleted
* Lines Modified is calculated by examining adjacent removed and added lines in the unified diff output. Unified diff shows a modified block as a run of removed lines immediately followed by a run of added lines, so, naively, the number of modified lines in such a block is the lesser of the lines removed and added, with the excess counted as genuinely removed or added. E.g.:
  • 5 removed then 5 added = 5 modified
  • 10 removed then 3 added = 3 modified, 7 removed
  • 4 removed then 6 added = 4 modified, 2 added
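The pairing rule above can be sketched as a small stand-alone Perl function. The name `classify_changes` is hypothetical; it takes only the body lines of a diff (no `---`/`+++` file headers) and appends a sentinel context line so that a run of removed lines at the very end of the input is still counted as removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify unified-diff body lines into added/modified/removed using the
# adjacency heuristic: removed lines in a run are paired one-to-one with the
# added lines that immediately follow the run; unpaired lines keep their status.
sub classify_changes {
    my @lines   = @_;
    my %n       = (added => 0, modified => 0, removed => 0);
    my $pending = 0;    # removed lines not yet paired with an added line
    # A sentinel context line (' ') at the end flushes the final run.
    for my $line (@lines, ' ') {
        my $c = substr($line, 0, 1);
        if ($c eq '-') {
            $pending++;
        } elsif ($c eq '+') {
            if ($pending) { $pending--; $n{modified}++ }
            else          { $n{added}++ }
        } else {
            $n{removed} += $pending;
            $pending = 0;
        }
    }
    return \%n;
}

# Reproduce the three examples from the list above.
for my $case ([5, 5], [10, 3], [4, 6]) {
    my ($del, $add) = @$case;
    my $n = classify_changes(('-old') x $del, ('+new') x $add);
    printf "%2d removed then %d added: %d modified, %d added, %d removed\n",
        $del, $add, @{$n}{qw(modified added removed)};
}
```

Running it prints one line per example:

```
 5 removed then 5 added: 5 modified, 0 added, 0 removed
10 removed then 3 added: 3 modified, 0 added, 7 removed
 4 removed then 6 added: 4 modified, 2 added, 0 removed
```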

Line metrics like these are usually used to predict fault rates: in general, the more lines that change, the more bugs you can expect to be introduced. Having the data summarized in this format makes it easy to find spikes in the change rate, either by hand or by feeding the data into something else (like a chart program). The script also accepts a blacklist file (svnloc.blacklist in the current directory, one revision number per line) listing revisions that should be ignored, for example accidental imports of large foreign code (not that I've done that or anything...).
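As a sketch of the "feed the data into something else" step, the following reads an output file back in and flags revisions whose churn is unusually high. It assumes the column widths of the script's output line (date 20, username 18, revision 6, the four line counts 7 each, the three file counts 5 each). The cutoff of mean plus two standard deviations is an arbitrary choice, not something the script defines, and `parse_line`, `find_spikes` and `svnspikes.pl` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout of one output line: date 20, user 18, rev 6,
# add/mod/churn/del 7 each, then the A/U/D file counts 5 each.
my $LAYOUT = 'A20 A18 A6 A7 A7 A7 A7 A5 A5 A5';
my @FIELDS = qw(date user rev added modified churn removed A U D);
my $K      = 2;    # hypothetical cutoff: mean + $K standard deviations

# Parse one fixed-width line; returns nothing for the header or junk lines.
sub parse_line {
    my ($line) = @_;
    my %row;
    @row{@FIELDS} = unpack $LAYOUT, $line;
    return unless ($row{rev} // '') =~ /^\s*\d+$/;
    for my $f (@FIELDS[2 .. $#FIELDS]) {    # numeric columns
        $row{$f} = ($row{$f} // '') =~ /(\d+)/ ? $1 : 0;
    }
    return \%row;
}

# Return the rows whose churn exceeds mean + $k * (population) std. dev.
sub find_spikes {
    my ($rows, $k) = @_;
    my $n = @$rows or return;
    my $mean = 0;
    $mean += $_->{churn} for @$rows;
    $mean /= $n;
    my $var = 0;
    $var += ($_->{churn} - $mean) ** 2 for @$rows;
    $var /= $n;
    my $limit = $mean + $k * sqrt($var);
    return grep { $_->{churn} > $limit } @$rows;
}

# Usage (hypothetical script name): perl svnspikes.pl svnloc.txt
if (@ARGV) {
    my @rows = map { parse_line($_) } <>;
    for my $r (find_spikes(\@rows, $K)) {
        printf "r%d by %s on %s: churn %d\n",
            $r->{rev}, $r->{user}, $r->{date}, $r->{churn};
    }
}
```

Fixed-width parsing with `unpack` mirrors the script's `printf` layout; it would break on usernames longer than 18 characters, since `printf` pads but does not truncate.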

The script uses svnlook rather than the regular svn client because the first run has to examine the entire revision history of the repository, and doing that with svn was very slow. Since svnlook reads the repository directly, the repo argument must be a local filesystem path, not a URL. Yes, I output progress bars for this.

Passing a revision number as the third argument switches to an alternate mode that calculates the data for just that revision and appends it to the output file. This is suitable for a post-commit hook that keeps the output file up to date.
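A post-commit hook for the single-revision mode might look like the sketch below. Subversion runs hooks/post-commit with the repository path and the new revision number as its first two arguments, and runs it with an empty environment, which is why the sketch sets PATH before svnloc's backticks call svnlook. The install paths, the PATH value and the `svnloc_command` helper are assumptions; the chdir is there because svnloc looks for svnloc.blacklist in the current directory.

```perl
#!/usr/bin/perl
# hooks/post-commit: Subversion passes REPOS-PATH and REV as the first
# two arguments.
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical install locations; adjust to your setup.
my $SVNLOC  = '/usr/local/bin/svnloc';
my $OUTFILE = '/var/lib/svnloc/svnloc.txt';

# Build the argument list for svnloc's single-revision (append) mode.
sub svnloc_command {
    my ($repos, $rev) = @_;
    return ($SVNLOC, $repos, $OUTFILE, $rev);
}

if (@ARGV >= 2) {
    my ($repos, $rev) = @ARGV;
    # Hooks start with an empty environment, so svnloc's backtick calls to
    # svnlook need an explicit PATH (the directories are an assumption).
    $ENV{PATH} = '/usr/local/bin:/usr/bin:/bin';
    # svnloc reads svnloc.blacklist from the current directory.
    chdir dirname($OUTFILE) or die "chdir failed: $!\n";
    system(svnloc_command($repos, $rev)) == 0
        or die "svnloc failed for r$rev (status $?)\n";
}
```

Both the hook and svnloc itself need to be executable (chmod +x) for Subversion to run them.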

#!/usr/bin/perl
use strict;
use warnings;

$| = 1;              # unbuffered STDOUT so the progress bar redraws in place

my $BARSIZE = 40;    # Size of the progress bar
my @statuses = qw(A U D);

my $repo     = shift();
my $outfile  = shift() || "./svnloc.txt";
my $revision = shift();

my $latest_rev;
my %rev_users;
my %rev_dates;
my %rev_changes;
my %rev_diff;

if (not defined $repo or not -e $repo) {
    print <<END_USAGE;
Usage: svnloc repo [outfile [revision]]
    repo      the path to the svn repository
    outfile   the path for the output file, defaults to "./svnloc.txt"
    revision  if specified, will append data for that revision to the output;
              if not specified, all data for all revisions is obtained and
              the file is generated from scratch, overwriting the old file
              if it exists.
END_USAGE
    exit(1);
}

my $bl_filename = "svnloc.blacklist";
my @blacklist;    # Don't count these revisions
if (-e $bl_filename) {
    open(my $bl, '<', $bl_filename) or die "Cannot read $bl_filename: $!\n";
    chomp( @blacklist = <$bl> );
    close $bl;
}

if (defined $revision) {    # get info for our revision and append to output file
    get_info($revision);
    open(OUTPUT, '>>', $outfile) or die "Cannot append to $outfile: $!\n";
    output_line($revision);
    close OUTPUT;
} else {                    # generate output file from scratch
    ($latest_rev) = `svnlook youngest $repo` =~ /(\d+)/;
    die "Cannot determine the latest revision of $repo\n" unless defined $latest_rev;
    print "Latest revision: $latest_rev\n";
    rev_loop("Obtaining revision information...", \&get_info);
    open(OUTPUT, '>', $outfile) or die "Cannot write $outfile: $!\n";
    # One header per data column (10 in total)
    printf OUTPUT ("%-20s%-18s%6s%7s%7s%7s%7s%5s%5s%5s\n",
        "Date", "Username", "Rev", "Add", "Mod", "Chrn", "Del", @statuses);
    rev_loop("Generating output file ($outfile)...", \&output_line);
    close OUTPUT;
}
print "Finished.\n";

sub get_info {
    my $rev = shift;

    # svnlook info prints the author, date, log size and log message
    my $info = `svnlook info -r $rev $repo`;
    my ($user, $date) = split(/\n/, $info);
    $rev_users{$rev} = $user;
    $rev_dates{$rev} = $date;

    # svnlook changed prints one path per line, prefixed by its status letter
    my $changed = `svnlook changed -r $rev $repo`;
    for my $s (split(/\n/, $changed)) {
        my $status = substr($s, 0, 1);
        $rev_changes{$rev}->{$status}++;
    }

    my $diff = `svnlook diff -r $rev $repo`;
    my ($added, $modified, $deleted, $temp_deleted) = (0) x 4;
    for my $line (split(/\n/, $diff)) {
        next if $line =~ /^(?:---|\+\+\+) /;    # ignore per-file header lines
        my $c = substr($line, 0, 1);
        if ($c eq '-') {
            $temp_deleted++;
        } elsif ($c eq '+') {
            if ($temp_deleted) {    # pair with a preceding removed line
                $temp_deleted--;
                $modified++;
            } else {
                $added++;
            }
        } else {                    # any other line ends the removed run
            $deleted += $temp_deleted;
            $temp_deleted = 0;
        }
    }
    $deleted += $temp_deleted;      # removed lines at the very end of the diff

    $rev_diff{$rev}->{added}    = $added;
    $rev_diff{$rev}->{modified} = $modified;
    $rev_diff{$rev}->{churn}    = $added + $modified;
    $rev_diff{$rev}->{removed}  = $deleted;
}

sub output_line {
    my $rev = shift;
    no warnings 'uninitialized';
    printf OUTPUT ("%-20s%-18s%6d%7d%7d%7d%7d%5d%5d%5d\n",
        substr($rev_dates{$rev}, 0, 20),
        $rev_users{$rev},
        $rev,
        $rev_diff{$rev}->{added},
        $rev_diff{$rev}->{modified},
        $rev_diff{$rev}->{churn},
        $rev_diff{$rev}->{removed},
        map { $rev_changes{$rev}->{$_} } @statuses);
}

sub rev_loop {
    my ($msg, $code) = @_;
    my %skip = map { $_ => 1 } @blacklist;    # blacklisted revisions
    my $progress;
    print "$msg\n";
    start_progress(\$progress);
    for my $rev (1 .. $latest_rev) {
        tick_progress(\$progress, $latest_rev, $rev);
        next if $skip{$rev};
        $code->($rev);
    }
    end_progress();
}

sub start_progress {
    my $progress = shift();
    $$progress = 0;
    print "[" . (" " x $BARSIZE) . "]\r";
}

sub tick_progress {
    my ($progress, $max, $label) = @_;
    my $ticks  = int(($$progress++ / $max) * $BARSIZE);
    my $spaces = $BARSIZE - $ticks;
    printf "[%s%s] %-10s\r", "=" x $ticks, " " x $spaces, $label;
}

sub end_progress {
    print "[" . ("=" x $BARSIZE) . "]\n\n";
}

Re: svn metric extractor
by jmcnamara (Monsignor) on Jan 28, 2010 at 15:34 UTC

    That is useful.

    I often generate metrics using 'svn diff' combined with diffstat*, which produces a nice ASCII bar chart showing the amount of change.

    * This is often available on Linux systems, but you may need to compile the latest version to display modified lines as well as added/deleted ones. The output looks something like this:

    foo/bar/procs/update.sql |   14 +
    foo/include/ltime.h      |   23 ++
    foo/bar/pp_cfg.c         |   18 +
    foo/bar/newdata.c        |  189 ++++++++!!
    foo/bar/newdata.h        |   25 +
    foo/bar/parser.c         |    8 -
    ...
    16 files changed, 661 insertions(+), 96 deletions(-), 76 modifications(!)


      Hi! Exactly what command do you use to generate this output with Subversion? Thanks for all.

      John, is it possible to share the commands you use to find added, modified and deleted lines between two revisions? Thanks for all.

        Using diffstat with the -m option gives the modified lines (as well as the added and deleted):

        diff dir1 dir2 | diffstat -m

        I didn't need to modify it in any way. However, at the time of the original post I had to build the code from the website to get a version with the -m option. I think more recent OSes will come with that version by default or at least provide a package.


        Thanks for all, John. Is it possible to obtain lines added, lines modified, lines churned (added + modified) and lines deleted separately between two revisions? I need to get these numbers with shell commands between two Subversion revisions. Is it possible?
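For the two-revision question, one possible approach (a sketch, not part of the original script) is to feed the output of `svn diff -r OLD:NEW` into the same pairing rule. `svn diff` accepts repository URLs as well as working-copy paths, so this also works without local access to the repository. Unlike the script above, this version uses the line counts in each `@@` hunk header to know where a hunk ends, rather than recognizing file headers by their leading characters, and it reports per-file totals. The script name `svndiffcount.pl` and the function `count_diff` are hypothetical, and it assumes a tab separates the file name from the revision label on each `+++` line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Per-file added/modified/removed counts for a unified diff such as the
# output of "svn diff -r OLD:NEW". Hunk headers (@@ -a,b +c,d @@) say how
# many old/new lines each hunk holds, so body lines that happen to start
# with "--" or "++" are not mistaken for file headers.
sub count_diff {
    my ($diff) = @_;
    my %stats;
    my $file = '(unknown)';
    my ($old_left, $new_left, $pending) = (0, 0, 0);

    # Removed lines not paired with added lines become real removals.
    my $flush = sub {
        $stats{$file}{removed} += $pending;
        $pending = 0;
    };

    for my $line (split /\n/, $diff) {
        if ($old_left > 0 || $new_left > 0) {    # inside a hunk body
            my $c = substr($line, 0, 1);
            if ($c eq '-') {
                $pending++;
                $old_left--;
            } elsif ($c eq '+') {
                $new_left--;
                if ($pending) { $pending--; $stats{$file}{modified}++ }
                else          { $stats{$file}{added}++ }
            } elsif ($c eq '\\') {
                next;                            # "\ No newline at end of file"
            } else {                             # context line
                $flush->();
                $old_left--;
                $new_left--;
            }
            $flush->() unless $old_left > 0 || $new_left > 0;
        } elsif ($line =~ /^\+\+\+ (.*?)(?:\t.*)?$/) {
            $file = $1;                          # file name for this section
        } elsif ($line =~ /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/) {
            ($old_left, $new_left) = ($1 // 1, $2 // 1);
        }
    }
    return \%stats;
}

# Usage (hypothetical script name; "-" means read standard input):
#   svn diff -r 100:200 URL | perl svndiffcount.pl -
if (@ARGV) {
    my $stats = count_diff(do { local $/; join '', <> });
    printf "%-40s%7s%7s%7s%7s\n", 'File', 'Add', 'Mod', 'Chrn', 'Del';
    for my $f (sort keys %$stats) {
        my ($add, $mod, $del) =
            map { $stats->{$f}{$_} // 0 } qw(added modified removed);
        printf "%-40s%7d%7d%7d%7d\n", $f, $add, $mod, $add + $mod, $del;
    }
}
```

Summing the per-file columns gives the whole-range totals; running it once per revision pair (`-r N-1:N`) gives per-file, per-revision figures.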

Re: svn metric extractor
by jwkrahn (Monsignor) on Jan 28, 2010 at 17:13 UTC
    do { chomp; push @blacklist, $_; } for (<BL>);

    Really? Why not:

    chomp( @blacklist = <BL> );
Re: svn metric extractor
by spx2 (Deacon) on Mar 12, 2010 at 13:39 UTC
    Looks pretty useful. In git you have this built in.
      I was wondering if it is possible to get the same information per file per revision...
      187,11/18/08 12:12,,des_user1,M,7,2,3
      187,11/18/08 12:12,,des_user1,M,1,1,1
      185,11/18/08 8:59,,des_user1,M,2,1,2
      I am not too "versed" in Perl, so I was wondering how difficult it would be to change this script to get that information...
      This code works for a local repository but not for a remote one. Please suggest the steps to connect to a remote repository.
        ram, you'll be getting into the C bindings to do that. It's a nasty place. I strongly suggest not doing that and just checking out the code. You have to track SVN memory pools, and a bunch of other stuff you really, really don't care about.

        This is based on the system I used to work with at Yahoo! that used the C bindings. The code, it burned.

Node Type: CUFP [id://820156]
Approved by Corion
Front-paged by Arunbear