Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Calculating percentage of change between columns

by naChoZ (Curate)
on May 29, 2013 at 22:47 UTC ( #1035927=perlquestion: print w/ replies, xml ) Need Help??
naChoZ has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.

Here's a trimmed down example of what I'm doing:

Read a file of num to name mappings:

num|name 1|foo 12|bar 15|bar 18|baz 25|quux 27|quux 37|quuux 48|quuuux

Read any number of files (granted, it's rarely more than a handful) that look like this:

file1: acct|description|nums foo-001|foo one|1,12,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25,37 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,25,27,37,99 file2: acct|description|nums foo-001|foo one|1,12,15,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99 file3: acct|description|nums foo-001|foo one|1,12,15,18,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99

I iterate and increment values in a hashref as I go so it ends up looking like:

$hash => { file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0, }, file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0, }, file3 => { foo => 4, bar => 8, baz => 5, quux => 8, quuux => 5, quuuux => 0, }, }

And then simply I produce a report like this to make it easy to spot the differences:

name file1 file2 file3 foo 4 4 4 bar 6 8 8 baz 4 4 5 quux 8 8 8 quuux 6 5 5 quuuux empty empty empty

What I'd like to do is something like this:

name file1 file2 %change file3 %change foo 4 4 0% 4 0% bar 6 8 133% 8 0% baz 4 4 0% 5 125% quux 8 8 0% 8 0% quuux 6 5 -83% 5 0% quuuux empty empty empty

My code is very straightforward, the only difference being I'm handling a file with 20+ columns and the each file is 300,000+ lines.

I'm populating %$hash exactly as you might expect, opening each file, iterating, if $hash->{$file}->{$name} isn't defined I define it, otherwise I increment $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)

It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.

I only see a couple of possible paths, but I'm can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?

Any advice appreciated.

Just so to include the obligatory code sample it goes something like this:

my $hash = {}; my @heading = ( 'name' ); my @report; for my $fn ( sort @filenames ) { open my $fh, '<', $fn or die "Error opening file ${fn}: $!\n"; # Read in each filename and populate the hash # while (<$fh>) { chomp; s%\r%%; my @line = split /\|/; my @curnums = split( ',', $line[2] ); for my $curnum ( @curnums ) { next unless $defined $nums_to_names->{$curnum}; if ( ! defined $hash->{$fn} or ! defined $hash->{$fn}->{$c +urnum} ) { $hash->{$fn}->{$curnum} = 1; } else { $hash->{$fn}->{$curnum}++; } } } # Iterate the mapping of numbers to names # for my $curnum ( sort keys %$nums_to_names ) { my @report_line; # Skip it unless it's defined and has a value # my $name = defined $nums_to_names->{$curnum} && $nums_to_names->{$curnum} ? $nums_to_names->{$curnum} : next ; push @report_line, $name; # For the current mapping number, pluck the corresponding coun +ts # related to each file # for my $curfilename ( sort keys %$hash ) { my $count = defined $hash->{$curfilename}->{$curnum} && $hash->{$curfilename}->{$curnum} ? commify($hash->{$curfilename}->{$curnum}) : 'empty' ; push @report_line, $count; } push @report, \@report_line; } } push @heading, basename($_) for sort @filenames;
Then I iterate @heading and @report and print them out cell by cell.

Any tips on how I might add the percentage change between columns?

--
Andy

Comment on Calculating percentage of change between columns
Select or Download Code
Re: Calculating percentage of change between columns
by choroba (Abbot) on May 29, 2013 at 23:39 UTC
    Note that your sample output is wrong. If going from 4 to 4 changes 0%, than going from 6 to 8 should change 33%, not 133%. Also, you cannot express the change going anywhere from 0.

    You might find the following code useful. I have modified some values to show some other interesting results.

    #!/usr/bin/perl use warnings; use strict; my %hash = ( file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0, }, file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0, }, file3 => { foo => 8, bar => 0, baz => 5, quux => 8, quuux => 5, quuuux => 3, }, ); my %columns = map { $_ => 1 } map keys %{ $hash{$_} }, keys %hash; my @files = qw/file1 file2 file3/; for my $column (keys %columns) { print $column, "\t", $hash{$files[0]}{$column}; for my $i (1 .. $#files) { my ($this, $previous) = map $hash{ $files[$_] }{$column}, $i, +$i - 1; print "\t", $this; my $change = '-'; if ($previous) { $change = sprintf "%d%%", 100 * ($this - $previous) / $pre +vious; } print "\t$change"; } print "\n"; }

    Output:

    bar 6 8 33% 0 -100% baz 4 4 0% 5 25% quuux 6 5 -16% 5 0% quux 8 8 0% 8 0% foo 4 4 0% 8 100% quuuux 0 0 - 3 -
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Perfect, thanks for that. The iterating of the array index and then using it in the map like you did with $this and $previous instead of the foreach style I was using was just what I needed.

      --
      Andy

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1035927]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (13)
As of 2014-10-23 16:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (125 votes), past polls