Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Calculating percentage of change between columns

by naChoZ (Curate)
on May 29, 2013 at 22:47 UTC ( #1035927=perlquestion: print w/ replies, xml ) Need Help??
naChoZ has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.

Here's a trimmed down example of what I'm doing:

Read a file of num to name mappings:

num|name 1|foo 12|bar 15|bar 18|baz 25|quux 27|quux 37|quuux 48|quuuux

Read any number of files (granted, it's rarely more than a handful) that look like this:

file1: acct|description|nums foo-001|foo one|1,12,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25,37 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,25,27,37,99 file2: acct|description|nums foo-001|foo one|1,12,15,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99 file3: acct|description|nums foo-001|foo one|1,12,15,18,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99

I iterate and increment values in a hashref as I go so it ends up looking like:

$hash => { file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0, }, file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0, }, file3 => { foo => 4, bar => 8, baz => 5, quux => 8, quuux => 5, quuuux => 0, }, }

And then simply I produce a report like this to make it easy to spot the differences:

name file1 file2 file3 foo 4 4 4 bar 6 8 8 baz 4 4 5 quux 8 8 8 quuux 6 5 5 quuuux empty empty empty

What I'd like to do is something like this:

name file1 file2 %change file3 %change foo 4 4 0% 4 0% bar 6 8 133% 8 0% baz 4 4 0% 5 125% quux 8 8 0% 8 0% quuux 6 5 -83% 5 0% quuuux empty empty empty

My code is very straightforward, the only difference being I'm handling a file with 20+ columns and the each file is 300,000+ lines.

I'm populating %$hash exactly as you might expect, opening each file, iterating, if $hash->{$file}->{$name} isn't defined I define it, otherwise I increment $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)

It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.

I only see a couple of possible paths, but I'm can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?

Any advice appreciated.

Just so to include the obligatory code sample it goes something like this:

my $hash = {}; my @heading = ( 'name' ); my @report; for my $fn ( sort @filenames ) { open my $fh, '<', $fn or die "Error opening file ${fn}: $!\n"; # Read in each filename and populate the hash # while (<$fh>) { chomp; s%\r%%; my @line = split /\|/; my @curnums = split( ',', $line[2] ); for my $curnum ( @curnums ) { next unless $defined $nums_to_names->{$curnum}; if ( ! defined $hash->{$fn} or ! defined $hash->{$fn}->{$c +urnum} ) { $hash->{$fn}->{$curnum} = 1; } else { $hash->{$fn}->{$curnum}++; } } } # Iterate the mapping of numbers to names # for my $curnum ( sort keys %$nums_to_names ) { my @report_line; # Skip it unless it's defined and has a value # my $name = defined $nums_to_names->{$curnum} && $nums_to_names->{$curnum} ? $nums_to_names->{$curnum} : next ; push @report_line, $name; # For the current mapping number, pluck the corresponding coun +ts # related to each file # for my $curfilename ( sort keys %$hash ) { my $count = defined $hash->{$curfilename}->{$curnum} && $hash->{$curfilename}->{$curnum} ? commify($hash->{$curfilename}->{$curnum}) : 'empty' ; push @report_line, $count; } push @report, \@report_line; } } push @heading, basename($_) for sort @filenames;
Then I iterate @heading and @report and print them out cell by cell.

Any tips on how I might add the percentage change between columns?

--
Andy

Comment on Calculating percentage of change between columns
Select or Download Code
Re: Calculating percentage of change between columns
by choroba (Abbot) on May 29, 2013 at 23:39 UTC
    Note that your sample output is wrong. If going from 4 to 4 changes 0%, than going from 6 to 8 should change 33%, not 133%. Also, you cannot express the change going anywhere from 0.

    You might find the following code useful. I have modified some values to show some other interesting results.

    #!/usr/bin/perl use warnings; use strict; my %hash = ( file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0, }, file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0, }, file3 => { foo => 8, bar => 0, baz => 5, quux => 8, quuux => 5, quuuux => 3, }, ); my %columns = map { $_ => 1 } map keys %{ $hash{$_} }, keys %hash; my @files = qw/file1 file2 file3/; for my $column (keys %columns) { print $column, "\t", $hash{$files[0]}{$column}; for my $i (1 .. $#files) { my ($this, $previous) = map $hash{ $files[$_] }{$column}, $i, +$i - 1; print "\t", $this; my $change = '-'; if ($previous) { $change = sprintf "%d%%", 100 * ($this - $previous) / $pre +vious; } print "\t$change"; } print "\n"; }

    Output:

    bar 6 8 33% 0 -100% baz 4 4 0% 5 25% quuux 6 5 -16% 5 0% quux 8 8 0% 8 0% foo 4 4 0% 8 100% quuuux 0 0 - 3 -
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Perfect, thanks for that. The iterating of the array index and then using it in the map like you did with $this and $previous instead of the foreach style I was using was just what I needed.

      --
      Andy

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1035927]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2014-08-02 08:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Who would be the most fun to work for?















    Results (55 votes), past polls