Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

I'm wondering if there's a simple way I could shoehorn this feature in without a full rewrite. Basically I want to just throw in a quick calculation of changes between columns.

Here's a trimmed down example of what I'm doing:

Read a file of num to name mappings:

num|name 1|foo 12|bar 15|bar 18|baz 25|quux 27|quux 37|quuux 48|quuuux

Read any number of files (granted, it's rarely more than a handful) that look like this:

file1: acct|description|nums foo-001|foo one|1,12,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25,37 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,25,27,37,99 file2: acct|description|nums foo-001|foo one|1,12,15,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99 file3: acct|description|nums foo-001|foo one|1,12,15,18,27,37 foo-002|foo two|1,15,25,37 foo-003|foo three|1,18,25 foo-004|foo four|12,18,25,37 foo-005|foo five|12,15,25,27,37 foo-006|foo six|1,12,15,25,27,37,99

I iterate and increment values in a hashref as I go so it ends up looking like:

$hash => { file1 => { foo => 4, bar => 6, baz => 4, quux => 8, quuux => 6, quuuux => 0, }, file2 => { foo => 4, bar => 8, baz => 4, quux => 8, quuux => 5, quuuux => 0, }, file3 => { foo => 4, bar => 8, baz => 5, quux => 8, quuux => 5, quuuux => 0, }, }

And then simply I produce a report like this to make it easy to spot the differences:

name file1 file2 file3 foo 4 4 4 bar 6 8 8 baz 4 4 5 quux 8 8 8 quuux 6 5 5 quuuux empty empty empty

What I'd like to do is something like this:

name file1 file2 %change file3 %change foo 4 4 0% 4 0% bar 6 8 133% 8 0% baz 4 4 0% 5 125% quux 8 8 0% 8 0% quuux 6 5 -83% 5 0% quuuux empty empty empty

My code is very straightforward, the only difference being I'm handling a file with 20+ columns and the each file is 300,000+ lines.

I'm populating %$hash exactly as you might expect, opening each file, iterating, if $hash->{$file}->{$name} isn't defined I define it, otherwise I increment $hash->{$file}->{$name}++. (The other subtle differences are intentional, like how 99 appears on some of the rows, but since it doesn't appear in the mapping of nums to names, I don't include it in the report.)

It doesn't seem like trying to calculate differences inside the same loop I'm using to iterate the files is the way to go.

I only see a couple of possible paths, but I'm can't wrap my head around either of them very well. Should I iterate the resulting %$hash after I finish creating it and make a new hash out of the results? Or, while I populate %$hash, should I also somehow populate an additional hash to make it easy to calculate later?

Any advice appreciated.

Just so to include the obligatory code sample it goes something like this:

my $hash = {}; my @heading = ( 'name' ); my @report; for my $fn ( sort @filenames ) { open my $fh, '<', $fn or die "Error opening file ${fn}: $!\n"; # Read in each filename and populate the hash # while (<$fh>) { chomp; s%\r%%; my @line = split /\|/; my @curnums = split( ',', $line[2] ); for my $curnum ( @curnums ) { next unless $defined $nums_to_names->{$curnum}; if ( ! defined $hash->{$fn} or ! defined $hash->{$fn}->{$c +urnum} ) { $hash->{$fn}->{$curnum} = 1; } else { $hash->{$fn}->{$curnum}++; } } } # Iterate the mapping of numbers to names # for my $curnum ( sort keys %$nums_to_names ) { my @report_line; # Skip it unless it's defined and has a value # my $name = defined $nums_to_names->{$curnum} && $nums_to_names->{$curnum} ? $nums_to_names->{$curnum} : next ; push @report_line, $name; # For the current mapping number, pluck the corresponding coun +ts # related to each file # for my $curfilename ( sort keys %$hash ) { my $count = defined $hash->{$curfilename}->{$curnum} && $hash->{$curfilename}->{$curnum} ? commify($hash->{$curfilename}->{$curnum}) : 'empty' ; push @report_line, $count; } push @report, \@report_line; } } push @heading, basename($_) for sort @filenames;
Then I iterate @heading and @report and print them out cell by cell.

Any tips on how I might add the percentage change between columns?


In reply to Calculating percentage of change between columns by naChoZ

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others surveying the Monastery: (3)
    As of 2018-05-20 18:24 GMT
    Find Nodes?
      Voting Booth?