Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re^2: simple table join script

by sauoq (Abbot)
on Jun 01, 2012 at 11:56 UTC ( #973742=note: print w/ replies, xml ) Need Help??


in reply to Re: simple table join script
in thread simple table join script

That doesn't work correctly.

  • It changes the order of the rows.
  • It drops a value from the first row.
  • (Minor) It splits on spaces, not tabs.

You are jumping through a lot of contortions, seemingly to avoid explicitly opening files. That results in a lot of inefficiency; all that sorting could get expensive if the data sets are big. Using hashes for data that are already ordered and whose order you want to retain is a bad design decision.

And even with your comments, it's not particularly easy to read (for me; and I'm not new to Perl.)

The following might not win any style points, but it is straight forward and relatively efficient. And working.

my $header; # Handle the first file separately so we can keep the "categories." my $file = shift; $header = "CATS \t$file"; # CATS is category column header. open my $fh, '<', $file or die "$file: $!"; my @lines = <$fh>; close $fh; chomp(@lines); for my $file (@ARGV) { # Add filename to header. $header .= "\t$file"; open my $fh, '<', $file or die "$file: $!"; # Iterate through file my $i = 0; while (my $line = <$fh>) { chomp $line; # Append tab and 2nd column to appropriate line. $lines[$i++] .= "\t" . (split /\t/, $line)[1]; } close $fh; } # print our header and each of our new lines. print $header, "\n"; print "$_\n" for @lines;

-sauoq
"My two cents aren't worth a dime.";


Comment on Re^2: simple table join script
Download Code
Re^3: simple table join script
by NetWallah (Abbot) on Jun 01, 2012 at 14:15 UTC
    Addressing your points:
    • It changes the order of the rows. : Yes - it SORTS them. No change if they are already sorted.
    • It drops a value from the first row.: Agreed - I found and fixed this bug this AM (original post was late at night)
    • (Minor) It splits on spaces, not tabs.: Correction - it splits on WHITESPACE, that includes tab.

    I agree - it is not the easiest code to read, but I think it is more data-tolerant than yours, which complains under "use warnings", if the data contains blank lines (which are present in the OP).

    Regarding efficiency - sorting a million pre-sorted records takes less than a second, on modern computers, so I don't see an issue.

    I do appreciate your critique, and enjoy the discussion, but, at times, i have a low threshold for responding to nits, so I apologize in advance if I appear to be un-responsive. to subsequent posts.

                 I hope life isn't a big joke, because I don't get it.
                       -SNL

      Yes - it SORTS them.

      Where did he say changing the order in any manner was what he wanted? The issue is that your code essentially requires reordering in some manner because of your poor choice to use hashes which will destroy the original order. You left yourself with the choice to either sort or go with whatever reordering perl gives you.

      That's not a nit. Not being able to retain the original order is a fundamental issue. Perhaps he doesn't need to. He didn't say. But if he does need to, your code is worthless.

      The efficiency issue may or may not be of practical importance. But its only an issue at all because of the same aforementioned poor design decision. The straight forward approach I showed isn't better because it's more efficient... it's more efficient because it's better.

      Correction - it splits on WHITESPACE, that includes tab.

      That's right. And thanks for picking a nit with my point... I like precision. It's still a potential problem because he didn't specify whether his fields could have other whitespace... and if they did, your code would break. That's a little less data tolerant, arguably, than some warnings on blank lines.† I called it "minor" because it is so easily corrected, unlike your code's other issues.

      Note that adding sorting to the straight forward approach is easily done with a single sort if desired.

      Note also that skipping blank lines is easily added to the straight forward approach with an appropriately placed next unless /\S/; or similar.

      † Warnings on blank lines may actually be desirable. It's often a good think to throw a warning on data anomalies.

      -sauoq
      "My two cents aren't worth a dime.";

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://973742]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (12)
As of 2014-08-22 18:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (163 votes), past polls