PerlMonks  

Re^3: Split tab-separated file into separate files, based on column name (open on demand)

by jcb (Parson)
on Aug 27, 2020 at 03:57 UTC


in reply to Re^2: Split tab-separated file into separate files, based on column name (open on demand)
in thread Split tab-separated file into separate files, based on column name

Point is Perl has no means to print_and_open_if_necessary()

Sometimes Perl is not the best tool for the job. Awk does have that feature and here is an Awk program that does what our questioner asks:

#!/usr/bin/awk -f
BEGIN { FS = "\t" }
FNR == 1 {
    split("", Fields)  # clear fields array
    for (i = 1; i <= NF; i++)
        Fields[i] = $i
    next
}
{
    for (i = 1; i <= NF; i++)
        print $i > Fields[i]
}

Save it in a file and mark it executable; tested with GNU Awk. Feed it input on stdin or list the files you want it to read on the command line.

If you want to add prefixes or suffixes to the output file names, add them to the print statement, like so: print $i > ("out."Fields[i]".txt"); the parentheses ensure that the invisible concatenation operator will be parsed correctly.
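
For comparison, the open-on-demand behaviour of Awk's `print > file` can be approximated in Perl by caching filehandles in a hash. The following is only a minimal sketch, not code from this thread: `handle_for` and `split_columns` are hypothetical helper names, and the demo writes into a temporary directory rather than the current one.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return a cached write handle for $name under $dir,
# opening the file only the first time it is requested.
my %handle_for;
sub handle_for {
    my ($dir, $name) = @_;
    my $path = File::Spec->catfile($dir, $name);
    return $handle_for{$path} //= do {
        open my $fh, '>', $path or die "Cannot open $path: $!";
        $fh;
    };
}

# Hypothetical driver: read tab-separated text from $in and write each
# column to a file named after its header field.
sub split_columns {
    my ($in, $dir) = @_;
    my $header = <$in>;
    return unless defined $header;
    chomp $header;
    my @names = split /\t/, $header;
    while (my $line = <$in>) {
        chomp $line;
        my @row = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        for my $i (0 .. $#row) {
            next unless defined $names[$i];
            print { handle_for($dir, $names[$i]) } $row[$i], "\n";
        }
    }
    close $_ for values %handle_for;
    %handle_for = ();
    return @names;
}

# Demo on in-memory sample data; output lands in a temporary directory.
my $sample = "id\tname\tposition\n1\tNick\tboss\n2\tGeorge\tCEO\n";
open my $in, '<', \$sample or die $!;
my $dir = tempdir(CLEANUP => 1);
split_columns($in, $dir);
```

As in the Awk version, a file is created only when the first value for its column is printed.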

Replies are listed 'Best First'.
Re^4: Split tab-separated file into separate files, based on column name (open on demand)
by haukex (Archbishop) on Aug 27, 2020 at 19:19 UTC

    Since this is currently the top node of the past 24 hours, I'll comment.

    Sometimes Perl is not the best tool for the job. Awk ...

    I strongly disagree. Perl is a replacement for awk and sed and can do everything they can, and much, much more. tobyink pointed out IO::All - and while this module may not be in the core, note that CPAN is one of Perl's greatest strengths.

    If you're familiar enough with awk to whip up this script that's fine, and it's certainly interesting to see how it's done in other languages (though this isn't AwkMonks), but consider that the OP may already not be very familiar with Perl, and throwing yet another new language into the mix is unlikely to be the most efficient approach in the long run.

      Perl is a replacement for awk and sed and can do everything they can, and much, much more.

      Yes, but sometimes the older tools are better fits for the problem at hand. Some time ago I suggested to another questioner to either use sed in his shell script or rewrite the entire script in Perl because sed could do the work in less time than Perl needs for startup/shutdown overhead. Perl is more flexible and powerful, but that power does come at a cost and this question happens to fit Awk's domain almost exactly.

      Awk's greatest strength and greatest limitation is the implicit outer loop. On one hand, that feature allows Awk programs to be very efficient, but on the other hand, it limits Awk to processing input text streams.

      (though this isn't AwkMonks)

      I firmly believe that every Perl programmer should learn Awk because learning Awk will make you a better Perl programmer.

        ... do the work in less time than Perl needs for startup/shutdown overhead. Perl is more flexible and powerful, but that power does come at a cost ...

        If that really was the point you were trying to make here, then it probably would have been better if you'd benchmarked and shown a solution that's actually faster than Perl. On a longer input file (OP never specified file length, but the fact that the number of columns grew from 3 to 20 is a hint), this pure Perl solution I whipped up is twice as fast as the awk code you showed:

        use warnings;
        use strict;

        my @cols = split /\t/, <>;
        chomp($cols[-1]);
        my @fh = map { open my $fh, '>', $_ or die $!; $fh } @cols;
        while ( my $line = <> ) {
            chomp($line);
            my @row = split /\t/, $line;
            print {$fh[$_]} $row[$_], "\n" for 0..$#row;
        }

        I firmly believe that every Perl programmer should learn Awk because learning Awk will make you a better Perl programmer.

        Sure, in general, the more programming languages a programmer is exposed to, the better they (usually) become. And yet, there are other situations:

        Some time ago I suggested to another questioner to either use sed in his shell script ...

        And I once showed someone who was writing an installer shell script how to use a one-liner to do a search and replace to change a configuration variable. And what happened? As the installation script grew, the one-liner just got called over and over again for different variables. While you, I, and the OP may know there are better solutions (as you said yourself, "rewrite the entire script in Perl"), these posts are public and may be read by people who may not know better, and in particular in comparison to awk, I disagree with an unqualified "Sometimes Perl is not the best tool for the job."

        Update - I also wanted to mention: In environments where there are several programmers on a team, most of whom are only focused on one language, having a product consist of code written in several different languages is more likely to cause maintenance problems. These are the reasons I said "throwing yet another new language into the mix" isn't necessarily a good thing. (Also, just in case there's any confusion with non-native speakers, the definition of "unqualified" I was using is "not modified or restricted by reservations", as in an "unqualified statement", and not "not having requisite qualifications", as in an "unqualified person".)

        > Awk's greatest strength and greatest limitation is the implicit outer loop

        Are you aware of Perl's command switches?

        If not, just have a look at perlrun and search for "awk".
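
The switches in question are `-n`, `-a`, `-F` and `-l`, all described in perlrun. As a rough sketch of what they do, the implicit loop is written out by hand below; `last_fields` is a hypothetical stand-in for the body of a one-liner, and the sample data is made up for illustration.

```perl
use strict;
use warnings;

# Roughly what  perl -F'\t' -lane 'BODY'  sets up, written out by hand:
#   -n  wraps BODY in  while (<>) { ... }
#   -l  chomps each input line (and sets $\ so print adds a newline)
#   -a  splits each line into @F; -F sets the split pattern
# last_fields() is a hypothetical BODY that collects the last field of each line.
sub last_fields {
    my ($in) = @_;
    my @last;
    while (my $line = <$in>) {       # -n
        chomp $line;                 # -l
        my @F = split /\t/, $line;   # -a with -F'\t'
        push @last, $F[-1];
    }
    return @last;
}

my $sample = "id\tname\tposition\n1\tNick\tboss\n";
open my $in, '<', \$sample or die $!;
print join(",", last_fields($in)), "\n";   # prints "position,boss"
```

With the switches, the loop, the chomp and the split all disappear from the program text, which is the same convenience Awk's implicit outer loop provides.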

        > Perl needs for startup/shutdown overhead.

        Probably, but do I want to install awk and sed on Windows?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

      I think his point was that Awk has an open_on_demand.

      And if you know Perl, well it's not very difficult to decipher this Awk script ...

      ( ... oh that's where Larry got these "ideas" from ;-)

      My concern is that it's neither easier nor shorter than Perl.

      For comparison, here is a script version of my one-liner, without even taking advantage of command-line switches.

      $\="\n";
      while (<DATA>) {
          @F = split;
          unless (@FH) {
              open $FH[@FH], ">", "$_.txt" for @F;
          } else {
              print $_ shift @F for @FH;
          }
      }
      __DATA__
      id name position
      1 Nick boss
      2 George CEO
      3 Christina CTO

      Cheers Rolf

        I think his point was that Awk has an open_on_demand.

        That's not what I quoted and was replying to, though. It's certainly an interesting feature of awk's to note, but going from there to saying that awk is better than Perl for the job is too much of a stretch, IMHO. And, of course, I was disappointed to see the "Best Nodes of The Day" list that node first.

Re^4: Split tab-separated file into separate files, based on column name (open on demand)
by Anonymous Monk on Aug 27, 2020 at 10:01 UTC
    Hi jcb,

    excellent post, thank you! I did write a Perl script after all, but I suspect that your way is much faster!
    Thanks to all that offered their advice, much appreciated :)
Re^4: Split tab-separated file into separate files, based on column name (open on demand)
by LanX (Saint) on Aug 27, 2020 at 10:16 UTC
    > Sometimes Perl is not the best tool for the job

    Well, the OP asked for a one-liner, but you have now provided a script.

    I have trouble seeing why a Perl script would be worse than an Awk script. (?)

    Cheers Rolf
