PerlMonks  

Split tab-separated file into separate files, based on column name

by Anonymous Monk
on Aug 26, 2020 at 11:28 UTC ( [id://11121090]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I am turning to your expertise for the following task, which most likely already has a one-liner solution that I can't quite come up with myself.
I have a tab-separated file, like this:
id	name	position
1	Nick	boss
2	George	CEO
3	Christina	CTO
and I want to split it into as many files as there are columns, and put the respective data in each of them. So far I am going about it the stupid way, i.e. grepping the header, creating 3 individual files and then running `cut -f1` etc. for each column. But I now have files with something like 20 columns, so there must be a more clever way :)

Replies are listed 'Best First'.
Re: Split tab-separated file into separate files, based on column name
by Tux (Canon) on Aug 26, 2020 at 12:33 UTC

    OK, I'll bite. A one-liner it is:

    $ cat test.tsv
    id	name	position
    1	Nick	boss
    2	George	CEO
    3	Christina	CTO
    $ perl -MText::CSV_XS=csv -E'my$aoh=csv(in=>"test.tsv",bom=>1,sep=>"\t");' \
           -E'for$h(keys%{$aoh->[0]}){say$h;open$fh,">","$h.txt";say$fh $_ for$h,map{$_->{$h}}@$aoh}'
    id
    position
    name
    $ cat id.txt
    id
    1
    2
    3
    $ cat name.txt
    name
    Nick
    George
    Christina
    $ cat position.txt
    position
    boss
    CEO
    CTO

    update: added a -E to split the line for readability


    Enjoy, Have FUN! H.Merijn
Re: Split tab-separated file into separate files, based on column name
by Eily (Monsignor) on Aug 26, 2020 at 12:13 UTC

    If you want only the bare functionality, a one-liner might work, but you'll need to switch to a longer script for pretty much any kind of control you may want to have over the result:

    perl -lanE 'for (0..$#F) { `echo $F[$_] >> file$_` }'
    You can read perlrun to understand what the options do, and how to change the way each line is split (with -F), because -a splits on whitespace by default, not on tabs. The one-liner will fail if the input is not simple enough (if there are quotes, dashes, or semicolons in the data), because each field is passed through the shell. And since it appends with >>, you'll start to get extra output data if you call it several times in a row.
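    To illustrate the whitespace-versus-tab point, here is a minimal sketch. The sample line is made up for the illustration (it is not from the question's data), and the two split calls only mimic what -a does with and without -F'\t':

```perl
use strict;
use warnings;

# Hypothetical sample line: the middle field contains a space.
my $line = "1\tNick Jr\tboss";

# Roughly what -a does by default: split on runs of whitespace.
my @by_space = split ' ', $line;

# Roughly what -a does with -F'\t': split on tabs only.
my @by_tab = split /\t/, $line;

printf "whitespace: %d fields\n", scalar @by_space;
printf "tabs:       %d fields\n", scalar @by_tab;
```

    With whitespace splitting, "Nick Jr" ends up in two separate fields, so every column after it shifts by one.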

    All that being said, you asked for the clever way. The clever way is to keep the solution that you understand, if you ever have to fix it.

    Edit s/perlun/perlrun/. Thanks AnomalousMonk

      Kudos.

      I didn't think it was possible. The trick is to shell out for the opening and writing, which gives a shorter syntax.

      This might be considered dirty in a real Perl script but should be acceptable in a one-liner. And interestingly it should also work on Windows.

      Point is, Perl has no means to print_and_open_if_necessary()

      So the next step is to ask myself if the semantics could be cleanly replicated in Perl...

      IMHO a tied hash %FH would be most elegant

      print { $FH{">>$name"} } $value

      I haven't tried to search CPAN for similar solutions yet, because I'm not sure how.

      Comments welcome...
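      A minimal sketch of how such a tied hash might look. The package name LazyFH and its close_all method are made up for this illustration, and the convention that the key carries the open mode as a prefix (">file" or ">>file") is an assumption taken from the print line above. The demo writes into a temporary directory:

```perl
use strict;
use warnings;

# Hypothetical helper package: a tied hash whose values are filehandles,
# opened the first time a key such as ">>name" is looked up.
package LazyFH;

sub TIEHASH { my ($class) = @_; return bless { fh => {} }, $class }

sub FETCH {
    my ($self, $key) = @_;
    return $self->{fh}{$key} if $self->{fh}{$key};
    # The key carries the open mode as a prefix: ">name" or ">>name".
    my ($mode, $path) = $key =~ /^(>>?)(.+)$/
        or die "key must look like '>file' or '>>file': $key";
    open my $fh, $mode, $path or die "cannot open $path: $!";
    return $self->{fh}{$key} = $fh;
}

# Close every handle opened so far and forget them.
sub close_all {
    my ($self) = @_;
    close $_ or die "close failed: $!" for values %{ $self->{fh} };
    %{ $self->{fh} } = ();
}

package main;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie my %FH, 'LazyFH';

# print needs a block around a filehandle expression that is not a plain scalar.
print { $FH{">>$dir/name"} } "Nick\n";
print { $FH{">>$dir/name"} } "George\n";    # reuses the cached handle

tied(%FH)->close_all;
```

      Only FETCH is implemented, so this is enough for printing but not for iterating over or deleting keys.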

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        > Point is Perl has no mean to print_and_open_if_necessary()

        Sometimes Perl is not the best tool for the job. Awk does have that feature and here is an Awk program that does what our questioner asks:

        #!/usr/bin/awk -f
        BEGIN { FS = "\t" }
        FNR == 1 {
            split("", Fields)    # clear fields array
            for (i = 1; i <= NF; i++)
                Fields[i] = $i
            next
        }
        {
            for (i = 1; i <= NF; i++)
                print $i > Fields[i]
        }

        Save it in a file and mark it executable; tested with GNU Awk. Feed it input on stdin or list the files you want it to read on the command line.

        If you want to add prefixes or suffixes to the output file names, add them to the print statement, like so: print $i > ("out."Fields[i]".txt"); the parentheses ensure that the invisible concatenation operator will be parsed correctly.

        > This might be considered dirty in a real Perl script but should be acceptable in a one-liner.
        100% agree with that sentence (which says a lot, since the sentence is "this might be").

        You could use operator overloading to replicate that feature: "Value" > file("path"); or "Value" >> file("path"), where file returns an object that overloads > and >>.

        Or you could do something closer to C++:

        fstream("path") << 120 << " in hexadecimal is " << ctrl::hex << 120;
        fstream("logs", "a") << ctrl::autoline << "I'm adding this line to the logs" << "and also this line";

Re: Split tab-separated file into separate files, based on column name
by tybalt89 (Monsignor) on Aug 26, 2020 at 13:29 UTC
    #!/usr/bin/perl

    use strict; #https://perlmonks.org/?node_id=11121090
    use warnings;

    my @handles = map { open my $fh, '>', "tmp.$_" or die; $fh } split /\t|\n/, <DATA>;
    while( <DATA> )
      {
      my @data = split /\t|\n/;
      print { $handles[$_] } $data[$_], "\n" for 0 .. $#handles;
      }
    close $_ or die for @handles;

    __DATA__
    id	name	position
    1	Nick	boss
    2	George	CEO
    3	Christina	CTO
Re: Split tab-separated file into separate files, based on column name
by LanX (Saint) on Aug 26, 2020 at 11:36 UTC
    > so there must be a more clever way :)

    You want a one-liner, but I doubt it'll be very readable.

    The clever way is to split the head line and to open files for each entry and to hold the filehandles in an array.

    Now you can print each field by column position after splitting the remaining lines.

    That's a dozen lines of code at most...
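    The steps above can be sketched roughly as follows. The function name split_columns, the use of a temporary output directory, and the in-memory sample input are all assumptions made for the illustration, not part of the original question:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read tab-separated text from $in and write one file
# per column into $dir, each file named after its column head.
sub split_columns {
    my ($in, $dir) = @_;

    # Header line: open one output handle per column name.
    defined(my $header = <$in>) or die "no header line";
    chomp $header;
    my @names = split /\t/, $header, -1;
    my @fh = map {
        open my $out, '>', "$dir/$_" or die "cannot open $dir/$_: $!";
        $out;
    } @names;

    # Remaining lines: field N goes to handle N (missing fields become empty).
    while (my $line = <$in>) {
        chomp $line;
        my @field = split /\t/, $line, -1;
        print { $fh[$_] } $field[$_] // '', "\n" for 0 .. $#fh;
    }
    close $_ or die "close failed: $!" for @fh;
    return @names;
}

# Demo on the sample data from the question, read from an in-memory handle.
my $tsv = "id\tname\tposition\n1\tNick\tboss\n2\tGeorge\tCEO\n3\tChristina\tCTO\n";
open my $in, '<', \$tsv or die "cannot open in-memory input: $!";
my $dir = tempdir(CLEANUP => 1);
my @names = split_columns($in, $dir);
print "wrote: @names\n";
```

    Unlike a one-liner, this leaves room for error handling and for changing the output file names.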

    Cheers Rolf

Re: Split tab-separated file into separate files, based on column name
by Corion (Patriarch) on Aug 26, 2020 at 11:52 UTC
      Will "part" split the file vertically? Because, in my example, the desired output would be:
      * File "id" with values 1 2 3
      * File "name" with values Nick George Christina
      * File "position" with values boss CEO CTO

        Oh - sorry, no - this is for splitting a file horizontally according to a column value, not vertically.

Re: Split tab-separated file into separate files, based on column name
by LanX (Saint) on Aug 27, 2020 at 16:19 UTC
    Here is a pure Perl one-liner.

    Please note that

    • the files are named after the column heads
    • I use Windows quoting rules
    D:\tmp>del id,name,position

    D:\tmp>perl -lanE "if (@FH) {print $_ shift @F for @FH} else {open $FH[$x++], '>', $_ for @F}" data.txt

    D:\tmp>type data.txt, id,name,position

    data.txt
    id	name	position
    1	Nick	boss
    2	George	CEO
    3	Christina	CTO

    id
    1
    2
    3

    name
    Nick
    George
    Christina

    position
    boss
    CEO
    CTO

    UPDATE

    eliminated bug

    Cheers Rolf

      ACHTUNG

      The following code is buggy, sorry.

      ... it will create empty files for each field ...

      D:\tmp>type 1,George,CEO
      1
      George
      CEO
      D:\tmp>

      strange behavior... (Update: see solution here )


      I didn't expect this, but Perl seems to silently refuse to re-open an already open file handle

      so if you don't mind having the column head included you can go even shorter

      D:\tmp>del id,name,position

      D:\tmp>perl -lanE "open $FH[$x++], '>', $_ for @F;print $_ shift @F for @FH" data.txt

      D:\tmp>type id,name,position

      id
      id
      1
      2
      3

      name
      name
      Nick
      George
      Christina

      position
      position
      boss
      CEO
      CTO

      D:\tmp>

      Cheers Rolf

        Try changing '>' to '>>'. If I remember correctly, open will silently reopen an already open handle. Since you are using truncating write mode, each file gets truncated every time it is opened.
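        A small sketch of the difference between the two modes when the same handle is opened repeatedly. The helper name write_each and the file names are made up for the illustration, and the files live in a temporary directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: open $path once per value with the given mode,
# reusing the same handle variable, then return the file's final contents.
sub write_each {
    my ($mode, $path, @values) = @_;
    my $fh;
    for my $value (@values) {
        # Re-opening an already open handle closes it first.
        open $fh, $mode, $path or die "cannot open $path: $!";
        print {$fh} "$value\n";
    }
    close $fh or die "close failed: $!";

    open my $in, '<', $path or die "cannot read $path: $!";
    local $/;    # slurp mode
    return scalar <$in>;
}

my $dir = tempdir(CLEANUP => 1);
my $truncated = write_each('>',  "$dir/trunc",  qw(a b c));
my $appended  = write_each('>>', "$dir/append", qw(a b c));
print "with >:  $truncated";
print "with >>: $appended";
```

        With '>' only the last value survives, because every open truncates the file; with '>>' all three values are kept.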

Node Type: perlquestion [id://11121090]
Approved by Corion
Front-paged by haukex