Split tab-separated file into separate files, based on column name

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Split tab-separated file into separate files, based on column name by Tux (Canon) on Aug 26, 2020 at 12:33 UTC
OK, I'll bite. A one-liner it is:

`$ cat test.tsv
id	name	position
1	Nick	boss
2	George	CEO
3	Christina	CTO
$ perl -MText::CSV_XS=csv -E'my$aoh=csv(in=>"test.tsv",bom=>1,sep=>"\t");' \
       -E'for$h(keys%{$aoh->[0]}){say$h;open$fh,">","$h.txt";say$fh $_ for$h,map{$_->{$h}}@$aoh}'
id
position
name
$ cat id.txt
id
1
2
3
$ cat name.txt
name
Nick
George
Christina
$ cat position.txt
position
boss
CEO
CTO`

update: added a -E to split the line for readability

Enjoy, Have FUN! H.Merijn
Re: Split tab-separated file into separate files, based on column name by Eily (Monsignor) on Aug 26, 2020 at 12:13 UTC
If you want only the very bare functionality, a one-liner might work, but you'll need to switch to a longer script for pretty much any kind of control you may want to have over the result:

perl -lanE 'for (0..$#F) { `echo $F[$_] >> file$_` }'

You can read perlrun to understand what the options do (and to change the way the line is split, because it's split on whitespace by default, not tabs). It will fail if the input is not simple enough (if there are quotes, dashes, or semicolons in the data). And you'll start to get extra output data if you call it several times in a row, because the files are appended to.

All that being said, you asked for the clever way. The clever way is to keep the solution that you understand, in case you ever have to fix it.

Edit: s/perlun/perlrun/. Thanks AnomalousMonk
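To illustrate the remark about the default split, here is a minimal sketch. It assumes a made-up row in which one field contains a space; `-a` autosplits like `split ' '`, whereas passing `-F'\t'` makes it split on tabs only.

```perl
use strict;
use warnings;

# Hypothetical row (not from the original data): the name contains a space.
my $line = "2\tGeorge Smith\tCEO";

# What -a does by default: split on runs of whitespace.
my @by_space = split ' ', $line;

# What -F'\t' would do instead: split on tabs only.
my @by_tab = split /\t/, $line;

printf "whitespace: %d fields, tab: %d fields\n",
    scalar @by_space, scalar @by_tab;
```

With whitespace splitting the name is cut in two, so the columns after it would end up in the wrong files.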
Re^2: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 26, 2020 at 14:50 UTC
Kudos. I didn't think it was possible, and the trick is to shell out the opening and writing to a shorter syntax. This might be considered dirty in a real Perl script but should be acceptable in a one-liner. And interestingly it should also work on Windows.

Point is, Perl has no means to `print_and_open_if_necessary()`.

So the next step is to ask myself if the semantics could be cleanly replicated in Perl... IMHO a tied hash `%FH` would be most elegant:

`print $FH{">>$name"} $value`

I didn't try to search CPAN for similar solutions yet, because I'm not sure how. Comments welcome.

Cheers Rolf
(addicted to the Perl Programming Language :)
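For what it's worth, the tied-hash idea could be sketched roughly like this. `OnDemandFH` and `close_all` are hypothetical names made up for the example, not an existing CPAN module. Note also that `print` only accepts a bareword, a simple scalar or a block as its filehandle, so the hash element has to be wrapped in braces.

```perl
package OnDemandFH;   # hypothetical name, not a CPAN module
use strict;
use warnings;
use Tie::Hash;        # provides Tie::StdHash
use parent -norequire, 'Tie::StdHash';

# FETCH runs on every read of $FH{$spec}. The key is a mode plus a file
# name, e.g. ">>name.txt"; the handle is opened on first use and cached.
sub FETCH {
    my ($self, $spec) = @_;
    return $self->{$spec} if exists $self->{$spec};
    my ($mode, $path) = $spec =~ /^(>>?)(.+)$/s
        or die "bad spec '$spec'";
    open my $fh, $mode, $path or die "open $path: $!";
    return $self->{$spec} = $fh;
}

# Close every cached handle and forget it.
sub close_all {
    my ($self) = @_;
    close $_ or die "close: $!" for values %$self;
    %$self = ();
}

package main;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie my %FH, 'OnDemandFH';

# The braces around the hash element are required by print's syntax.
print { $FH{">>$dir/name"} } "$_\n" for qw(Nick George Christina);
tied(%FH)->close_all;
```

Caching by the full key means `">name"` and `">>name"` would be two separate handles on the same file, which a real implementation would probably want to prevent.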
Re^3: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 27, 2020 at 03:57 UTC
> Point is Perl has no mean to `print_and_open_if_necessary()`

Sometimes Perl is not the best tool for the job. Awk does have that feature and here is an Awk program that does what our questioner asks:

`#!/usr/bin/awk -f
BEGIN { FS = "\t" }
FNR == 1 {
    split("", Fields)    # clear fields array
    for (i = 1; i <= NF; i++)
        Fields[i] = $i
    next
}
{
    for (i = 1; i <= NF; i++)
        print $i > Fields[i]
}`

Save it in a file and mark it executable; tested with GNU Awk. Feed it input on stdin or list the files you want it to read on the command line.

If you want to add prefixes or suffixes to the output file names, add them to the `print` statement, like so: `print $i > ("out."Fields[i]".txt")`; the parentheses ensure that the invisible concatenation operator will be parsed correctly.
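For comparison, Awk's open-on-first-use behaviour can be approximated in plain Perl with a cache of handles. This is only a sketch: `print_to` and `close_all` are hypothetical helper names, not Perl builtins, and the rows are the sample data from this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Open handles keyed by file name, loosely mimicking Awk's
# "print ... > file": opened on first use, kept open afterwards.
my %handle_for;

# Hypothetical helper: print @values plus a newline to $path.
sub print_to {
    my ($path, @values) = @_;
    my $fh = $handle_for{$path} //= do {
        open my $new, '>', $path or die "open $path: $!";
        $new;
    };
    print {$fh} @values, "\n";
}

# Close all cached handles so the data is flushed to disk.
sub close_all {
    close $_ or die "close: $!" for values %handle_for;
    %handle_for = ();
}

my $dir    = tempdir(CLEANUP => 1);
my @header = qw(id name position);
for my $row ([1, 'Nick', 'boss'], [2, 'George', 'CEO'], [3, 'Christina', 'CTO']) {
    print_to("$dir/$header[$_]", $row->[$_]) for 0 .. $#header;
}
close_all();
```

As in Awk, the file is truncated only once, when it is first opened; later calls keep writing to the same handle.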
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by haukex (Archbishop) on Aug 27, 2020 at 19:19 UTC
Re^5: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 28, 2020 at 01:30 UTC
Some notes below your chosen depth have not been shown here
Re^5: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 27, 2020 at 21:52 UTC
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by Anonymous Monk on Aug 27, 2020 at 10:01 UTC
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 27, 2020 at 10:16 UTC
Re^3: Split tab-separated file into separate files, based on column name (open on demand) by Eily (Monsignor) on Aug 26, 2020 at 15:26 UTC
> This might be considered dirty in a real Perl script but should be acceptable in a one-liner.

100% agree with that sentence (which says a lot, since the sentence is "this might be").

You could use operator overloading to replicate that feature: `"Value" > file("path");` or `"Value" >> file("path")`, where `file` returns an object that overloads `>` and `>>`. Or you could do something closer to C++:

`fstream("path") << 120 << " in hexadecimal is " << ctrl::hex << 120;
fstream("logs", "a") << ctrl::autoline << "I'm adding this line to the logs" << "and also this line";`
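A rough sketch of the stream-like variant, using the core `overload` pragma. `StreamFile` and `fstream` are hypothetical names for this example, and the `ctrl::` manipulators are left out; only plain `<<` chaining is shown.

```perl
package StreamFile;   # hypothetical class name, not an existing module
use strict;
use warnings;
use IO::Handle;       # for autoflush()

# "<<" prints its right operand and returns the object, so calls chain.
use overload '<<' => \&_write;

sub new {
    my ($class, $path, $mode) = @_;
    my $open_mode = ($mode // 'w') eq 'a' ? '>>' : '>';
    open my $fh, $open_mode, $path or die "open $path: $!";
    $fh->autoflush(1);   # write through immediately
    return bless { fh => $fh }, $class;
}

sub _write {
    my ($self, $value) = @_;
    print { $self->{fh} } $value;
    return $self;
}

package main;
use File::Temp qw(tempdir);

sub fstream { return StreamFile->new(@_) }

my $dir = tempdir(CLEANUP => 1);
{
    no warnings 'void';   # perl cannot know that this "<<" has side effects
    fstream("$dir/logs")      << "first line\n";
    fstream("$dir/logs", "a") << "second line\n" << "third line\n";
}
```

Without the `no warnings 'void'` line, perl warns at compile time about a useless left shift, which is one hint that this style fights the language a little.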
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by tobyink (Canon) on Aug 27, 2020 at 16:40 UTC
Re^5: Split tab-separated file into separate files, based on column name (open on demand) by jo37 (Deacon) on Aug 28, 2020 at 16:37 UTC
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 26, 2020 at 15:38 UTC
Re^4: Split tab-separated file into separate files, based on column name (tangent = open on demand => stream-like) by pryrt (Abbot) on Aug 26, 2020 at 17:32 UTC
Re^5: Split tab-separated file into separate files, based on column name (tangent = open on demand => stream-like) by LanX (Saint) on Aug 26, 2020 at 18:06 UTC
Re: Split tab-separated file into separate files, based on column name by tybalt89 (Monsignor) on Aug 26, 2020 at 13:29 UTC
`#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11121090
use warnings;

# Open one output file per header field; handles are kept in column order.
my @handles = map { open my $fh, '>', "tmp.$_" or die; $fh } split /\t|\n/, <DATA>;
while( <DATA> )
  {
  my @data = split /\t|\n/;
  print { $handles[$_] } $data[$_], "\n" for 0 .. $#handles;
  }
close $_ or die for @handles;

__DATA__
id	name	position
1	Nick	boss
2	George	CEO
3	Christina	CTO`
Re: Split tab-separated file into separate files, based on column name by LanX (Saint) on Aug 26, 2020 at 11:36 UTC
> so there must be a more clever way :)

You want a one-liner, and I doubt it'll be very readable.

The clever way is to `split` the header line, `open` a file for each entry, and hold the filehandles in an array. Now you can `print` each field by column position after splitting the remaining lines.

That's a dozen lines of code at most.

Cheers Rolf
(addicted to the Perl Programming Language :)
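The approach described above might look roughly like this. It is a sketch under a few assumptions: `split_columns` is a hypothetical helper name, the files are named exactly after the header fields, and the input is plain TSV without quoting.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read TSV from the handle $in and write one file per
# column into $dir, named after the header fields. Returns the column names.
sub split_columns {
    my ($in, $dir) = @_;

    chomp(my $header = <$in>);
    my @names = split /\t/, $header;

    # One output handle per column, in header order.
    my @out = map {
        open my $fh, '>', "$dir/$_" or die "open $dir/$_: $!";
        $fh;
    } @names;

    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        print { $out[$_] } $fields[$_] // '', "\n" for 0 .. $#out;
    }
    close $_ or die "close: $!" for @out;
    return @names;
}

# Demo with the sample data from the thread, read from an in-memory file.
my $tsv = "id\tname\tposition\n1\tNick\tboss\n2\tGeorge\tCEO\n3\tChristina\tCTO\n";
open my $in, '<', \$tsv or die "open: $!";
my $dir     = tempdir(CLEANUP => 1);
my @written = split_columns($in, $dir);
```

Here the header is used only for the file names and is not written into the files; printing it once per handle before the loop would include it.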
Re: Split tab-separated file into separate files, based on column name by Corion (Patriarch) on Aug 26, 2020 at 11:52 UTC
See part - split up files according to column value and/or App::part for a program doing that.
Re^2: Split tab-separated file into separate files, based on column name by Anonymous Monk on Aug 26, 2020 at 11:58 UTC
Will "part" split the file vertically? Because, in my example, the desired output would be:

`* FILE "id" with values
1
2
3
* File "name" with values
Nick
George
Christina
* File "position" with values
boss
CTO
CEO`
Re^3: Split tab-separated file into separate files, based on column name by Corion (Patriarch) on Aug 26, 2020 at 11:59 UTC
Oh - sorry, no - this is for splitting a file horizontally according to a column value, not vertically.
Re: Split tab-separated file into separate files, based on column name by LanX (Saint) on Aug 27, 2020 at 16:19 UTC
Here is a pure Perl one-liner. Please note that the files are named after the column heads and that I use Windows quoting rules.

`D:\tmp>del id,name,position

D:\tmp>perl -lanE "if (@FH) {print $_ shift @F for @FH} else {open $FH[$x++], '>', $_ for @F}" data.txt

D:\tmp>type data.txt, id,name,position

data.txt

id	name	position
1	Nick	boss
2	George	CEO
3	Christina	CTO

id

1
2
3

name

Nick
George
Christina

position

boss
CEO
CTO`

UPDATE: eliminated bug

Cheers Rolf
(addicted to the Perl Programming Language :)
Re^2: Split tab-separated file into separate files, based on column name (updated) by LanX (Saint) on Aug 27, 2020 at 16:36 UTC
ACHTUNG: The following code is buggy, sorry. ... it will create empty files for each field ...

`D:\tmp>type 1,George,CEO

1

George

CEO

D:\tmp>`

strange behavior... (Update: see solution here)

I didn't expect this, but Perl seems to silently refuse to re-open an already open file handle, so if you don't mind having the column head included you can go even shorter:

`D:\tmp>del id,name,position

D:\tmp>perl -lanE "open $FH[$x++], '>', $_ for @F;print $_ shift @F for @FH" data.txt

D:\tmp>type id,name,position

id

id
1
2
3

name

name
Nick
George
Christina

position

position
boss
CEO
CTO

D:\tmp>`

Cheers Rolf
(addicted to the Perl Programming Language :)
Re^3: Split tab-separated file into separate files, based on column name (updated) by jcb (Parson) on Aug 28, 2020 at 01:17 UTC
Try changing `'>'` to `'>>'`. If I remember correctly, open will silently reopen an already open handle. Since you are using truncating write mode, each file gets truncated every time it is opened.
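The difference between the two modes can be shown in isolation. This sketch only illustrates `'>'` versus `'>>'` on a scratch file; it does not reproduce the one-liner above or settle what went wrong in it. `write_line` and `line_count` are helper names invented for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $path = tempdir(CLEANUP => 1) . "/demo";

# Demo helper: open $path in the given mode, write one line, close again.
sub write_line {
    my ($mode, $text) = @_;
    open my $fh, $mode, $path or die "open $path: $!";
    print {$fh} "$text\n";
    close $fh or die "close: $!";
}

# Demo helper: number of lines currently in $path.
sub line_count {
    open my $fh, '<', $path or die "open $path: $!";
    my @lines = <$fh>;
    return scalar @lines;
}

write_line('>', $_) for qw(id 1 2);     # every open truncates the file
my $after_truncating = line_count();

write_line('>>', $_) for qw(3 4);       # every open appends to the file
my $after_appending = line_count();

print "after '>': $after_truncating line(s), after '>>': $after_appending line(s)\n";
```

After the three truncating writes only the last line survives; the two appending writes then add to it.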
Re^4: Split tab-separated file into separate files, based on column name (solved) by LanX (Saint) on Aug 28, 2020 at 07:25 UTC
Re^5: Split tab-separated file into separate files, based on column name (solved) by jcb (Parson) on Aug 28, 2020 at 23:26 UTC

