http://www.perlmonks.org?node_id=1027292

david_lyon has asked for the wisdom of the Perl Monks concerning the following question:

I want to convert the following matrix below Format 1), into the format in 2) with 4 column format going downwards, I am using the following perl code to do it and it works but is ugly. Can someone show me how is the proper way of doing it in perl. Thank You. Dave

Format 1)
               A     C     F
SampleID        Time    ObsConc ObsConc ObsConc
5        24      2.27E+06        687.02  32521.94
5        168     1.92E+06        525.02  22198.44
5        12      2.94E+06        896.39  41331.61
5        -0.5    2.23E+06        942.34  40616.49
5        8       4.03E+06        1371.32 45863.69
6        24      1.02E+06        1057.89 46341.04
6        168     3.14E+06        4987.32 42166.08
Format 2)
A     5        24     2.27E+06
A     5        168    1.92E+06
A     5        12     2.94E+06
A     5        -0.5   2.23E+06
A     5        8      4.03E+06
A     6        24     1.02E+06
A     6        168   ...
C     5        24    ...
C     5        168    ...
C     5        12     ...
C     5        -0.5   ...
C     5        8      ...
C     6        24    ...
C     6        168    ...
F     5        24     ...
F     5        168    ...
F     5        12     ...
F     5        -0.5   ...
F     5        8      ...
F     6        24     ...
F     6        168    ...
etc...
foreach (3..$count_columns){ $header=`cut -f1,2,$_ $file | head -n1`; chomp $header; $cmd="cut -f1,2,$_ $file | grep -P \'^\\w+\' | grep -P -v \'^Sample +\' > $file\.2"; system($cmd); $cmd="perl -pi -e s\'/^/$header\\t/g\' $file\.2"; system($cmd); }

Replies are listed 'Best First'.
Re: format conversion, please help
by hdb (Monsignor) on Apr 06, 2013 at 19:09 UTC

    How about this:

    use strict; use warnings; my $letters = <DATA>; my @letters = $letters =~ /\w+/g; my %out; while(<DATA>) { next if /SampleID/; chomp; my @recs = split /\s+/; my $ctr = 2; for my $letter (@letters) { $out{$letter} .= "$letter\t".(join "\t", @recs[0,1,$ctr++])."\n"; } } for my $letter (@letters) { print $out{$letter}; } __DATA__ A C F SampleID Time ObsConc ObsConc ObsConc 5 24 2.27E+06 687.02 32521.94 5 168 1.92E+06 525.02 22198.44 5 12 2.94E+06 896.39 41331.61 5 -0.5 2.23E+06 942.34 40616.49 5 8 4.03E+06 1371.32 45863.69 6 24 1.02E+06 1057.89 46341.04 6 168 3.14E+06 4987.32 42166.08

    This assumes that the line with A C F is potentially variable and due to change. Otherwise it could be further simplified by hardcoding it. It also assumes that you always repeat the first two columns.

    It builds the output fully in memory, which might be a problem if you have HUGE input. In order to minimize memory consumption, one could use temporary files for each of the letters and merge them later.

    UPDATE: No statement about the "ugliness" of this or other code is implied...

      This works very nicely and is good looking perl code... Thanks!

        Another thought: if the order of lines in the output does not matter it would be simpler.

        use strict; use warnings; my $letters = <DATA>; my @letters = $letters =~ /\w+/g; while(<DATA>) { next if /SampleID/; chomp; my @recs = split /\s+/; my $ctr = 2; for my $letter (@letters) { print "$letter\t".(join "\t", @recs[0,1,$ctr++])."\n"; } } __DATA__ A C F SampleID Time ObsConc ObsConc ObsConc 5 24 2.27E+06 687.02 32521.94 5 168 1.92E+06 525.02 22198.44 5 12 2.94E+06 896.39 41331.61 5 -0.5 2.23E+06 942.34 40616.49 5 8 4.03E+06 1371.32 45863.69 6 24 1.02E+06 1057.89 46341.04 6 168 3.14E+06 4987.32 42166.08
Re: format conversion, please help
by ww (Archbishop) on Apr 06, 2013 at 19:04 UTC

    If "ugly is as ugly does," your code ain't ugly.

    That's not to say you're wrong is assuming there are more perl-ish ways, especially including those that don't require recourse to a system utility, line 5, "$header=`cut -f1,2,$_ ...." But whether those are more effective, more elegant or less "ugly" is pretty much in the eye of the beholder... and what amounts to self-deprecation is far from obligatory. Remember,

    TIMTOWTDI

    Maybe you'll consider labeling your issues with code that works, as a search for other ways to do it.


    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: format conversion, please help
by Kenosis (Priest) on Apr 06, 2013 at 20:39 UTC

    And here's one more (perhaps "ugly" :) option:

    use strict; use warnings; my @col1 = split ' ', <DATA>; <DATA>; my @rows = map { my @row = split; \@row } <DATA>; my $i = 2; for my $col (@col1) { print "$col\t$_->[0]\t$_->[1]\t$_->[$i]\n" for @rows; $i++; } __DATA__ A C F SampleID Time ObsConc ObsConc ObsConc 5 24 2.27E+06 687.02 32521.94 5 168 1.92E+06 525.02 22198.44 5 12 2.94E+06 896.39 41331.61 5 -0.5 2.23E+06 942.34 40616.49 5 8 4.03E+06 1371.32 45863.69 6 24 1.02E+06 1057.89 46341.04 6 168 3.14E+06 4987.32 42166.08

    Output:

    A 5 24 2.27E+06 A 5 168 1.92E+06 A 5 12 2.94E+06 A 5 -0.5 2.23E+06 A 5 8 4.03E+06 A 6 24 1.02E+06 A 6 168 3.14E+06 C 5 24 687.02 C 5 168 525.02 C 5 12 896.39 C 5 -0.5 942.34 C 5 8 1371.32 C 6 24 1057.89 C 6 168 4987.32 F 5 24 32521.94 F 5 168 22198.44 F 5 12 41331.61 F 5 -0.5 40616.49 F 5 8 45863.69 F 6 24 46341.04 F 6 168 42166.08

    Hope this helps!

    Edit: My thanks to poj for pointing out that my script didn't properly generate the last column's values. This is fixed.

Re: format conversion, please help
by farang (Chaplain) on Apr 07, 2013 at 00:00 UTC

    I took a different approach, probably less elegant and certainly less flexible than others. But it does seems to work. As the choice of some variable names might indicate ( %HoA $aref ), I am relatively new to perlish ways and am just getting familiar with usage of the various data structures.

    use strict; use warnings; my %HoA; while ( <DATA> ) { next if m/^\D/; # ignore lines not beginning with a digit my ( $SampleID, $Time, $A_ObsConc, $C_ObsConc, $F_ObsConc ) = unpack("A9 A8 A15 A9 A*", $_); push @{ $HoA{A} } , [ $SampleID, $Time, $A_ObsConc ]; push @{ $HoA{C} } , [ $SampleID, $Time, $C_ObsConc ]; push @{ $HoA{F} } , [ $SampleID, $Time, $F_ObsConc ]; } for my $key ( sort keys %HoA ) { for my $aref ( @{ $HoA{$key} } ) { printf("%s %6s %9s %9s\n", $key, @$aref ); } } __DATA__ A C F SampleID Time ObsConc ObsConc ObsConc 5 24 2.27E+06 687.02 32521.94 5 168 1.92E+06 525.02 22198.44 5 12 2.94E+06 896.39 41331.61 5 -0.5 2.23E+06 942.34 40616.49 5 8 4.03E+06 1371.32 45863.69 6 24 1.02E+06 1057.89 46341.04 6 168 3.14E+06 4987.32 42166.08