http://www.perlmonks.org?node_id=924257

garyboyd has asked for the wisdom of the Perl Monks concerning the following question:

Hi, could someone give me pointers/pseudocode for converting a tab-delimited file into a different format?

My data looks like this:

Name_1	TT	XL_927799.1
Name_1	PA	PA_392
Name_1	AT	ZX_003039195.1
Name_2	TT	XL_931313.1
Name_2	AT	ZX_003043016.1
Name_3	TT	XL_929616.1
Name_3	PA	PA_5040
Name_3	PA	PA_6336
Name_4	TT	XL_928294.1
Name_4	PA	PA_917

And I want to get it into this format:

PA              TT          AT
PA_392          XL_927799.1 ZX_003039195.1
                XL_931313.1 ZX_003043016.1
PA_5040,PA_6336 XL_929616.1
PA_917          XL_928294.1

The fields in each row are separated by tabs. When a name has more than one entry for a column (e.g. Name_3 has two PA entries), both entries should appear in that field, separated by a comma.

If there is no entry, the field should be left blank, or I suppose it could say "No corresponding entry".

Thanks

Re: reformatting tab delimited file
by davido (Cardinal) on Sep 05, 2011 at 15:00 UTC

    Create a hash such as %categories. Iterate over the lines of your list. For each line, split on whitespace, then push @{$categories{$second_column}}, $third_column;

    I'm assuming you know how to open a file and read from it. A while loop will be helpful in iterating over each line. Don't forget to chomp.

    Output should then just be a matter of taking the lists held under each hash key and printing them side by side. That needs another loop, with some logic to print a placeholder instead of an item when one column runs out of entries while others still have some.


    Dave
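
A minimal sketch of the approach davido outlines, assuming the rows are already in memory and using only core Perl. The sub names `group_by_category` and `side_by_side`, the `-` placeholder, and the fixed PA/TT/AT column order are illustrative choices, not part of the reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few sample rows from the question (name, category, id), tab-joined.
my @lines = map { join "\t", @$_ } (
    [qw/Name_1 TT XL_927799.1/],
    [qw/Name_1 PA PA_392/],
    [qw/Name_1 AT ZX_003039195.1/],
    [qw/Name_2 TT XL_931313.1/],
    [qw/Name_2 AT ZX_003043016.1/],
);

# Group third-column values under their second-column category.
sub group_by_category {
    my @rows = @_;
    my %categories;
    for my $line (@rows) {
        chomp $line;
        my (undef, $second_column, $third_column) = split ' ', $line;
        push @{ $categories{$second_column} }, $third_column;
    }
    return \%categories;
}

# Walk the per-category lists side by side, printing a placeholder
# (hypothetical choice) once a list has run out of entries.
sub side_by_side {
    my ($categories, $order, $placeholder) = @_;
    my $longest = 0;
    for my $key (@$order) {
        my $n = @{ $categories->{$key} || [] };
        $longest = $n if $n > $longest;
    }
    my @out = (join "\t", @$order);    # header row
    for my $i (0 .. $longest - 1) {
        push @out, join "\t",
            map { $categories->{$_}[$i] // $placeholder } @$order;
    }
    return @out;
}

my $categories = group_by_category(@lines);
print "$_\n" for side_by_side($categories, [qw/PA TT AT/], '-');
```

Note that this lines entries up by their position within each category list, not by name, so the placeholder always lands at the bottom of a shorter column. To keep every row tied to one name, the hash needs the name as an extra key, which is what the next reply does.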

Re: reformatting tab delimited file
by Cristoforo (Curate) on Sep 05, 2011 at 23:24 UTC
    Text::Table will align your output. Here is a sample program. I also used Sort::Naturally so that names with trailing digits sort correctly, i.e. when the numbers are more than one digit long.

    Update: in the while loop, changed from splitting on spaces to splitting on tabs, because that's how the fields are separated.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Table;
    use Sort::Naturally;

    my %data;
    my @col2 = qw/ PA TT AT /;    # output column order

    while (<DATA>) {
        chomp;
        my ($name, $col2, $col3) = split /\t/;
        push @{ $data{$name}{$col2} }, $col3;
    }

    my $tb = Text::Table->new( map {title => $_}, @col2);

    for my $name (nsort keys %data) {
        my @tmp;
        local $" = ',';    # join interpolated arrays with commas
        for my $col2 (@col2) {
            push @tmp, $data{$name}{$col2} ? "@{ $data{$name}{$col2} }" : "";
        }
        $tb->load(\@tmp);
    }

    print $tb;

    __DATA__
    Name_1	TT	XL_927799.1
    Name_1	PA	PA_392
    Name_1	AT	ZX_003039195.1
    Name_2	TT	XL_931313.1
    Name_2	AT	ZX_003043016.1
    Name_3	TT	XL_929616.1
    Name_3	PA	PA_5040
    Name_3	PA	PA_6336
    Name_4	TT	XL_928294.1
    Name_4	PA	PA_917

    This prints:

    PA              TT          AT
    PA_392          XL_927799.1 ZX_003039195.1
                    XL_931313.1 ZX_003043016.1
    PA_5040,PA_6336 XL_929616.1
    PA_917          XL_928294.1

      Thanks for everybody's suggestions. The solution provided by Cristoforo works brilliantly!
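
For cases where the CPAN modules are not available, the same per-name grouping can be sketched with core Perl only, writing tab-separated rows and a configurable placeholder for missing entries, as the question asks. The sub names `collect`, `sort_names` and `format_rows` are hypothetical, and `sort_names` is a simple stand-in that orders names by their trailing number; it is not a reimplementation of Sort::Naturally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build $data->{name}{category} = [ids] from tab-separated lines.
sub collect {
    my %data;
    for my $line (@_) {
        chomp(my $copy = $line);
        my ($name, $category, $id) = split /\t/, $copy;
        push @{ $data{$name}{$category} }, $id;
    }
    return \%data;
}

# Order names by trailing number so that Name_2 comes before Name_10.
sub sort_names {
    my %num;
    for my $name (@_) {
        $num{$name} = $name =~ /(\d+)$/ ? $1 : 0;
    }
    return sort { $num{$a} <=> $num{$b} or $a cmp $b } @_;
}

# One tab-separated row per name; multiple ids are comma-joined and
# missing entries are replaced by $placeholder.
sub format_rows {
    my ($data, $columns, $placeholder) = @_;
    my @out = (join "\t", @$columns);    # header row
    for my $name (sort_names(keys %$data)) {
        push @out, join "\t", map {
            my $ids = $data->{$name}{$_};
            $ids ? join(',', @$ids) : $placeholder;
        } @$columns;
    }
    return @out;
}

# Rows taken from the question's sample data.
my @lines = (
    "Name_2\tTT\tXL_931313.1",
    "Name_2\tAT\tZX_003043016.1",
    "Name_3\tTT\tXL_929616.1",
    "Name_3\tPA\tPA_5040",
    "Name_3\tPA\tPA_6336",
);

print "$_\n"
    for format_rows(collect(@lines), [qw/PA TT AT/], 'No corresponding entry');
```

Passing an empty string as the placeholder leaves the field blank instead.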

Re: reformatting tab delimited file
by Anonymous Monk on Sep 05, 2011 at 15:00 UTC