PerlMonks  

reformatting tab delimited file

by garyboyd (Acolyte)
on Sep 05, 2011 at 14:54 UTC ( #924257=perlquestion )
garyboyd has asked for the wisdom of the Perl Monks concerning the following question:

Hi, could someone give me pointers/pseudocode for converting a tab-delimited file into a different format?

My data looks like this:

Name_1	TT	XL_927799.1
Name_1	PA	PA_392
Name_1	AT	ZX_003039195.1
Name_2	TT	XL_931313.1
Name_2	AT	ZX_003043016.1
Name_3	TT	XL_929616.1
Name_3	PA	PA_5040
Name_3	PA	PA_6336
Name_4	TT	XL_928294.1
Name_4	PA	PA_917

And I want to get it into this format:

PA              TT          AT
PA_392          XL_927799.1 ZX_003039195.1
                XL_931313.1 ZX_003043016.1
PA_5040,PA_6336 XL_929616.1
PA_917          XL_928294.1

The fields in each row are separated by tabs. Where a name has more than one entry of the same type (e.g. Name_3 has two PA entries), both entries should go in the same field, separated by a comma.

If there is no entry, the field should be left blank or, I suppose, contain "No corresponding entry".

Thanks

Re: reformatting tab delimited file
by davido (Archbishop) on Sep 05, 2011 at 15:00 UTC

    Create a hash such as %categories. Iterate over the lines of your list. For each line, split on whitespace, then push @{$categories{$second_column}}, $third_column;

    I'm assuming you know how to open a file and read from it. A while loop will be helpful in iterating over each line. Don't forget to chomp.

    Output should just be a matter of obtaining the lists held under each hash key and printing them side by side. You'll need another loop, with some logic to print a placeholder instead of an item for a given column when that column runs out of entries while others still have some.
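A minimal sketch of the approach described above, offered as an illustration rather than as davido's own code. The sub names `group_by_category` and `side_by_side`, the column order `PA TT AT`, the `-` placeholder, and the in-memory sample are all assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group third-column values by second-column category (a hash of arrays).
# Hypothetical helper name; takes an open filehandle.
sub group_by_category {
    my ($fh) = @_;
    my %categories;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my (undef, $second_column, $third_column) = split ' ', $line;
        push @{ $categories{$second_column} }, $third_column;
    }
    return \%categories;
}

# Walk the lists side by side; a placeholder fills any column whose
# list has run out while others still have entries.
sub side_by_side {
    my ($categories, $order, $placeholder) = @_;
    my $rows = 0;
    for my $cat (@$order) {
        my $n = @{ $categories->{$cat} || [] };
        $rows = $n if $n > $rows;
    }
    my @lines = (join "\t", @$order);
    for my $i (0 .. $rows - 1) {
        my @cells;
        for my $cat (@$order) {
            my $list = $categories->{$cat} || [];
            push @cells, defined $list->[$i] ? $list->[$i] : $placeholder;
        }
        push @lines, join "\t", @cells;
    }
    return @lines;
}

# Assumed sample: the first five rows of the question's data.
my $sample = join "\n",
    "Name_1\tTT\tXL_927799.1",
    "Name_1\tPA\tPA_392",
    "Name_1\tAT\tZX_003039195.1",
    "Name_2\tTT\tXL_931313.1",
    "Name_2\tAT\tZX_003043016.1";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $categories = group_by_category($fh);
close $fh;

print "$_\n" for side_by_side($categories, [qw(PA TT AT)], '-');
```

Note that this pairs entries purely by their position in each list, not by the name in the first column. To keep each name on its own row, the hash would need to be keyed by name as well, as the later reply in this thread does.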


    Dave

Re: reformatting tab delimited file
by Anonymous Monk on Sep 05, 2011 at 15:00 UTC
Re: reformatting tab delimited file
by Cristoforo (Deacon) on Sep 05, 2011 at 23:24 UTC
    Text::Table will align your output. Here is a sample program. I also used Sort::Naturally so that names with trailing digits sort correctly, i.e. when the number is more than one digit long.

    Update: in the while loop, changed from splitting on spaces to splitting on tabs, because that's how the fields are separated.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Table;
    use Sort::Naturally;

    my %data;
    my @col2 = qw/ PA TT AT /;    # output columns, in display order

    # Build a hash of hashes of arrays: name => type => [ids]
    while (<DATA>) {
        chomp;
        my ($name, $col2, $col3) = split /\t/;
        push @{ $data{$name}{$col2} }, $col3;
    }

    my $tb = Text::Table->new( map {title => $_}, @col2);

    # One table row per name, in natural sort order
    for my $name (nsort keys %data) {
        my @tmp;
        local $" = ',';    # join multiple ids with a comma when interpolated
        for my $col2 (@col2) {
            push @tmp, $data{$name}{$col2} ? "@{ $data{$name}{$col2} }" : "";
        }
        $tb->load(\@tmp);
    }

    print $tb;

    __DATA__
    Name_1	TT	XL_927799.1
    Name_1	PA	PA_392
    Name_1	AT	ZX_003039195.1
    Name_2	TT	XL_931313.1
    Name_2	AT	ZX_003043016.1
    Name_3	TT	XL_929616.1
    Name_3	PA	PA_5040
    Name_3	PA	PA_6336
    Name_4	TT	XL_928294.1
    Name_4	PA	PA_917

    This prints:

    PA              TT          AT
    PA_392          XL_927799.1 ZX_003039195.1
                    XL_931313.1 ZX_003043016.1
    PA_5040,PA_6336 XL_929616.1
    PA_917          XL_928294.1
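If the two CPAN modules are not available, a core-only variant of the same idea might look like the sketch below. It is not part of the original reply: the sub name `reformat_rows` is hypothetical, rows are emitted in first-seen order rather than natural sort order, and the output is tab-delimited with the "No corresponding entry" placeholder mentioned in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes an array ref of tab-delimited input lines,
# the output column order, and a placeholder for empty cells.
# Returns tab-delimited output lines, header first, one line per name.
sub reformat_rows {
    my ($lines, $columns, $placeholder) = @_;
    my (%data, @names);
    for my $line (@$lines) {
        chomp(my $copy = $line);
        next unless length $copy;
        my ($name, $type, $id) = split /\t/, $copy;
        push @names, $name unless exists $data{$name};    # keep first-seen order
        push @{ $data{$name}{$type} }, $id;
    }
    my @out = (join "\t", @$columns);
    for my $name (@names) {
        my @cells;
        for my $col (@$columns) {
            my $ids = $data{$name}{$col};
            # Several ids of one type share a cell, joined with commas.
            push @cells, $ids ? join(',', @$ids) : $placeholder;
        }
        push @out, join "\t", @cells;
    }
    return @out;
}

# Assumed sample: the Name_3 and Name_4 rows from the question.
my @input = (
    "Name_3\tTT\tXL_929616.1",
    "Name_3\tPA\tPA_5040",
    "Name_3\tPA\tPA_6336",
    "Name_4\tTT\tXL_928294.1",
    "Name_4\tPA\tPA_917",
);

print "$_\n" for reformat_rows(\@input, [qw(PA TT AT)], 'No corresponding entry');
```

Passing an empty string as the placeholder leaves missing cells blank instead.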

      Thanks for everybody's suggestions, the solution provided by Cristoforo works brilliantly!

Node Type: perlquestion [id://924257]
Approved by Corion