Best way to read in an XbyX table into a Hash{Key}{Key2}[value] structure

ZWcarp has asked for the wisdom of the Perl Monks concerning the following question:

Hello brothers,

I'm trying to figure out the best way to load a probability table into Perl. Here's my code, which seems to work, but there has to be a better and easier way to do this!

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use Data::Dumper;
##########################

my %hash;

# First line: a label cell followed by the column ("to") codons.
my $header_line = <>;
chomp $header_line;    # otherwise the last header keeps its trailing newline
my @headers = split(/\t/, $header_line);
my $index = 0;
my %col_header = map { $_ => $index++ } @headers[1..$#headers];

#print Dumper \%col_header;
while (<>) {
    chomp;
    my @a1 = split(/\t/, $_);
    my $fromAA = shift(@a1);    # row label is the "from" codon
    foreach my $toAA (keys %col_header) {
        $hash{$fromAA}{$toAA} = $a1[$col_header{$toAA}];
    }
}
print Dumper \%hash;

And here is an example set of data in tab separated format (c/p from excel sheet)

Amino Acid Switch Probabiities    AAA    AAC    AAG    AAT    ACA    ACC    ACG    ACT    AGA    AGC    AGG    AGT    ATA    ATC    ATG    ATT
AAA    0.40849    0.01506    0.26198    0.01904    0.01527    0.0065    0.01149    0.00774    0.09164    0.00886    0.06529    0.01076    0.0066    0.00143    0.00546    0.00199
AAC    0.011    0.41485    0.00959    0.25289    0.00916    0.01865    0.0096    0.01375    0.00594    0.06004    0.00586    0.04124    0.00198    0.00335    0.00212    0.00227
AAG    0.29591    0.01484    0.46315    0.01686    0.01135    0.00675    0.01565    0.00657    0.06807    0.00889    0.09736    0.00994    0.00415    0.00151    0.00852    0.00177
AAT    0.0123    0.22372    0.00965    0.3545    0.00971    0.01103    0.00955    0.01893    0.00611    0.03574    0.00553    0.06048    0.00225    0.00204    0.00231    0.00365
ACA    0.00913    0.0075    0.006    0.00899    0.25029    0.13459    0.19326    0.15274    0.00817    0.0142    0.00526    0.01523    0.01986    0.00584    0.01536    0.00737
ACC    0.00368    0.01445    0.00338    0.00966    0.12735    0.27128    0.15326    0.14524    0.00311    0.02754    0.00306    0.01691    0.00817    0.01213    0.00809    0.00755
ACG    0.0024    0.00274    0.00289    0.00309    0.06746    0.05654    0.0985    0.05631    0.0019    0.00596    0.00256    0.00568    0.00401    0.00227    0.00719    0.00266
ACT    0.00395    0.0096    0.00296    0.01494    0.13029    0.13094    0.1376    0.2089    0.00312    0.01788    0.00272    0.02573    0.00854    0.00653    0.00841    0.01267
AGA    0.04383    0.00389    0.02882    0.00452    0.00654    0.00263    0.00435    0.00293    0.28665    0.00702    0.17898    0.00863    0.00367    0.00076    0.00259    0.00093
AGC    0.00598    0.05551    0.00531    0.03735    0.01604    0.03288    0.01929    0.02368    0.00991    0.3642    0.01051    0.23871    0.00275    0.00449    0.0028    0.00327
AGG    0.0274    0.00337    0.03617    0.00359    0.00369    0.00227    0.00515    0.00224    0.15703    0.00653    0.25775    0.00725    0.00203    0.00066    0.00346    0.00076
AGT    0.00519    0.02725    0.00425    0.04518    0.0123    0.01443    0.01315    0.02436    0.00871    0.17062    0.00833    0.28774    0.00236    0.0019    0.00217    0.00399
ATA    0.00219    0.0009    0.00122    0.00115    0.01102    0.00479    0.00637    0.00556    0.00254    0.00135    0.0016    0.00162    0.19608    0.0846    0.02113    0.08774
ATC    0.00101    0.00324    0.00095    0.00224    0.00691    0.01517    0.0077    0.00905    0.00112    0.0047    0.00111    0.00279    0.18032    0.36846    0.02111    0.22364
ATG    0.00423    0.00224    0.00584    0.00277    0.01992    0.01108    0.02671    0.01278    0.00419    0.00321    0.00639    0.00348    0.04935    0.02313    0.58867    0.0273
ATT    0.00123    0.00193    0.00097    0.0035    0.00763    0.00826    0.00788    0.01538    0.0012    0.003    0.00113    0.00512    0.16368    0.19576    0.02181    0.31668


Replies are listed 'Best First'.
Re: Best way to read in an XbyX table into a Hash{Key}{Key2}[value] structure by rjt (Curate) on Aug 07, 2013 at 19:01 UTC
"(c/p from excel sheet)"

Given that, you may want to consider Spreadsheet::ParseExcel to read the data in directly from your Excel spreadsheet.

Edit: Thanks to the OP for helping me understand the requirements. I now believe this will be much more in line with what you're after:

use feature 'say';

chomp(my (undef, @col) = split /\t/, <>); # Column names
my %prob_map;
while (my ($from, @values) = split /\t/, <>) {
    chomp @values;    # drop the newline left on the last value
    @{$prob_map{$from}}{@col} = @values;
}

say "AAC to AAA = " . $prob_map{AAC}{AAA};
say "AAA to AAC = " . $prob_map{AAA}{AAC};

__END__
AAC to AAA = 0.011
AAA to AAC = 0.01506

Previous line-based (i.e., row major) suggestion is below.

Otherwise, parsing the plain text you've provided is fairly straightforward as well:

chomp(my @col = split /\t/, <>); # Column headings
my @lines = map { chomp; my %l; @l{@col} = split /\t/; \%l } <>;

However, whether that's actually an improvement on your code or not is debatable. :-)
Re^2: Best way to read in an XbyX table into a Hash{Key}{Key2}[value] structure by ZWcarp (Beadle) on Aug 07, 2013 at 19:37 UTC
Thank you for your help! It's not usually from Excel sheets; I just mentioned that in case there was Excel markup left over, like \r returns or something. One question, though: I don't understand how I then access the information with your method. For example, if I wanted the probability of an AAC to AAA switch (0.011), how do I get to that value? You've created an array where each cell holds a ref to a hash, which in turn holds the key/value pairs of each row with respect to that column, correct? How do I access the info for individual cases?
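For readers following along, here is a minimal sketch of how the row-major structure (an array of hash refs) could be queried. It assumes a three-codon excerpt of the posted table held in a string; `$sample`, `@lines`, and `lookup_row_major` are illustrative names, not something from the posts above. Each array element is keyed by column heading, so finding a "from" row means scanning the array.

```perl
use strict;
use warnings;

# Small excerpt of the posted table (assumption: real input is tab-separated).
my $sample = join "\n",
    join("\t", 'Amino Acid Switch Probabiities', qw(AAA AAC AAG)),
    join("\t", qw(AAA 0.40849 0.01506 0.26198)),
    join("\t", qw(AAC 0.011 0.41485 0.00959)),
    join("\t", qw(AAG 0.29591 0.01484 0.46315));

my ($header, @rows) = split /\n/, $sample;
my @col = split /\t/, $header;

# Row-major: one hash ref per data line, keyed by column heading.
my @lines = map { my %l; @l{@col} = split /\t/; \%l } @rows;

# Hypothetical helper: linear scan for the row whose first column matches.
sub lookup_row_major {
    my ($from, $to) = @_;
    for my $row (@lines) {
        return $row->{$to} if $row->{ $col[0] } eq $from;
    }
    return;    # no such row
}

print "AAC to AAA = ", lookup_row_major('AAC', 'AAA'), "\n";
```

The scan is linear in the number of rows, which is one reason a `$table{from}{to}` hash of hashes tends to be the more convenient shape for from/to lookups.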
Re^3: Best way to read in an XbyX table into a Hash{Key}{Key2}[value] structure by rjt (Curate) on Aug 07, 2013 at 20:01 UTC
Thanks for the follow-up. I believe I now understand your requirements. Since my original solution was almost certainly not what you were after, I added a better one to my original node, which will make it possible to do the from/to lookups you need. Mea culpa; it's been a while since I looked at a probability table. :-)
Re^2: Best way to read in an XbyX table into a Hash{Key}{Key2}[value] structure by Cristoforo (Curate) on Aug 07, 2013 at 20:17 UTC
`my (undef, @col) = split /\s+/, <DATA>; # Column names` When splitting on `\s+` instead of `\t`, the first header value, 'Amino Acid Switch Probabiities', will be wrongly split into the `@col` array. You need to split on tabs to prevent this from happening.
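A small sketch of the difference, assuming the header is rebuilt here with explicit tabs and cut down to three codons (`$header`, `@by_tab`, and `@by_space` are illustrative names):

```perl
use strict;
use warnings;

# Header excerpt from the posted data, rebuilt with explicit tabs.
my $header = join "\t", 'Amino Acid Switch Probabiities', qw(AAA AAC AAG);

# In both cases the first field is discarded via undef.
my (undef, @by_tab)   = split /\t/,  $header;
my (undef, @by_space) = split /\s+/, $header;

print "tab split:        @by_tab\n";      # AAA AAC AAG
print "whitespace split: @by_space\n";    # Acid Switch Probabiities AAA AAC AAG
```

With `\s+`, only "Amino" is discarded, so the remaining three words of the label shift every codon three columns to the right.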
Re^3: Best way to read in an XbyX table into a Hash{Key}{Key2}[value] structure by rjt (Curate) on Aug 07, 2013 at 22:09 UTC
Quite right. My local test copy of the sample data was based on the original post (which had the tabs squashed to spaces), and I forgot to update my `split` pattern when posting. Corrected now, thanks.
Re: Best way to read in an XbyX table into a Hash{Key}{Key2}[value] structure by Cristoforo (Curate) on Aug 07, 2013 at 19:54 UTC
Maybe not better, but a little shorter.

#!/usr/bin/perl
use strict;
use warnings;

chomp(my (undef, @headers) = split /\t/, <>);

my %hash;
while (<>) {
    chomp;
    my ($fromAA, @a1) = split /\t/;
    @{ $hash{$fromAA} }{@headers} = @a1;
}
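The hash-slice approach can also be wrapped so that it tolerates the `\r` line endings mentioned earlier in the thread. The following is only a sketch under stated assumptions: the data is a three-codon excerpt given Windows-style line endings on purpose, it is read from an in-memory filehandle rather than `<>`, and `load_table` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Excerpt of the table with "\r\n" line endings (an assumption made here
# to exercise the carriage-return case raised earlier in the thread).
my $text = '';
$text .= join("\t", @$_) . "\r\n" for (
    [ 'Amino Acid Switch Probabiities', qw(AAA AAC AAG) ],
    [ qw(AAA 0.40849 0.01506 0.26198) ],
    [ qw(AAC 0.011   0.41485 0.00959) ],
    [ qw(AAG 0.29591 0.01484 0.46315) ],
);

# Hypothetical helper: read a tab-separated table from a filehandle into
# $table{from}{to}, accepting both "\n" and "\r\n" line endings.
sub load_table {
    my ($fh) = @_;
    my (%table, @headers);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                     # like chomp, but also drops "\r"
        my ($from, @values) = split /\t/, $line;
        if (!@headers) {                          # first line holds the column names
            @headers = @values;
            next;
        }
        @{ $table{$from} }{@headers} = @values;   # hash slice fills one row
    }
    return \%table;
}

open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $table = load_table($fh);
close $fh;

print "AAC to AAA = $table->{AAC}{AAA}\n";
print "AAA to AAG = $table->{AAA}{AAG}\n";
```

A plain `chomp` removes only the trailing `"\n"` under the default `$/`, so a stray `"\r"` would otherwise end up glued to the last column name and the last value of each row.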
