Re: Table manipulation, array or hash?

I took an approach somewhat similar to the other responders'. I used a hash of hashes (HoH) and combined the two data values for each entry into a single string, which can later be split(). This avoids adding a third level to the data structure.

My approach (just the principal code snippet) is shown below:


my %myHash = ();
foreach (@lines){
   my %tempHash = ();   # fresh for each line, so a new $key2 starts empty
   my($key1,$key2,$val1,$val2,$rest) = split(/\s+/,$_,5);
   my $combinedValue = sprintf("%2s,%2s",$val1,$val2);
   next unless $key1 =~ /SNP(\d+)/;   # skip lines without an SNP number
   my $indx = $1;
   if(exists $myHash{$key2}){
      # copy the existing inner hash, add to it, store a new reference
      %tempHash = %{$myHash{$key2}};
      $tempHash{$indx} = $combinedValue;
      $myHash{$key2} = {%tempHash};
   } else {
      $tempHash{$indx} = $combinedValue;
      $myHash{$key2} = {%tempHash};
   }
}

foreach my $key (sort keys %myHash){
   my %tempHash2 = %{$myHash{$key}};
   my $line2output = "$key ";
   # numeric sort so that index 10 comes after index 9
   foreach my $sortedKey (sort { $a <=> $b } keys %tempHash2){
      $line2output .=
         sprintf(" %2s %2s",split(',',$tempHash2{$sortedKey}));
   }
   print "$line2output\n";
}

I have also put the OP's example input data into an array, @lines, to simplify testing. If the lines were being read from a file, one would do a foreach (<INPUT>){} rather than my foreach (@lines){} structure.
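For comparison, the third level that the combined string avoids is not much extra work: each inner value can be an array reference holding the two values, so nothing needs to be split() on output. A minimal sketch follows; the sample lines and the field layout (SNP name, second key, two values, then the rest) are assumptions for illustration, not the OP's data.

```perl
use strict;
use warnings;

# Hypothetical sample input; the field layout is assumed, not the OP's data.
my @lines = (
    "SNP1 ind1 A G x",
    "SNP2 ind1 C C x",
    "SNP1 ind2 A A x",
    "SNP2 ind2 C T x",
);

my %data;    # $data{$key2}{$index} = [ $val1, $val2 ]
foreach my $line (@lines) {
    my ($key1, $key2, $val1, $val2) = split /\s+/, $line, 5;
    next unless $key1 =~ /SNP(\d+)/;    # skip lines without an SNP number
    $data{$key2}{$1} = [ $val1, $val2 ];    # third level: an array ref
}

my @output;
foreach my $key (sort keys %data) {
    my $row = $key;
    # numeric sort of the SNP indices
    foreach my $index (sort { $a <=> $b } keys %{ $data{$key} }) {
        # dereference the array ref to get both values back
        $row .= sprintf(" %2s %2s", @{ $data{$key}{$index} });
    }
    push @output, $row;
    print "$row\n";
}
```

The trade-off is one more level of dereferencing in exchange for never having to pick a separator character that cannot appear in the data.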

I hope this helps and shows yet another approach that works. I didn't spend a lot of time optimizing or simplifying. I figure that is a worthwhile exercise for the reader and the OP.

ack Albuquerque, NM



Re^2: Table manipulation, array or hash?
by GrandFather (Saint) on Mar 23, 2010 at 20:14 UTC

one would do a foreach (<INPUT>){}

No, one wouldn't. One might write while (<$inFile>) {...}, however. A Perl foreach loop works over a list, so (except in a few special cases) it builds the whole list before the first iteration. With the code you suggested, that means reading the entire file into memory, which should generally be avoided. A while loop calls the readline operator in scalar context and so reads one line per iteration.
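To make the difference concrete, here is a sketch that reads a file line by line with while. An in-memory filehandle stands in for a real input file so the example runs on its own, and the sample text is hypothetical:

```perl
use strict;
use warnings;

# In-memory filehandle standing in for a real data file (hypothetical text).
my $text = "SNP1 ind1 A G\nSNP2 ind1 C C\nSNP1 ind2 A A\n";
open my $inFile, '<', \$text or die "Cannot open in-memory file: $!";

my ($count, $last) = (0, '');
# while calls <$inFile> in scalar context: each iteration reads one line
# into $_ (with an implicit defined() test). foreach (<$inFile>) would call
# it in list context and read every line before the loop body first runs.
while (<$inFile>) {
    chomp;
    $count++;
    $last = $_;    # a real program would process the line here
}
close $inFile;

print "$count lines read, last: $last\n";
```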

That aside, I find your sample code very 'busy' with repeated code and needless (and poorly named) variables. Contrast it with the following:

my %dataHash;

foreach (@lines) {
    my ($key1, $key2, $val1, $val2, $rest) = split(/\s+/, $_, 5);
    my $combinedValue = "$val1,$val2";
    $key1 =~ /SNP(\d+)/;

    $dataHash{$key2}{$1} = $combinedValue;
}

foreach my $key (sort keys %dataHash) {
    my %tempHash2 = %{$dataHash{$key}};

    print $key;
    printf(" %2s %2s", split(',', $tempHash2{$_})) for sort keys %tempHash2;
    print "\n";
}
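The if/exists/else branching disappears because of autovivification: assigning through $dataHash{$key2}{$1} creates the inner hash reference the first time a given $key2 is seen. A small sketch with hypothetical keys (ind1, 1, 2) shows it in isolation:

```perl
use strict;
use warnings;

my %dataHash;
print exists $dataHash{ind1} ? "present\n" : "absent\n";    # absent

# Assigning through two levels creates the inner hash reference on demand.
$dataHash{ind1}{1} = "A,G";
$dataHash{ind1}{2} = "C,C";    # reuses the same inner hash

print ref $dataHash{ind1}, "\n";                          # HASH
print join(' ', sort keys %{ $dataHash{ind1} }), "\n";    # 1 2
```

One caveat: autovivification also happens on nested reads, so even exists $dataHash{ind9}{1} creates an empty $dataHash{ind9}.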

In a teaching context it is desirable to present the cleanest code you can and to demonstrate best practises. Worthwhile exercises for the reader generally involve extending the code in various ways, not compensating for the sample's deficiencies.

True laziness is hard work



good chemistry is complicated, and a little bit messy -LW