in reply to Population of HoAoA based on file contents
Thanks! I rewrote my code following your advice. I used Data::Printer to check it and I think I've finally got my data into the structure that I wanted.
In the meantime, I've been trying to work out how to process my data from the HoAoA. What I want to do is pass the first interior array (as a block) for every individual to the module methods, then pass the second array in the same way, and so on for each of the array positions in turn. The goal is to treat each array position as a separate window for the purposes of analysis. Just to make it clear, here is a simplified version of my structure:
my %hash; $hash{key1} = [ [0 0, 0 1, 0 0, 1 1, 0 0, 0 1,], # first block [0 0, 0 0, 0 0, 0 1, 0 1, 0 0,], # second block [0 1, 0 0, 0 0, 0 1, 0 0, 0 0,], # third block ]; $hash{key2} = [ [0 0, 0 0, 0 0, 0 1, 1 1, 0 0,], # first block [0 0, 0 1, 0 0, 0 0, 0 0, 0 0,], # second block [0 0, 1 1, 0 0, 1 1, 0 0, 1 0,], # third block ];
In the above case, I'd want to pass to the module methods (shown in the code below) the arrays labeled '#first block' for both individuals first, then those labeled '#second block', and finally those labeled '#third block', so that I get separate statistics for each genomic region.
I have been trying to do this on my actual data. So far, I have produced the following (broken) code:
my $id; # ?? foreach my $window ( @{ $data{$id} } ) { # only pass windows with > min number of stats (i.e. >1 for now), an +d # of course we also ignore any 'undef' windows. next unless scalar @{ $data{$id} }[$window] >= 2; foreach my $individual ( keys(%data) ) { my $segsites = Bio::PopGen::Statistics->segregating_sites_count( \ +@{ $data{$id} }[$window] ); my $singletons = Bio::PopGen::Statistics->singleton_count( \@{ $da +ta{$id} }[$window] ); my $pi = Bio::PopGen::Statistics->pi( \@{ $data{$id} }[$window] ); my $theta = Bio::PopGen::Statistics->theta( \@{ $data{$id} }[$wind +ow] ); my $tajima_D = Bio::PopGen::Statistics->tajima_D( \@{ $data{$id} } +[$window] ); my $D_star = Bio::PopGen::Statistics->fu_and_li_D_star( \@{ $data{ +$id} }[$window] ); my $F_star = Bio::PopGen::Statistics->fu_and_li_F_star( \@{ $data{ +$id} }[$window] ); # output follows: say $out_file "Population: $cont\tChromosome: $chr_num"; say $out_file "Seg sites\tSingletons\tPi\tTheta\tTajima's D\tFu & +Li F*\tFu & Li D*"; # to account for invariant sites, divide stats by window size (i.e +. 100_000) my @stats = qw($segsites $singletons $pi $theta $tajima_D $F_star +$D_star); foreach my $stat (@stats) { print $out_file $stat / 100_000, "\t" } print $out_file "\n"; } }
This code produces all manner of errors, and I don't really know if my entire approach is incorrect, or if it's just that I don't understand the Perl syntax for working with complex data structures. The various pages on perldoc haven't really helped much. I'd really appreciate some hints on how to proceed with this.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: Population of HoAoA based on file contents
by state-o-dis-array (Hermit) on May 22, 2012 at 22:24 UTC | |
by iangibson (Scribe) on May 24, 2012 at 19:53 UTC | |
by state-o-dis-array (Hermit) on May 25, 2012 at 14:35 UTC | |
by iangibson (Scribe) on May 25, 2012 at 17:05 UTC | |
by iangibson (Scribe) on Jun 01, 2012 at 16:08 UTC | |
|