Okay, I think I've made some progress, but I'm still not quite there yet. Here's what I now have:
#!/usr/bin/perl
use warnings;
use strict;
use v5.14;
use Getopt::Long;
use Bio::PopGen::IO;
use Bio::PopGen::Statistics;
die "need two arguments (i.e. chr cont) at invocation" unless @ARGV == 2;
chomp( my $chr_num = shift );
chomp( my $cont = shift );
open my $out_file, ">", "chr${chr_num}_exome_snps_processed_${cont}_STATS"
    or die "Can't open output file: $!\n";
open my $in_file, "<", "chr${chr_num}_exome_snps_processed_$cont"
    or die "Can't open input file: $!\n";
my %data;
my @snp_bins;
my @individuals;
my @all_snps;
while (<$in_file>) {
    chomp;
    if (/^SAMPLE/) {
        my ( $placeholder, @coords ) = split /,/;
        foreach my $coord (@coords) {
            push @snp_bins, int( $coord / 100_000 );
        }
    }
    else {
        my ( $id, @snps ) = split /,/;
        push @individuals, $id;
        push @all_snps[$. - 2], join(',', @snps);
    }
}
foreach my $individual (@individuals) {
    foreach my $index ( 0 .. $#snp_bins ) {
        push( @{ $data{$individual}[ $snp_bins[$index] ] }, $all_snps[$index] );
    }
}
close $in_file;
But there's still at least one problem, namely with this line:
    push @all_snps[$. - 2], join(',', @snps);
I hope I'm otherwise headed in the right direction?
As for undefined bins: I will iterate over all the bins, and any bin that doesn't hold a minimum number of elements simply won't be passed as data to the BioPerl popgen statistics methods later in the program.
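To make that concrete, here is a rough sketch of the filtering I have in mind. The threshold $MIN_SNPS_PER_BIN, the helper usable_bins, and the sample %data are all made up for illustration, and the actual call into the popgen statistics methods is left out, so treat it as an outline rather than working program logic.

```perl
use strict;
use warnings;

# Hypothetical threshold: bins with fewer SNPs than this are skipped.
my $MIN_SNPS_PER_BIN = 2;

# Hypothetical helper: return the indices of bins that are defined
# and hold at least $min elements.
sub usable_bins {
    my ( $bins, $min ) = @_;
    return grep {
        defined $bins->[$_] && @{ $bins->[$_] } >= $min
    } 0 .. $#{$bins};
}

# Made-up data in the same shape as %data above:
# individual => [ bin index => [ snps in that bin ] ]
my %data = (
    ind1 => [ [ 'A', 'G' ], undef, ['T'], [ 'C', 'C', 'A' ] ],
);

for my $individual ( sort keys %data ) {
    my @keep = usable_bins( $data{$individual}, $MIN_SNPS_PER_BIN );

    # Only the bins listed in @keep would be handed on to the stats code.
    print "$individual: bins @keep\n";
}
```

The defined check comes first so that bins which were never filled (the undefined ones) are skipped before they are dereferenced.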