I'm not 100% sure what you want to do here, Jeri. The sample below assumes you have a single string made up of 9-character sequences. Once the 'for' has run, the keys of %uniq are the unique 9-character sequences.
With over 5 million 9-character sequences it would take a while and use a lot of memory!
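use strict;
use warnings;

# Four 9-digit IDs concatenated into one string (one is a duplicate).
my $str = join '', (648040620, 637132715, 649986572, 648040620);
my %uniq;
# Split into 9-character chunks; each distinct chunk becomes a hash key.
$uniq{$_} = 1 for unpack '(A9)*', $str;
# Hash key order is not predictable, so sort before printing.
print join(' ', sort keys %uniq), "\n";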
Prints:
637132715 648040620 649986572
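
On the memory point: unpack builds the full list of chunks before the loop starts, which for 5 million sequences is a large temporary list. A minimal sketch of one way around that, assuming the data is still one long string, is to walk the string one chunk at a time with a /g match. The helper name unique_chunks and its arguments are made up for illustration, and the hash still grows with the number of distinct sequences.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collect the unique
# fixed-width chunks of a string without building the full chunk list.
# Takes a reference to the string so the large string is not copied.
sub unique_chunks {
    my ($str_ref, $width) = @_;
    my %uniq;
    # In scalar context, /g resumes from where the last match ended,
    # so each iteration captures the next $width characters.
    while ($$str_ref =~ /\G(.{$width})/gs) {
        $uniq{$1} = 1;
    }
    return sort keys %uniq;
}

my $str = join '', (648040620, 637132715, 649986572, 648040620);
print join(' ', unique_chunks(\$str, 9)), "\n";
```

Any trailing characters that do not fill a whole chunk are ignored by this sketch.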