http://www.perlmonks.org?node_id=889535

$new_guy has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on this for the past several days. I have a test_data.txt file that I would like to reorganize. The data in the file are arranged in rows, each row representing a distinct cluster/group, and the cluster ID at the start of each row identifies the entries that follow it. I would like to rearrange the data so that all entries with the same prefix end up in one column.

The data currently looks like this:

```
ClusterX  a_123(something)  b_675(some_other_thing)  b_234(something new)  c_897(some different thing)
ClusterY  b_6998(some_other_thing, thats new)  c_877797(something different inside here)  c_111(some other different thing)
ClusterZ  a_1234(something interesting)  a_123467(something - else thats is - interesting)  3850-1-2_12243(a new one)  3850-1-2_1789(another new one)
```
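A minimal sketch of reading one such row, assuming (as the /\s{2,}/ split in the script below does) that fields are separated by two or more spaces, so the single spaces inside the brackets stay put. parse_row is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Split one row into its cluster ID and its entries. Assumes fields are
# separated by two or more whitespace characters (the same /\s{2,}/ the
# script below uses), so the single spaces inside the brackets are kept.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    my ($cluster, @entries) = split /\s{2,}/, $line;
    return ($cluster, @entries);
}

my ($id, @entries) = parse_row(
    "ClusterY  b_6998(some_other_thing, thats new)  c_111(some other different thing)\n"
);
print "$id has ", scalar(@entries), " entries\n";   # ClusterY has 2 entries
print "  $_\n" for @entries;
```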

The desired format is:

```
ClusterX  a_123(something)                                   b_675(some_other_thing)              c_897(some different thing)                 -
ClusterX  -                                                  b_234(something, new)                -                                           -
ClusterY  -                                                  b_6998(some_other_thing, thats new)  c_877797(something: different inside here)  -
ClusterY  -                                                  -                                    c_111(some other different thing)           -
ClusterZ  a_1234(something interesting)                      -                                    -                                           -
ClusterZ  a_123467(something - else thats is - interesting)  -                                    -                                           3850-1-2_12243(a new one)
ClusterZ  -                                                  -                                    -                                           3850-1-2_1789(another new one)
```
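One way to get this layout, sketched under a few assumptions: columns are ordered by first appearance (which happens to give a, b, c, 3850-1-2 as above), entries that share a prefix within a cluster stack downward, and empty cells are filled with -. build_table and prefix_of are hypothetical names, and the rows are hard-coded from the example rather than read from test_data.txt.

```perl
use strict;
use warnings;

# Prefix = everything before the first underscore, stopping at '(' so an
# underscore inside the brackets is never used.
sub prefix_of {
    my ($entry) = @_;
    return $entry =~ /^([^_(]+)_/ ? $1 : $entry;
}

# Rows are array refs: [cluster_id, entry, entry, ...].
# Returns (\@prefixes, \@table): one column per prefix, in first-seen order,
# and one or more output rows per cluster, padded with '-'.
sub build_table {
    my @rows = @_;
    my (@prefixes, %seen);
    for my $row (@rows) {
        my (undef, @entries) = @$row;
        for my $p (map { prefix_of($_) } @entries) {
            push @prefixes, $p unless $seen{$p}++;
        }
    }

    my @table;
    for my $row (@rows) {
        my ($cluster, @entries) = @$row;
        my %by_prefix;
        push @{ $by_prefix{ prefix_of($_) } }, $_ for @entries;

        # A cluster needs as many output rows as its most frequent prefix.
        my $height = 0;
        for my $list (values %by_prefix) {
            $height = @$list if @$list > $height;
        }
        for my $i (0 .. $height - 1) {
            push @table, [ $cluster,
                map { my $list = $by_prefix{$_} || []; $list->[$i] // '-' } @prefixes ];
        }
    }
    return (\@prefixes, \@table);
}

my @input = (
    [ 'ClusterX', 'a_123(something)', 'b_675(some_other_thing)',
      'b_234(something new)', 'c_897(some different thing)' ],
    [ 'ClusterY', 'b_6998(some_other_thing, thats new)',
      'c_877797(something different inside here)', 'c_111(some other different thing)' ],
    [ 'ClusterZ', 'a_1234(something interesting)',
      'a_123467(something - else thats is - interesting)',
      '3850-1-2_12243(a new one)', '3850-1-2_1789(another new one)' ],
);

my ($cols, $table) = build_table(@input);
print join("\t", 'Cluster', @$cols), "\n";
print join("\t", @$_), "\n" for @$table;
```

Because each column is filled from the top, this sketch puts ClusterZ on two rows (3850-1-2_12243(a new one) shares the first row with a_1234(something interesting)) instead of the three in the hand-written table, and it prints tab-separated columns rather than padding them.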

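Please note that the prefix is everything before the underscore (i.e. _), but not an underscore inside the brackets, if there is one. For example, the prefix of b_675(some_other_thing) is b, and the prefix of 3850-1-2_12243(a new one) is 3850-1-2.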
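A hedged sketch of that rule as a regex: capture characters that are neither _ nor ( from the start of the entry up to the first _. For contrast it also runs the greedy m/(\S+)\_/ pattern used in the script below, which backtracks to the last underscore in the run of non-space characters. Both subroutine names are made up for illustration.

```perl
use strict;
use warnings;

# Bracket-aware prefix: characters that are neither '_' nor '(', anchored at
# the start of the entry and followed by '_'. Returns undef if there is none.
sub prefix_of {
    my ($entry) = @_;
    return $entry =~ /^([^_(]+)_/ ? $1 : undef;
}

# For comparison, the greedy pattern from the script: \S+ runs to the end of
# the non-space characters and backtracks to the LAST '_' it can find there,
# which may be inside the brackets.
sub greedy_prefix {
    my ($entry) = @_;
    return $entry =~ /(\S+)_/ ? $1 : undef;
}

for my $entry ('b_675(some_other_thing)',
               '3850-1-2_12243(a new one)',
               'pseudoSPN23F_04290(putative transposase (pseudogene))') {
    printf "%-55s bracket-aware: %-14s greedy: %s\n",
        $entry, prefix_of($entry), greedy_prefix($entry);
}
```

For b_675(some_other_thing) the greedy pattern yields b_675(some_other, because the underscore in some_other_thing is the last one it can reach; for the other two samples both patterns happen to agree.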

The script I am using is below. I think the problem is the split call on line 17 (my @chunks = split(/\s{2,}/, <DATA>);). Is this right?

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::String;
use List::Util 'max';

my $FILENAME4 = "test_data.txt";
open(DATA, $FILENAME4);

#create arrays and hashes to store stuff
my (%data, %all, @keys);

while (<DATA>) {
    # avoid \n on last field
    chomp;
    #split the data into chunks
    my @chunks = split(/\s{2,}/, <DATA>); ## make sure you don't split inside annotated brackets
    #create keys for the chunks
    my $key = shift @chunks;
    #store the keys in an array unless they already exist
    push @keys, $key unless exists $data{$key};
    foreach my $chunk (@chunks) {
        #return references using hashes
        $data{$key}{$chunk}++;
        #add all chunks to the hash '%all'
        $all{$chunk} = 1;
    }
}

#remove new_clusters2.txt if it exists
my $remove2 = "new_clusters2.txt";
if (unlink($remove2) == 1) {
    print "Existing \"new_clusters2.txt\" file was removed\n";
}

#now make a file for the output
my $outputfile = "new_clusters2.txt";
if (! open(POS, ">>$outputfile") ) {
    print "Cannot open file \"$outputfile\" to write to!!\n\n";
    exit;
}

#sort the fields/columns keys and save them as an array
#my @fields = sort {$a <=> $b} keys %all; ##<--this sorting didn't work
my @fields = sort {lc($a) cmp lc($b)} keys %all;

#find the longest entry in an array
my @array2 = ();
foreach my $e (@fields){ ####
    my $d = $e;
    $d =~ m/(\S+)\_/;
    my $prefix = $1;
    # print "Prefix: ". $prefix."_"."\n\n"; ## prints the prefices
    push(@array2, $1);
    # print "* @array2 \n"; ## prints the prefices_
} ####
my $longest = max map {length} @array2;

#organise the data
foreach my $key (@keys) {
    while (keys %{$data{$key}}) {
        print POS $key, " ";
        foreach my $field (@fields) {
            if ($data{$key}{$field}){
                printf POS "%${longest}s ", $field;
                delete $data{$key}{$field} unless --$data{$key}{$field};
            }
            else {
                printf POS "%${longest}s ", "-";
            }
        }
        print POS "\n";
    }
}
```
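About the split on line 17: inside while (<DATA>) the current line is already in $_, so split(/\s{2,}/, <DATA>) reads the following line instead. The sketch below reproduces that with five made-up rows in an in-memory filehandle (standing in for test_data.txt); keys_with_double_read and keys_from_current_line are hypothetical names.

```perl
use strict;
use warnings;

# Mirrors the loop in the script: while (<$fh>) already puts the current
# line in $_, but split is handed <$fh>, which reads the NEXT line.
# Returns the cluster IDs that actually get stored as keys.
sub keys_with_double_read {
    my ($fh) = @_;
    my @keys;
    no warnings 'uninitialized';    # the last <$fh> inside split hits EOF
    while (<$fh>) {
        chomp;
        my @chunks = split /\s{2,}/, <$fh>;
        push @keys, shift @chunks;
    }
    return @keys;
}

# Same loop, but split works on the line already held in $_.
sub keys_from_current_line {
    my ($fh) = @_;
    my @keys;
    while (<$fh>) {
        chomp;
        my @chunks = split /\s{2,}/, $_;
        push @keys, shift @chunks;
    }
    return @keys;
}

# Five made-up rows, Cluster5 .. Cluster9, read from an in-memory filehandle.
my $text = join '', map { "Cluster$_  a_1(x)  b_2(y)\n" } 5 .. 9;
open my $fh1, '<', \$text or die "in-memory open failed: $!";
open my $fh2, '<', \$text or die "in-memory open failed: $!";

my @buggy = keys_with_double_read($fh1);
my @fixed = keys_from_current_line($fh2);
print 'split(<$fh>): ', join(' ', map { defined $_ ? $_ : 'undef' } @buggy), "\n";
print 'split($_):    ', join(' ', @fixed), "\n";
```

In the real script this means every other row is never split, the rows that are split keep their trailing newline because only $_ was chomped, and, when the file has an odd number of rows, the last pass splits past the end of the file. Splitting $_ addresses the line in question; the column layout is a separate matter, since @fields holds every distinct entry from %all rather than the prefixes, so each entry becomes its own column.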

My test data (just 5 clusters):

Cluster5  SP_1003(conserved hypothetical protein)  SP_1174(conserved domain protein)  SP_1175(conserved domain protein)  spr_0907(Pneumococcal histidine triad protein D precursor)  spr_1060(Histidine Motif-Containing protein)  spr_1061(Pneumococcal histidine triad protein A precursor)  SPD_0889(pneumococcal histidine triad protein D precursor)  SPD_1037(histidine triad protein)  SPD_1038(pneumococcal histidine triad protein A precursor)  SP70585_1043(pneumococcal histidine triad protein B)  SP70585_1226(pneumococcal histidine triad protein B)  SP70585_1227(pneumococcal histidine triad protein B)  SPJ_0944(pneumococcal histidine triad protein B)  SPJ_1093(pneumococcal histidine triad protein B)  SPP_1009(pneumococcal histidine triad protein B)  SPP_1217(pneumococcal histidine triad protein B)  SPP_1218(pneumococcal histidine triad protein B)  SPT_1049(pneumococcal histidine triad protein B)  SPT_1198(pneumococcal histidine triad protein B)  SPH_1104(pneumococcal histidine triad protein B)  SPG_0928(pneumococcal histidine triad protein D)  SPG_1073(pneumococcal histidine triad protein A/B (phtA/B))  SPCG_0977(hypothetical protein)  SPCG_1122(hypothetical protein)  HMPREF0837_11322(pneumococcal histidine triad protein A/B (phtA/B))  HMPREF0837_11481(pneumococcal histidine triad protein B)  SPN23F_09290(pneumococcal histidine triad protein D (bvh-11-2))  SPN23F_10770(putative streptococcal histidine triad protein PhpA)  3850-1-10_00031(unknown)  3850-1-10_01193(unknown)  3850-1-10_01345(unknown)  3850-1-11_01204(unknown)  3850-1-11_01329(unknown)  3850-1-12_00144(unknown)  3850-1-12_01345(unknown)  3850-1-1_00282(unknown)  3850-1-1_01281(unknown)  3850-1-1_01443(unknown)  3850-1-2_00010(unknown)  3850-1-2_01233(unknown)  3850-1-2_01374(unknown)  3850-1-3_01238(unknown)  3850-1-3_01239(unknown)  3850-1-3_01382(unknown)  3850-1-4_00276(unknown)  3850-1-4_01322(unknown)  3850-1-4_01482(unknown)  3850-1-5_00019(unknown)  3850-1-5_00023(unknown)
3850-1-5_01247(unknown)  3850-1-6_00040(unknown)  3850-1-6_01259(unknown)  3850-1-7_00013(unknown)  3850-1-7_01232(unknown)  3850-1-7_01359(unknown)  3850-1-8_00159(unknown)  3850-1-8_01109(unknown)  3850-1-8_01261(unknown)  3850-1-9_00252(unknown)  3850-1-9_01523(unknown)  3850-2-10_00214(unknown)  3850-2-10_01304(unknown)  3850-2-10_01461(unknown)  3850-2-11_01237(unknown)  3850-2-11_01238(unknown)  3850-2-11_01361(unknown)  3850-2-11_01362(unknown)  3850-2-12_00145(unknown)  3850-2-12_01279(unknown)  3850-2-12_01280(unknown)  3850-2-12_01438(unknown)  3850-2-1_01260(unknown)  3850-2-1_01261(unknown)  3850-2-1_01369(unknown)  3850-2-2_01307(unknown)  3850-2-2_01443(unknown)  3850-2-3_01243(unknown)  3850-2-3_01385(unknown)  3850-2-3_01386(unknown)  3850-2-4_01492(unknown)  3850-2-4_01636(unknown)  3850-2-5_01509(unknown)  3850-2-5_01510(unknown)  3850-2-6_00415(unknown)  3850-2-6_01357(unknown)  3850-2-6_01358(unknown)  3850-2-7_01389(unknown)  3850-2-7_01544(unknown)  3850-2-7_01545(unknown)  3850-2-8_00293(unknown)  3850-2-8_01225(unknown)  3850-2-8_01226(unknown)  3850-2-8_01353(unknown)  3850-2-9_00078(unknown)  3850-2-9_01278(unknown)  3850-2-9_01438(unknown)  3850-2-9_01439(unknown)  3850-3-10_01395(unknown)  3850-3-10_01397(unknown)  3850-3-10_01495(unknown)  3850-3-11_00190(unknown)  3850-3-11_00191(unknown)  3850-3-11_01194(unknown)  3850-3-12_00207(unknown)  3850-3-12_01390(unknown)  3850-3-12_01391(unknown)  3850-3-1_00383(unknown)  3850-3-1_01304(unknown)  3850-3-2_01474(unknown)  3850-3-2_01635(unknown)  3850-3-3_00053(unknown)  3850-3-3_01170(unknown)  3850-3-3_01315(unknown)  3850-3-4_00436(unknown)  3850-3-4_01261(unknown)  3850-3-4_01262(unknown)  3850-3-5_00295(unknown)  3850-3-5_01224(unknown)  3850-3-5_01365(unknown)  3850-3-5_01366(unknown)  3850-3-6_01476(unknown)  3850-3-6_02252(unknown)  3850-3-7_01192(unknown)  3850-3-7_01324(unknown)  3850-3-8_00224(unknown)  3850-3-8_01346(unknown)  3850-3-9_00049(unknown)  3850-3-9_01273(unknown)
3850-3-9_01274(unknown)  3850-5-10_00315(unknown)  3850-5-10_00420(unknown)  3850-5-10_01240(unknown)  3850-5-11_00058(unknown)  3850-5-11_00096(unknown)  3850-5-12_01197(unknown)  3850-5-12_01198(unknown)  3850-5-12_01314(unknown)  3850-5-12_01315(unknown)  3850-5-1_01339(unknown)  3850-5-1_03653(unknown)  3850-5-2_00097(unknown)  3850-5-2_01168(unknown)  3850-5-2_02100(unknown)  3850-5-3_00103(unknown)  3850-5-3_00104(unknown)  3850-5-3_01284(unknown)  3850-5-3_01285(unknown)  3850-5-4_01200(unknown)  3850-5-4_01367(unknown)  3850-5-5_01254(unknown)  3850-5-5_01384(unknown)  3850-5-5_01385(unknown)  3850-5-6_01108(unknown)  3850-5-6_01244(unknown)  3850-5-7_00525(unknown)  3850-5-8_01313(unknown)  3850-5-8_01458(unknown)  3850-5-9_00357(unknown)  3850-5-9_01419(unknown)  3850-5-9_01420(unknown)  3850-6-10_01264(unknown)  3850-6-10_01402(unknown)  3850-6-11_01121(unknown)  3850-6-11_01122(unknown)  3850-6-11_01259(unknown)  3850-6-12_00043(unknown)  3850-6-12_01214(unknown)  3850-6-12_01367(unknown)  3850-6-1_00100(unknown)  3850-6-1_01094(unknown)  3850-6-1_01095(unknown)  3850-6-2_01432(unknown)  3850-6-2_02107(unknown)  3850-6-3_00236(unknown)  3850-6-3_01195(unknown)  3850-6-4_01067(unknown)  3850-6-4_01201(unknown)  3850-6-5_00239(unknown)  3850-6-5_01350(unknown)  3850-6-5_02142(unknown)  3850-6-6_01062(unknown)  3850-6-6_01065(unknown)  3850-6-6_01207(unknown)  3850-6-7_00139(unknown)  3850-6-7_00140(unknown)  3850-6-7_01133(unknown)  3850-6-7_01263(unknown)  3850-6-8_00173(unknown)  3850-6-8_00338(unknown)  3850-6-8_01219(unknown)  3850-6-9_01211(unknown)  3850-6-9_01343(unknown)  3850-7-10_00113(unknown)  3850-7-10_01327(unknown)  3850-7-10_01328(unknown)  3850-7-11_01218(unknown)  3850-7-11_01330(unknown)  3850-7-12_01272(unknown)  3850-7-12_01398(unknown)  3850-7-1_00111(unknown)  3850-7-1_00112(unknown)  3850-7-1_01287(unknown)  3850-7-1_02022(unknown)  3850-7-2_01123(unknown)  3850-7-2_01233(unknown)  3850-7-3_00200(unknown)  3850-7-3_01371(unknown)
3850-7-3_02157(unknown)  3850-7-4_00004(unknown)  3850-7-4_01158(unknown)  3850-7-4_01290(unknown)  3850-7-5_00155(unknown)  3850-7-5_01363(unknown)  3850-7-6_00054(unknown)  3850-7-6_01170(unknown)  3850-7-7_01195(unknown)  3850-7-7_01196(unknown)  3850-7-7_01346(unknown)  3850-7-8_00055(unknown)  3850-7-8_00056(unknown)  3850-7-8_01214(unknown)  3850-7-8_01363(unknown)  3850-7-9_01185(unknown)  3850-7-9_01327(unknown)  3850-8-10_00010(unknown)  3850-8-10_01203(unknown)  3850-8-11_01391(unknown)  3850-8-11_01392(unknown)  3850-8-12_01230(unknown)  3850-8-12_01233(unknown)  3850-8-12_01354(unknown)  3850-8-12_01355(unknown)  3850-8-1_00024(unknown)  3850-8-1_01183(unknown)  3850-8-1_01300(unknown)  3850-8-2_01149(unknown)  3850-8-2_01281(unknown)  3850-8-3_00102(unknown)  3850-8-3_01220(unknown)  3850-8-3_01221(unknown)  3850-8-4_00210(unknown)  3850-8-4_01322(unknown)  3850-8-4_01408(unknown)  3850-8-5_01362(unknown)  3850-8-6_01279(unknown)  3850-8-6_01280(unknown)  3850-8-6_01416(unknown)  3850-8-7_00036(unknown)  3850-8-7_01349(unknown)  3850-8-8_00056(unknown)  3850-8-8_01279(unknown)  3850-8-8_01423(unknown)  3850-8-9_00002(unknown)  3850-8-9_01374(unknown)  3850-8-9_01500(unknown)

Cluster6  SP_0917(pilin gene inverting-related protein)  spr_0412(Degenerate transposase)  spr_0817(Degenerate transposase)  spr_0818(Degenerate transposase)  spr_1886(Degenerate transposase)  SPD_1901(transposase, putative)  SP70585_0526(transposase)  SP70585_0953(pilin gene inverting-related protein)  SP70585_2181(transposase)  SPJ_0440(transposase)  SPJ_0856(pilin gene inverting-related protein)  SPJ_2096(transposase)  SPP_0487(transposase)  SPP_0923(conserved domain protein)  SPP_0925(transposase)  SPP_2129(transposase)  SPT_1283(transposase)  SPT_1285(transposase)  SPT_2085(transposase)  SPH_2262(transposase)  SPG_0419(transposase)  SPG_0842(transposase)  SPG_2013(transposase)  SPCG_0451(degenerate transposase)  SPCG_0894(pilin gene inverting-related protein,truncated)  SPCG_2041(degenerate transposase)  HMPREF0837_10072(transposase)  HMPREF0837_10749(possible transposase)  HMPREF0837_11569(transposase)  HMPREF0837_11570(possible pilin gene inverting-related protein)  pseudoSPN23F_04290(putative transposase (pseudogene))  pseudoSPN23F_08400(transposase (pseudogene))  pseudoSPN23F_20990(degenerate transposase)  3850-1-10_00199(unknown)  3850-1-10_01100(unknown)  3850-1-11_00716(unknown)  3850-1-11_00717(unknown)  3850-1-12_00315(unknown)  3850-1-12_00915(unknown)  3850-1-12_01260(unknown)  3850-1-1_00296(unknown)  3850-1-1_01194(unknown)  3850-1-2_00163(unknown)  3850-1-2_01140(unknown)  3850-1-3_00703(unknown)  3850-1-3_01129(unknown)  3850-1-4_00161(unknown)  3850-1-4_00258(unknown)  3850-1-5_00111(unknown)  3850-1-5_00885(unknown)  3850-1-5_02093(unknown)  3850-1-6_00726(unknown)  3850-1-6_00727(unknown)  3850-1-6_01168(unknown)  3850-1-6_02316(unknown)  3850-1-7_00235(unknown)  3850-1-7_01142(unknown)  3850-1-8_00113(unknown)  3850-1-8_00120(unknown)  3850-1-8_00156(unknown)  3850-1-9_00282(unknown)  3850-1-9_00812(unknown)  3850-1-9_01869(unknown)  3850-1-9_02527(unknown)  3850-2-10_00018(unknown)  3850-2-10_00755(unknown)  3850-2-10_01212(unknown)  3850-2-10_02313(unknown)  3850-2-11_00268(unknown)  3850-2-11_01155(unknown)  3850-2-12_00316(unknown)  3850-2-12_00854(unknown)  3850-2-12_01182(unknown)  3850-2-12_02494(unknown)  3850-2-1_00101(unknown)  3850-2-1_00201(unknown)  3850-2-2_00082(unknown)  3850-2-2_01209(unknown)  3850-2-2_02340(unknown)  3850-2-3_00758(unknown)  3850-2-3_01157(unknown)  3850-2-3_02338(unknown)  3850-2-4_00959(unknown)  3850-2-4_01399(unknown)  3850-2-4_01400(unknown)  3850-2-4_02354(unknown)  3850-2-5_01279(unknown)  3850-2-5_02529(unknown)  3850-2-6_00903(unknown)  3850-2-6_01286(unknown)  3850-2-6_01798(unknown)  3850-2-6_02397(unknown)  3850-2-7_00055(unknown)  3850-2-8_00762(unknown)  3850-2-8_00885(unknown)  3850-2-8_02241(unknown)  3850-2-9_00268(unknown)  3850-2-9_01187(unknown)  3850-3-10_00947(unknown)
3850-3-10_01304(unknown)  3850-3-10_02319(unknown)  3850-3-11_00684(unknown)  3850-3-11_01099(unknown)  3850-3-11_02124(unknown)  3850-3-12_01294(unknown)  3850-3-1_00060(unknown)  3850-3-1_02574(unknown)  3850-3-2_00353(unknown)  3850-3-2_00972(unknown)  3850-3-2_01372(unknown)  3850-3-3_00614(unknown)  3850-3-3_00615(unknown)  3850-3-3_00616(unknown)  3850-3-4_00248(unknown)  3850-3-4_02268(unknown)  3850-3-5_00976(unknown)  3850-3-5_01135(unknown)  3850-3-5_02101(unknown)  3850-3-6_00821(unknown)  3850-3-6_00822(unknown)  3850-3-6_01258(unknown)  3850-3-7_00620(unknown)  3850-3-7_00621(unknown)  3850-3-7_01092(unknown)  3850-3-8_00694(unknown)  3850-3-8_00695(unknown)  3850-3-8_01122(unknown)  3850-3-9_00852(unknown)  3850-3-9_01197(unknown)  3850-3-9_02318(unknown)  3850-5-10_00009(unknown)  3850-5-10_00890(unknown)  3850-5-10_00892(unknown)  3850-5-10_01194(unknown)  3850-5-11_00563(unknown)  3850-5-12_00355(unknown)  3850-5-12_01133(unknown)  3850-5-1_03001(unknown)  3850-5-1_03431(unknown)  3850-5-1_03432(unknown)  3850-5-2_00706(unknown)  3850-5-2_00707(unknown)  3850-5-2_01083(unknown)  3850-5-3_00810(unknown)  3850-5-3_01191(unknown)  3850-5-3_01192(unknown)  3850-5-3_01193(unknown)  3850-5-3_02309(unknown)  3850-5-4_00097(unknown)  3850-5-4_01092(unknown)  3850-5-4_01093(unknown)  3850-5-5_00751(unknown)  3850-5-5_01159(unknown)  3850-5-5_01160(unknown)  3850-5-5_02160(unknown)  3850-5-6_00601(unknown)  3850-5-6_00602(unknown)  3850-5-6_01024(unknown)  3850-5-7_00858(unknown)  3850-5-7_02438(unknown)  3850-5-8_00810(unknown)  3850-5-8_01218(unknown)  3850-5-9_00828(unknown)  3850-5-9_01197(unknown)  3850-5-9_02293(unknown)  3850-6-10_00744(unknown)  3850-6-10_01222(unknown)  3850-6-10_02379(unknown)  3850-6-11_00022(unknown)  3850-6-11_02122(unknown)  3850-6-12_00052(unknown)  3850-6-1_00603(unknown)  3850-6-1_01005(unknown)  3850-6-2_01202(unknown)  3850-6-2_02233(unknown)  3850-6-3_00760(unknown)  3850-6-3_01124(unknown)  3850-6-4_00988(unknown)  3850-6-4_00989(unknown)
3850-6-5_00035(unknown)  3850-6-5_00097(unknown)  3850-6-5_01124(unknown)  3850-6-6_00536(unknown)  3850-6-6_00973(unknown)  3850-6-6_01974(unknown)  3850-6-7_00045(unknown)  3850-6-7_00102(unknown)  3850-6-9_01120(unknown)  3850-7-10_02104(unknown)  3850-7-11_00031(unknown)  3850-7-11_00210(unknown)  3850-7-11_00699(unknown)  3850-7-11_02226(unknown)  3850-7-12_00104(unknown)  3850-7-12_01180(unknown)  3850-7-12_02224(unknown)  3850-7-1_00080(unknown)  3850-7-1_00124(unknown)  3850-7-1_01069(unknown)  3850-7-2_02078(unknown)  3850-7-3_00010(unknown)  3850-7-3_00094(unknown)  3850-7-3_00223(unknown)  3850-7-3_01165(unknown)  3850-7-4_00155(unknown)  3850-7-4_01071(unknown)  3850-7-4_01072(unknown)  3850-7-5_00178(unknown)  3850-7-6_00580(unknown)  3850-7-6_00936(unknown)  3850-7-6_02005(unknown)  3850-7-7_00700(unknown)  3850-7-7_01107(unknown)  3850-7-7_01108(unknown)  3850-7-7_02203(unknown)  3850-7-8_00115(unknown)  3850-7-8_00711(unknown)  3850-7-8_00712(unknown)  3850-7-9_00026(unknown)  3850-7-9_00055(unknown)  3850-7-9_01092(unknown)  3850-7-9_02243(unknown)  3850-8-10_00092(unknown)  3850-8-10_00749(unknown)  3850-8-10_01097(unknown)  3850-8-11_00782(unknown)  3850-8-11_02607(unknown)  3850-8-12_00179(unknown)  3850-8-12_01138(unknown)  3850-8-12_02196(unknown)  3850-8-2_00044(unknown)  3850-8-2_00610(unknown)  3850-8-2_02175(unknown)  3850-8-3_00328(unknown)  3850-8-3_01129(unknown)  3850-8-4_00192(unknown)  3850-8-4_00817(unknown)  3850-8-4_01236(unknown)  3850-8-4_02333(unknown)  3850-8-5_00022(unknown)  3850-8-5_00233(unknown)  3850-8-5_01143(unknown)  3850-8-6_00707(unknown)  3850-8-6_01185(unknown)  3850-8-7_00046(unknown)  3850-8-7_00190(unknown)  3850-8-7_01111(unknown)  3850-8-7_01112(unknown)  3850-8-8_01213(unknown)  3850-8-9_00861(unknown)  3850-8-9_01283(unknown)  3850-8-9_02357(unknown)

Cluster7  spr_1379(ABC transporter, truncation)  spr_1380(ABC transporter, truncation)  spr_1381(ABC transporter, truncation)  SPD_1355(conserved hypothetical protein)
SPP_1546(ABC transporter)  SPG_1451(ABC transporter, ATP-binding protein)  SPG_1452(hypothetical protein)  SPG_1453(hypothetical protein)  SPCG_1511(hypothetical protein)  SPCG_1512(hypothetical protein)  SPCG_1513(ABC-type multidrug transport system, ATPase and permease components)  HMPREF0837_11760(ABC superfamily ATP binding cassette transporter, ABC protein)  SPN23F_14900(ABC transporter ATP-binding protein)  3850-1-10_01718(unknown)  3850-1-10_01719(unknown)  3850-1-10_01720(unknown)  3850-1-11_01656(unknown)  3850-1-11_01657(unknown)  3850-1-12_01833(unknown)  3850-1-12_01834(unknown)  3850-1-12_01835(unknown)  3850-1-1_01810(unknown)  3850-1-1_01811(unknown)  3850-1-1_01812(unknown)  3850-1-2_01768(unknown)  3850-1-2_01769(unknown)  3850-1-3_01715(unknown)  3850-1-3_01717(unknown)  3850-1-4_01809(unknown)  3850-1-4_01810(unknown)  3850-1-4_01811(unknown)  3850-1-5_00343(unknown)  3850-1-5_00344(unknown)  3850-1-5_00345(unknown)  3850-1-6_01749(unknown)  3850-1-6_01750(unknown)  3850-1-7_01682(unknown)  3850-1-7_01683(unknown)  3850-1-7_01684(unknown)  3850-1-8_01560(unknown)  3850-1-8_01561(unknown)  3850-1-9_01882(unknown)  3850-2-10_01781(unknown)  3850-2-10_01782(unknown)  3850-2-10_01783(unknown)  3850-2-11_01600(unknown)  3850-2-12_01789(unknown)  3850-2-12_01790(unknown)  3850-2-1_01699(unknown)  3850-2-1_01700(unknown)  3850-2-2_01809(unknown)  3850-2-2_01810(unknown)  3850-2-3_01740(unknown)  3850-2-3_01741(unknown)  3850-2-3_01742(unknown)  3850-2-4_01896(unknown)  3850-2-4_01897(unknown)  3850-2-5_01849(unknown)  3850-2-5_01852(unknown)  3850-2-6_01813(unknown)  3850-2-6_01814(unknown)  3850-2-7_01857(unknown)  3850-2-7_01858(unknown)  3850-2-7_01859(unknown)  3850-2-8_01652(unknown)  3850-2-8_01653(unknown)  3850-2-9_01772(unknown)  3850-2-9_01773(unknown)  3850-2-9_01774(unknown)  3850-3-10_00128(unknown)  3850-3-10_00129(unknown)  3850-3-11_00149(unknown)  3850-3-11_00150(unknown)  3850-3-12_01881(unknown)  3850-3-12_01882(unknown)  3850-3-1_01861(unknown)
3850-3-1_01862(unknown)  3850-3-2_00363(unknown)  3850-3-2_00364(unknown)  3850-3-3_01637(unknown)  3850-3-3_01638(unknown)  3850-3-4_01712(unknown)  3850-3-4_01713(unknown)  3850-3-5_01574(unknown)  3850-3-5_01575(unknown)  3850-3-5_01576(unknown)  3850-3-6_01809(unknown)  3850-3-7_01655(unknown)  3850-3-8_01718(unknown)  3850-3-8_01719(unknown)  3850-3-9_01773(unknown)  3850-3-9_01774(unknown)  3850-3-9_01775(unknown)  3850-5-10_01753(unknown)  3850-5-10_01754(unknown)  3850-5-11_01542(unknown)  3850-5-11_01543(unknown)  3850-5-11_01544(unknown)  3850-5-12_01604(unknown)  3850-5-12_01605(unknown)  3850-5-1_03988(unknown)  3850-5-2_01670(unknown)  3850-5-2_01671(unknown)  3850-5-2_01672(unknown)  3850-5-3_01717(unknown)  3850-5-4_01717(unknown)  3850-5-4_01718(unknown)  3850-5-5_01658(unknown)  3850-5-5_01659(unknown)  3850-5-6_01548(unknown)  3850-5-6_01549(unknown)  3850-5-8_01725(unknown)  3850-5-8_01726(unknown)  3850-5-9_01748(unknown)  3850-6-10_01742(unknown)  3850-6-11_01577(unknown)  3850-6-12_01672(unknown)  3850-6-12_01673(unknown)  3850-6-12_01674(unknown)  3850-6-1_01558(unknown)  3850-6-2_01734(unknown)  3850-6-2_01735(unknown)  3850-6-2_01737(unknown)  3850-6-3_01640(unknown)  3850-6-3_01641(unknown)  3850-6-4_01491(unknown)  3850-6-4_01492(unknown)  3850-6-5_01713(unknown)  3850-6-5_01714(unknown)  3850-6-6_01495(unknown)  3850-6-7_01635(unknown)  3850-6-7_01636(unknown)  3850-6-8_01737(unknown)  3850-6-8_01738(unknown)  3850-6-9_00152(unknown)  3850-6-9_00153(unknown)  3850-6-9_00154(unknown)  3850-7-10_01589(unknown)  3850-7-10_01590(unknown)  3850-7-10_01591(unknown)  3850-7-11_01664(unknown)  3850-7-11_01665(unknown)  3850-7-12_01702(unknown)  3850-7-1_01633(unknown)  3850-7-1_01634(unknown)  3850-7-1_01635(unknown)  3850-7-2_01514(unknown)  3850-7-2_01515(unknown)  3850-7-2_01516(unknown)  3850-7-3_01717(unknown)  3850-7-3_01718(unknown)  3850-7-3_01719(unknown)  3850-7-4_01596(unknown)  3850-7-4_01597(unknown)  3850-7-4_01598(unknown)
3850-7-5_01722(unknown)  3850-7-5_01723(unknown)  3850-7-5_01724(unknown)  3850-7-6_01438(unknown)  3850-7-6_01439(unknown)  3850-7-7_01649(unknown)  3850-7-7_01650(unknown)  3850-7-7_01651(unknown)  3850-7-8_01671(unknown)  3850-7-8_01672(unknown)  3850-7-8_01673(unknown)  3850-7-9_01711(unknown)  3850-7-9_01712(unknown)  3850-8-10_01625(unknown)  3850-8-10_01626(unknown)  3850-8-11_01860(unknown)  3850-8-11_01861(unknown)  3850-8-11_01862(unknown)  3850-8-12_01664(unknown)  3850-8-12_01665(unknown)  3850-8-1_00104(unknown)  3850-8-1_00105(unknown)  3850-8-1_00106(unknown)  3850-8-2_01624(unknown)  3850-8-2_01625(unknown)  3850-8-2_01626(unknown)  3850-8-3_01697(unknown)  3850-8-3_01698(unknown)  3850-8-3_01699(unknown)  3850-8-4_01740(unknown)  3850-8-4_01741(unknown)  3850-8-4_01742(unknown)  3850-8-5_00059(unknown)  3850-8-6_01734(unknown)  3850-8-6_01735(unknown)  3850-8-7_01748(unknown)  3850-8-7_01749(unknown)  3850-8-7_01750(unknown)  3850-8-8_00171(unknown)  3850-8-9_00075(unknown)  3850-8-9_00076(unknown)  3850-8-9_00077(unknown)

Cluster8  spr_0324(Transposase, uncharacterized, truncation)  spr_1295(Transposase, uncharacterized, truncation)  spr_1296(Hypothetical protein)  spr_2016(Transposase, uncharacterized, truncation)  SPD_1269(conserved hypothetical protein)  SPD_1270(conserved hypothetical protein)  SPD_2038(conserved hypothetical protein)  SP70585_2338(transposase)  SPJ_1227(transposase)  SPJ_1339(transposase)  SPJ_1340(transposase)  SPJ_2237(transposase)  SPP_0403(transposase)  SPP_1461(transposase)  SPP_1462(transposase)  SPP_2264(transposase)  SPT_2229(transposase)  SPG_0329(IS66-Spn1, transposase)  SPG_1204(IS66-Spn1, transposase)  SPG_2157(IS66-Spn1, transposase)  SPCG_1428(hypothetical protein)  SPCG_1429(hypothetical protein)  SPCG_2178(transposase)  HMPREF0837_10225(transposase family protein)  pseudoSPN23F_22440(putative transposase family protein)  3850-1-10_00567(unknown)  3850-1-10_01498(unknown)  3850-1-11_01566(unknown)  3850-1-11_02293(unknown)
3850-1-12_00101(unknown)  3850-1-12_00809(unknown)  3850-1-1_00699(unknown)  3850-1-2_01520(unknown)  3850-1-2_01521(unknown)  3850-1-3_01631(unknown)  3850-1-3_02363(unknown)  3850-1-4_00717(unknown)  3850-1-4_02455(unknown)  3850-1-5_01625(unknown)  3850-1-5_01626(unknown)  3850-1-7_00615(unknown)  3850-1-7_02400(unknown)  3850-1-8_00068(unknown)  3850-1-8_02320(unknown)  3850-1-9_00313(unknown)  3850-1-9_00713(unknown)  3850-2-10_00663(unknown)  3850-2-10_02471(unknown)  3850-2-11_01488(unknown)  3850-2-11_01507(unknown)  3850-2-11_01509(unknown)  3850-2-11_02462(unknown)  3850-2-12_00125(unknown)  3850-2-12_00329(unknown)  3850-2-1_00860(unknown)  3850-2-1_00861(unknown)  3850-2-1_02489(unknown)  3850-2-2_00751(unknown)  3850-2-2_01574(unknown)  3850-2-3_00655(unknown)  3850-2-3_01652(unknown)  3850-2-4_00174(unknown)  3850-2-4_00175(unknown)  3850-2-4_00482(unknown)  3850-2-5_01652(unknown)  3850-2-5_02710(unknown)  3850-2-6_00139(unknown)  3850-2-6_01616(unknown)  3850-2-6_02551(unknown)  3850-2-7_00776(unknown)  3850-2-7_02431(unknown)  3850-2-8_00297(unknown)  3850-2-8_00658(unknown)  3850-2-8_01464(unknown)  3850-2-8_02380(unknown)  3850-2-9_01488(unknown)  3850-2-9_01599(unknown)  3850-3-10_00837(unknown)  3850-3-10_02477(unknown)  3850-3-12_01798(unknown)  3850-3-1_00229(unknown)  3850-3-1_00235(unknown)  3850-3-1_00236(unknown)  3850-3-1_00864(unknown)  3850-3-1_02710(unknown)  3850-3-2_00866(unknown)  3850-3-2_01076(unknown)  3850-3-2_01077(unknown)  3850-3-2_01787(unknown)  3850-3-2_02432(unknown)  3850-3-3_02341(unknown)  3850-3-4_00999(unknown)  3850-3-4_01000(unknown)  3850-3-4_02419(unknown)  3850-3-5_01489(unknown)  3850-3-5_01490(unknown)  3850-3-5_02253(unknown)  3850-3-6_01150(unknown)  3850-3-7_00528(unknown)  3850-3-7_01567(unknown)  3850-3-7_02408(unknown)  3850-3-8_00025(unknown)  3850-3-9_00291(unknown)  3850-3-9_00748(unknown)  3850-5-10_00764(unknown)  3850-5-10_01066(unknown)  3850-5-11_01356(unknown)  3850-5-11_01357(unknown)  3850-5-11_02224(unknown)
3850-5-12_00237(unknown)  3850-5-12_01546(unknown)  3850-5-12_02363(unknown)  3850-5-1_03905(unknown)  3850-5-2_00600(unknown)  3850-5-2_02391(unknown)  3850-5-3_01638(unknown)  3850-5-3_02411(unknown)  3850-5-4_01634(unknown)  3850-5-4_02388(unknown)  3850-5-5_01573(unknown)  3850-5-6_00495(unknown)  3850-5-6_02263(unknown)  3850-5-8_00350(unknown)  3850-5-8_01500(unknown)  3850-5-8_01643(unknown)  3850-5-9_00737(unknown)  3850-6-11_00086(unknown)  3850-6-12_00616(unknown)  3850-6-12_02349(unknown)  3850-6-1_02274(unknown)  3850-6-2_00639(unknown)  3850-6-2_02382(unknown)  3850-6-3_00664(unknown)  3850-6-3_00887(unknown)  3850-6-3_01549(unknown)  3850-6-4_00464(unknown)  3850-6-4_02212(unknown)  3850-6-5_00592(unknown)  3850-6-6_00863(unknown)  3850-6-6_02131(unknown)  3850-6-7_01559(unknown)  3850-6-7_02354(unknown)  3850-6-8_01503(unknown)  3850-6-9_01281(unknown)  3850-6-9_01479(unknown)  3850-7-10_00569(unknown)  3850-7-10_02245(unknown)  3850-7-10_02246(unknown)  3850-7-11_01581(unknown)  3850-7-11_01582(unknown)  3850-7-11_02329(unknown)  3850-7-11_02330(unknown)  3850-7-12_00636(unknown)  3850-7-12_02334(unknown)  3850-7-1_00547(unknown)  3850-7-1_02291(unknown)  3850-7-2_00037(unknown)  3850-7-2_02238(unknown)  3850-7-3_00661(unknown)  3850-7-3_01637(unknown)  3850-7-3_02443(unknown)  3850-7-4_00563(unknown)  3850-7-5_00581(unknown)  3850-7-6_00490(unknown)  3850-7-6_02155(unknown)  3850-7-7_00601(unknown)  3850-7-7_02324(unknown)  3850-7-8_00606(unknown)  3850-7-8_02333(unknown)  3850-7-9_01627(unknown)  3850-7-9_01629(unknown)  3850-8-10_00191(unknown)  3850-8-10_02344(unknown)  3850-8-11_00680(unknown)  3850-8-12_00582(unknown)  3850-8-12_02353(unknown)  3850-8-1_00031(unknown)  3850-8-1_01512(unknown)  3850-8-2_01534(unknown)  3850-8-2_01535(unknown)  3850-8-2_02279(unknown)  3850-8-3_00697(unknown)  3850-8-3_02340(unknown)  3850-8-4_01480(unknown)  3850-8-4_02495(unknown)  3850-8-5_02322(unknown)  3850-8-6_00190(unknown)  3850-8-6_00191(unknown)
3850-8-6_00223(unknown)  3850-8-6_01655(unknown)  3850-8-7_00622(unknown)  3850-8-8_02285(unknown)  3850-8-9_00763(unknown)  3850-8-9_01767(unknown)  3850-8-9_01769(unknown)  3850-8-9_02517(unknown)

Cluster9  SP_0733(hypothetical protein)  SP_0810(hypothetical protein)  SP_1302(conserved hypothetical protein)  SP_1487(hypothetical protein)  spr_0645(Hypothetical protein)  spr_0717(Transposase)  spr_1180(Degenerate transposase)  spr_1342(Degenerate transposase)  SPD_0639(conserved hypothetical protein)  SPD_0711(conserved hypothetical protein)  SPD_1157(conserved hypothetical protein)  SPD_1316(conserved hypothetical protein)  SP70585_0780(transposase)  SP70585_1368(transposase)  SP70585_1525(transposase)  SPJ_0673(transposase)  SPJ_1218(transposase)  SPJ_1383(transposase)  SPJ_1384(transposase)  SPP_0745(transposase)  SPP_0819(transposase)  SPP_1343(transposase)  SPP_1505(transposase)  SPT_0749(transposase)  SPT_0792(transposase family protein)  SPT_0924(hypothetical protein)  SPH_0820(transposase)  SPH_0910(transposase)  SPH_1445(transposase)  SPH_1450(transposase)  SPG_0666(IS630-SpnII, transposase)  SPG_1196(IS630-SpnII, transposase)  SPG_1411(IS630-SpnII, transposase)  SPCG_0682(hypothetical protein)  SPCG_1269(hypothetical protein)  HMPREF0837_11017(transposase)  HMPREF0837_11060(possible transposase)  HMPREF0837_11684(transposase)  pseudoSPN23F_06580(putative transposase (pseudogene))  pseudoSPN23F_11950(putative transposase (pseudogene))  pseudoSPN23F_14470(putative transposase (pseudogene))  3850-1-10_00217(unknown)  3850-1-10_00914(unknown)  3850-1-10_01673(unknown)  3850-1-11_00022(unknown)  3850-1-12_00241(unknown)  3850-1-12_00324(unknown)  3850-1-12_01117(unknown)  3850-1-12_01118(unknown)  3850-1-1_01013(unknown)  3850-1-1_01769(unknown)  3850-1-2_00959(unknown)  3850-1-3_01528(unknown)  3850-1-3_01675(unknown)  3850-1-4_01063(unknown)  3850-1-5_00971(unknown)  3850-1-5_01532(unknown)  3850-1-5_01671(unknown)  3850-1-6_00976(unknown)  3850-1-9_01078(unknown)
3850-2-10_01026(unknown)  3850-2-10_01604(unknown)  3850-2-11_00165(unknown)  3850-2-11_00981(unknown)  3850-2-11_00982(unknown)  3850-2-12_00039(unknown)  3850-2-12_01064(unknown)  3850-2-12_01744(unknown)  3850-2-1_00020(unknown)  3850-2-1_01658(unknown)  3850-2-1_01659(unknown)  3850-2-2_01007(unknown)  3850-2-2_01566(unknown)  3850-2-3_01023(unknown)  3850-2-3_01697(unknown)  3850-2-4_00604(unknown)  3850-2-4_01204(unknown)  3850-2-4_01658(unknown)  3850-2-5_01649(unknown)  3850-2-6_01768(unknown)  3850-2-6_01769(unknown)  3850-2-7_01107(unknown)  3850-2-8_00145(unknown)  3850-2-8_00952(unknown)  3850-2-9_00993(unknown)  3850-2-9_01595(unknown)  3850-2-9_01731(unknown)  3850-3-11_00893(unknown)  3850-3-11_00907(unknown)  3850-3-11_01622(unknown)  3850-3-1_00524(unknown)  3850-3-1_01818(unknown)  3850-3-2_00029(unknown)  3850-3-2_01175(unknown)  3850-3-2_01176(unknown)  3850-3-2_01784(unknown)  3850-3-3_00875(unknown)  3850-3-4_00178(unknown)  3850-3-4_01046(unknown)  3850-3-5_00081(unknown)  3850-3-5_01535(unknown)  3850-3-6_01071(unknown)  3850-3-9_01066(unknown)  3850-3-9_01728(unknown)  3850-3-9_01729(unknown)  3850-5-10_01070(unknown)  3850-5-10_01543(unknown)  3850-5-10_01710(unknown)  3850-5-11_01352(unknown)  3850-5-12_00113(unknown)  3850-5-12_01454(unknown)  3850-5-1_03245(unknown)  3850-5-2_00016(unknown)  3850-5-2_00128(unknown)  3850-5-2_00932(unknown)  3850-5-2_01443(unknown)  3850-5-4_00906(unknown)  3850-5-4_01534(unknown)  3850-5-5_01011(unknown)  3850-5-6_00870(unknown)  3850-5-7_00592(unknown)  3850-5-8_01027(unknown)  3850-5-8_01028(unknown)  3850-5-8_01497(unknown)  3850-5-9_01068(unknown)  3850-6-11_00147(unknown)  3850-6-11_00896(unknown)  3850-6-11_01401(unknown)  3850-6-12_00946(unknown)  3850-6-2_01007(unknown)  3850-6-2_01576(unknown)  3850-6-3_00992(unknown)  3850-6-3_01466(unknown)  3850-6-4_00796(unknown)  3850-6-5_00942(unknown)  3850-6-6_00049(unknown)  3850-6-7_00852(unknown)  3850-6-8_00975(unknown)  3850-6-8_00976(unknown)
3850-6-9_01628(unknown)  3850-7-10_00907(unknown)  3850-7-10_00908(unknown)  3850-7-11_00005(unknown)  3850-7-11_00959(unknown)  3850-7-11_01458(unknown)  3850-7-11_01628(unknown)  3850-7-12_00993(unknown)  3850-7-12_01660(unknown)  3850-7-1_00100(unknown)  3850-7-1_00890(unknown)  3850-7-1_01425(unknown)  3850-7-2_00830(unknown)  3850-7-2_00831(unknown)  3850-7-3_01004(unknown)  3850-7-3_01513(unknown)  3850-7-4_00896(unknown)  3850-7-4_01554(unknown)  3850-7-5_00930(unknown)  3850-7-6_00783(unknown)  3850-7-7_01619(unknown)  3850-7-8_00061(unknown)  3850-7-8_00979(unknown)  3850-7-9_00908(unknown)  3850-8-10_00258(unknown)  3850-8-10_01481(unknown)  3850-8-11_01816(unknown)  3850-8-11_01817(unknown)  3850-8-12_00939(unknown)  3850-8-12_00940(unknown)  3850-8-12_01486(unknown)  3850-8-12_01623(unknown)  3850-8-1_00954(unknown)  3850-8-1_00955(unknown)  3850-8-2_00861(unknown)  3850-8-2_01414(unknown)  3850-8-2_01580(unknown)  3850-8-3_00013(unknown)  3850-8-3_01022(unknown)  3850-8-3_01484(unknown)  3850-8-3_01660(unknown)  3850-8-4_00247(unknown)  3850-8-6_00129(unknown)  3850-8-6_00993(unknown)  3850-8-6_00994(unknown)  3850-8-7_00941(unknown)  3850-8-8_00270(unknown)  3850-8-8_01498(unknown)  3850-8-9_01084(unknown)

Cluster10  SP_0042(competence factor transporting ATP-binding/permease protein ComA)  spr_0043(Transport ATP-binding protein ComA)  spr_0468(Conserved hypothetical protein, truncation)  SPD_0049(competence factor transporting ATP-binding/permease protein ComA)  SP70585_0109(transport/processing ATP-binding protein ComA)  SPJ_0073(transport/processing ATP-binding protein ComA)  SPP_0107(transport/processing ATP-binding protein ComA)  SPP_0552(transport/processing ATP-binding protein ComA)  SPT_0080(transport/processing ATP-binding protein ComA)  SPH_0148(transport/processing ATP-binding protein ComA)  SPH_0636(transport/processing ATP-binding protein ComA)  SPG_0048(transport/processing ATP-binding protein ComA)  SPG_0480(BlpC ABC transporter, ATP-binding protein (blpA))  SPCG_0044(competence factor transporting ATP-binding/permease protein ComA)  SPCG_0501(competence factor transporting ATP-binding/permease protein ComA)  HMPREF0837_10331(bacteriocin-associated ABC superfamily ATP binding cassette transporter)  SPN23F_00590(bacteriocin transport/processing ATP-binding protein)  SPN23F_04810(putative bacteriocin transport/processing ATP-binding protein BlpA)  3850-1-10_00272(unknown)  3850-1-10_00721(unknown)  3850-1-11_00336(unknown)  3850-1-11_00769(unknown)  3850-1-12_00494(unknown)  3850-1-12_00978(unknown)  3850-1-1_00388(unknown)  3850-1-1_00409(unknown)  3850-1-2_00344(unknown)  3850-1-2_00769(unknown)  3850-1-3_00314(unknown)  3850-1-3_00762(unknown)  3850-1-4_00415(unknown)  3850-1-4_00869(unknown)  3850-1-5_00272(unknown)  3850-1-5_00578(unknown)  3850-1-6_00350(unknown)  3850-1-6_00773(unknown)  3850-1-7_00312(unknown)  3850-1-7_00764(unknown)  3850-1-8_00256(unknown)  3850-1-8_00700(unknown)  3850-1-9_00401(unknown)  3850-1-9_00876(unknown)  3850-2-10_00345(unknown)  3850-2-10_00827(unknown)  3850-2-11_00355(unknown)  3850-2-11_00792(unknown)  3850-2-12_00440(unknown)  3850-2-1_00273(unknown)  3850-2-1_00737(unknown)  3850-2-2_00436(unknown)  3850-2-2_00921(unknown)  3850-2-3_00363(unknown)  3850-2-3_00819(unknown)  3850-2-4_00558(unknown)  3850-2-4_01023(unknown)  3850-2-5_00449(unknown)  3850-2-5_00928(unknown)  3850-2-6_00524(unknown)  3850-2-6_00965(unknown)  3850-2-7_00534(unknown)  3850-2-7_00935(unknown)  3850-2-8_00372(unknown)  3850-2-8_00831(unknown)  3850-2-9_00327(unknown)  3850-2-9_00787(unknown)  3850-3-10_00581(unknown)  3850-3-11_00322(unknown)  3850-3-11_00746(unknown)  3850-3-12_00533(unknown)  3850-3-12_00998(unknown)  3850-3-1_00557(unknown)  3850-3-1_00932(unknown)  3850-3-2_00129(unknown)  3850-3-2_00535(unknown)  3850-3-3_00244(unknown)  3850-3-3_00675(unknown)  3850-3-4_00485(unknown)  3850-3-4_00973(unknown)  3850-3-5_00393(unknown)  3850-3-5_00574(unknown)
3850-3-6_00394(unknown) 3850-3-6_00884(unknown +) 3850-3-7_00247(unknown) 3850-3-7_00684(unknown) 3850-3-8 +_00136(unknown) 3850-3-8_00321(unknown) 3850-3-9_00489(unknown) + 3850-3-9_00918(unknown) 3850-5-10_00158(unknown) 3850-5-10_00 +956(unknown) 3850-5-11_00180(unknown) 3850-5-11_00633(unknown) + 3850-5-12_00418(unknown) 3850-5-12_00866(unknown) 3850-5-1 +_02619(unknown) 3850-5-1_03062(unknown) 3850-5-2_00294(unknown) + 3850-5-2_00766(unknown) 3850-5-3_00423(unknown) 3850-5-4_ +00262(unknown) 3850-5-4_00735(unknown) 3850-5-5_00414(unknown) + 3850-5-5_00812(unknown) 3850-5-6_00203(unknown) 3850-5-6_00661 +(unknown) 3850-5-7_00560(unknown) 3850-5-7_01369(unknown) +3850-5-8_00427(unknown) 3850-5-8_00876(unknown) 3850-5-9_00425( +unknown) 3850-5-9_01274(unknown) 3850-6-10_00404(unknown) 3850 +-6-10_00845(unknown) 3850-6-11_00699(unknown) 3850-6-12_003 +28(unknown) 3850-6-12_00756(unknown) 3850-6-1_00202(unknown) 3 +850-6-1_00663(unknown) 3850-6-2_00347(unknown) 3850-6-2_00820(u +nknown) 3850-6-3_00197(unknown) 3850-6-3_00202(unknown) 38 +50-6-4_00081(unknown) 3850-6-4_00618(unknown) 3850-6-5_00298(un +known) 3850-6-5_00740(unknown) 3850-6-6_00172(unknown) 3850-6- +6_00595(unknown) 3850-6-7_00228(unknown) 3850-6-7_00668(unknown +) 3850-6-8_00397(unknown) 3850-6-8_00834(unknown) 3850-6-9 +_00321(unknown) 3850-6-9_00760(unknown) 3850-7-10_00284(unknown +) 3850-7-10_00716(unknown) 3850-7-11_00317(unknown) 3850-7-11_ +00760(unknown) 3850-7-12_00313(unknown) 3850-7-1_00264(unkn +own) 3850-7-1_00693(unknown) 3850-7-2_00269(unknown) 3850-7-2_ +00688(unknown) 3850-7-3_00359(unknown) 3850-7-3_00812(unknown) + 3850-7-4_00274(unknown) 3850-7-4_00708(unknown) 3850-7-5_0 +0283(unknown) 3850-7-5_00734(unknown) 3850-7-6_00211(unknown) +3850-7-6_00646(unknown) 3850-7-7_00305(unknown) 3850-7-7_00760( +unknown) 3850-7-8_00446(unknown) 3850-7-8_00773(unknown) 3 +850-7-9_00251(unknown) 3850-7-9_00697(unknown) 3850-8-10_00351( +unknown) 3850-8-10_00807(unknown) 3850-8-11_00346(unknown) 
385 +0-8-11_00847(unknown) 3850-8-12_00310(unknown) 3850-8-12_00732( +unknown) 3850-8-1_00391(unknown) 3850-8-1_01185(unknown) 3 +850-8-2_00332(unknown) 3850-8-2_00668(unknown) 3850-8-3_00407(u +nknown) 3850-8-3_00853(unknown) 3850-8-4_00435(unknown) 38 +50-8-5_00300(unknown) 3850-8-5_00742(unknown) 3850-8-6_00303(un +known) 3850-8-6_00768(unknown) 3850-8-7_00339(unknown) 3850-8- +7_00744(unknown) 3850-8-8_00400(unknown) 3850-8-8_00837(unknown +) 3850-8-9_00458(unknown) 3850-8-9_00923(unknown)
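The Cluster10 row shows why a plain whitespace split is fragile. Descriptions contain spaces, commas, slashes and even nested brackets, as in SPG_0480(BlpC ABC transporter, ATP-binding protein (blpA)). Below is a minimal sketch that walks a row with a recursive regex (Perl 5.10 or later) instead of split. It assumes every bracket inside a description is balanced and that each entry is an ID without spaces followed by at most one bracketed description. The subroutine names parse_row and prefix_of are hypothetical, not part of the original script.

```perl
#!/usr/bin/perl
use 5.010;      # recursive (?2) patterns, possessive ++ and // need 5.10+
use strict;
use warnings;

# Split one cluster row into its cluster ID and its entries.
# Assumption: each entry is an ID with no whitespace, optionally followed
# by a bracketed description whose inner brackets are balanced, e.g.
# "SPG_0480(BlpC ABC transporter, ATP-binding protein (blpA))".
sub parse_row {
    my ($row) = @_;
    $row =~ s/^\s+|\s+$//g;                      # also strips the newline
    my ($cluster, $rest) = split /\s+/, $row, 2;
    $rest //= '';
    my @entries;
    # Group 1 = the whole entry; group 2 = the bracketed part, which
    # recurses into itself via (?2) so nested "(...)" pairs stay intact.
    while ($rest =~ /\G\s*([^\s(]+(?:(\((?:[^()]++|(?2))*\)))?)/gc) {
        push @entries, $1;
    }
    # /c keeps pos() after the final failed match, so any leftover text
    # (e.g. an unbalanced bracket) is reported instead of silently lost.
    my $tail = substr($rest, pos($rest) // 0);
    warn "Could not parse: $tail\n" if $tail =~ /\S/;
    return ($cluster, @entries);
}

# Prefix = everything before the first underscore, but only when that
# underscore comes before any '(' (underscores in descriptions are ignored).
sub prefix_of {
    my ($entry) = @_;
    my ($prefix) = $entry =~ /^([^_(]+)_/;
    return $prefix;                              # undef if there is none
}

# Demo on entries taken from the Cluster10 row.
my $line = 'Cluster10 SPG_0480(BlpC ABC transporter, ATP-binding protein (blpA)) '
         . 'SPCG_0044(competence factor transporting ATP-binding/permease protein ComA) '
         . '3850-1-10_00272(unknown)';
my ($id, @entries) = parse_row($line);
print "$id\n";
printf "  %-10s %s\n", prefix_of($_), $_ for @entries;
```

Because the pattern consumes whole entries rather than splitting on gaps, it does not depend on the original separators (tabs or runs of spaces) surviving intact.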

The actual data has about 8000 clusters, numbered 0 to 8000.
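With about 8000 clusters, every prefix that occurs anywhere in the file becomes an output column. The full column list therefore has to be known before the first row is printed. This sketch reads everything into memory first and then lays it out; reading the file twice would also work. It rests on three assumptions. First, each cluster has already been split into its entries. Second, columns appear in order of first appearance. Third, the i-th entry of each prefix group goes on the cluster's i-th row. These choices reproduce the ClusterX and ClusterY rows of the desired format. The names layout and prefix_of are hypothetical.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: prefix = text before the first '_' that precedes
# any '('. Entries without such an underscore are grouped under '?'.
sub prefix_of {
    my ($entry) = @_;
    my ($prefix) = $entry =~ /^([^_(]+)_/;
    return $prefix // '?';
}

# Input: a list of [ $cluster_id, @entries ] array refs, in file order.
# Output: an array ref of column prefixes, then one array ref per row.
sub layout {
    my @clusters = @_;

    # Pass 1: collect every prefix, in order of first appearance.
    my (%seen, @prefixes);
    for my $c (@clusters) {
        my (undef, @entries) = @$c;
        for my $p (map { prefix_of($_) } @entries) {
            push @prefixes, $p unless $seen{$p}++;
        }
    }

    # Pass 2: group each cluster's entries by prefix (keeping input order)
    # and emit as many rows as the largest group, padding with '-'.
    my @rows;
    for my $c (@clusters) {
        my ($id, @entries) = @$c;
        my %group;
        push @{ $group{ prefix_of($_) } }, $_ for @entries;
        my $depth = max(0, map { scalar @$_ } values %group);
        for my $i (0 .. $depth - 1) {
            my @cells = map {
                my $col = $group{$_};            # undef if prefix absent here
                ($col && defined $col->[$i]) ? $col->[$i] : '-';
            } @prefixes;
            push @rows, [ $id, @cells ];
        }
    }
    return (\@prefixes, @rows);
}

# Demo using the ClusterX and ClusterY rows from the question.
my ($cols, @rows) = layout(
    [ 'ClusterX', 'a_123(something)', 'b_675(some_other_thing)',
      'b_234(something new)', 'c_897(some different thing)' ],
    [ 'ClusterY', 'b_6998(some_other_thing, thats new)',
      'c_877797(something different inside here)',
      'c_111(some other different thing)' ],
);
print join("\t", 'Cluster', @$cols), "\n";
print join("\t", @$_), "\n" for @rows;
```

Printing tab-separated rows avoids having to compute a common column width before writing anything.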

Thanks

$new_guy