PerlMonks  

Re^4: Counting matches

by Nicpetbio23! (Acolyte)
on May 29, 2017 at 16:00 UTC ( [id://1191517]=note )


in reply to Re^3: Counting matches
in thread Counting matches

I think that is too specific. I want to count the occurrences of every element in Genomes_used_Hant.txt that occurs in NRT2.txt, not just Gloin1.
Genomes_used_hant.txt Laesu1 Patat1 Hydru2 Armost1 Pacta12 VKMF3808 Gaegr1 Corca1 Artol1 Agabivarbur1 Uncre1 Armme11 Suidec1 Aspka11 MagorBR32 Bjead1 Gymlu1 CopciAmutBmut1 Thihy1 Aspgl1 Leugo1 Bacci1 Schoc1 Gloin1 ....ext
NRT2.txt >ANRT2 MDFAKLLVASPEVNPNNRKALTIPVLNPFNTYGRVFFFSWFGFMLAFLSWYAFPPLLTVTIRDDLDMSQT +QIANSNIIALLATLLVRLICGPLCDRFGPRLVFIGLLLVGSIPTAMAGLVTSPQGLIALRFFIGILGGT +FVPCQVWCTGFFDKSIVGTANSLAAGLGNAGGGITYFVMPAIFDSLIRDQGLPAHKAWRVAYIVPFILI +VAAALGMLFTCDDTPTGKWSERHIWMKEDTQTASKGNIVDLSSGAQSSRPSGPPSIIAYAIPDVEKKGT +ETPLEPQSQAIGQFDAFRANAVASPSRKEAFNVIFSLATMAVAVPYACSFGSELAINSILGDYYDKNFP +YMGQTQTGKWAAMFGFLNIVCRPAGGFLADFLYRKTNTPWAKKLLLSFLGVVMGAFMIAMGFSDPKSEA +TMFGLTAGLAFFLESCNGAIFSLVPHVHPYANGIVSGMVGGFGNLGGIIFAIIFRYSHHDYARGIWILG +VISMAVFISVSWVRPVPKSQMRE >Metac1_3189 MGFNISLLWKTPMVDPINKKARSIPVLNVVDPYGRVFFFSWMGFMLGFWAWYTFPPLLTVTIKKDLHLSA +AEVANSNIVSLCATLLLRFVAGPLCDQFGSRRVYASLLLLGCLPVGLAPLVKTANGLYVSRFFIGILGA +TFVPCQVWCTGFFDKNIVGTANALSGGWGNAGGGITYFIMPAVFDSLVASQGMAPSKAWRVTFVVPLIC +LIACALGMLFLCPDTPLGSWEERSQKLQENLDQYSPTSTTAVNTPHILSEPPSRDVEKAEEFDEDSKFY +KQPSAISLSEAVAIAQAETVVKPSFKDSLPVMLSLQTLFHVATYSCSFGGELAVNSILSSYYKANFPHL +DQTKASNYAAIFGFLNFVTRPLGGVVADILYRMSGQNLWTKKAWITMAGLLSGALLIIVGKVDPSEANG +RDIGTMVGLVTVAAFFIEAGNGANFALVPHVYPAANGVLSGCTGGGGNLGGVVFAIIFRFIDHGSGYAT +ACWVIGVIHIAVNLAVCRIPPLPKGQVGGQ >MagorUS71_00075311 MGINVKFSDLYRAPEVNPITRKARSIPALNVINMYGRVFFFSWFGFMIAFWAWYTFPPLLTVTIRKDLNL +TAAEVANSNIVSLVATLFVRMVAGPLCDLWGSRVVFGGVLLVGAIPLGLAPLIQNATGLYVSRFFIGIL +GGAFVPCQVWSTGFFDKNVVGTANALTGGFGNAGGGITYFIMPAVFDSFVHRMGYTPGQAWRLTFVVPL +VMIIVTGVSLLLLCPDTPTGKWSERHMHAQQMVGQASTTDATNQDKIVDVPGSITDKGPNASNSSEGNS +FVEEKEKTRKEKDEQVGELLDAEAGRVIKSDDAAVQNTDTIAKPTFGESLRVMASPQTLVHVLTYFCSF +GGELAINAILSSYYLKNFPELGQTGASNYAAIFGFLNFITRPLGGVVSDLLYNAAGSGPRGLWLKKGWI +HVCGIATGALLILIGQLNPHHQPTMLGLVIFMAFFHEAGNGANFALIPHVHPHANGLVSGITGAGGNLG +GVVFAVVFRFVGGGTGYATGFWIIGIVHIAINIAMAWIKPLPKGQIGGY >Phchr2_2932727 MVYFPFARPQRSSVAPAETADALDTAAAAQIGHPEKLSLWERLTTVRINPANNKCTTLPILKLNNPYSIN +FHLSWLGFWVAFLSWFAFSPLVPEAVKNDLKLTQKQIGNSNIVSLCSTLLVRVIVGPLCDRFGPRKVMA +GLLIVGAIPSGLAGTVSSAQGLYVIRFFIGILGGTFVPCQAWTTAFYDTSIVGRANALVAGWGNSGGGF +TFIIMVALYDRLRSDGLSPHSAWRAAFAIVPVPILFFVAIITLLVGTDHPNGKWADRHKNAALVPAALT 
+DGSPRGSDDIEAIREIAPTQDGPKEKTENEKDAVNVDVTAVMPSRPPSIRQDSVPLTWKIALDVVLNPL +TWLPALAYMSSFGYELAIDANLANVYFGLYNKTKGFGQTRCGYIASIFGLLNVFSRPLGGYMGDVVYRR +WGVPGKKYLVLALGVLQGALSLAWGLYLDRHAASLAVVIVLMILTAAVDELGNGANFSLVPHCNPSSNG +VMTGIVGAMGNLGGVWFALMFRFQPSPFGKAFWIAGVVTMVTSVLLVVIRVPRK >Thiar1_121068 MGLKFHHLYASPEVNPASLKARSIPFFNPVDIYGRVFFFSWFGFMVAFWAWAAFPPLLTKVIQKELGLTP +AEVANSNIISPCAALLVRLVAGPLCDQFGPRIVFGGLLLVGSIPLGLAPLVHNAAGLYVSRFFIGILGG +AFVPCQVWSTGFFDKNIVGTANALAGGFGTAGGGITYFVMPAVYDAFVSYGHTAGEAWRLAFIVPLAVV +ITTGTALIVLCPDTPTGKWSERHLANSTPDDGSPSHNMTPANCSVIDVPGRITDKLPSPTAPSLSLSSR +QDPESGRQKPSEKNSHLANHKPMLDPESQLPIITLATAANTTKSEVVQKPTLSQAIRVAFSPQAIFHLL +TYMCSFGSELAINMIISSYYVKNFPSLSQTSAATFAALFGFQNFVTRPLGGVVSDLLYNYCGRSLWLKK +LWIVSCGVLAGVFLIVTGRLDPHGEGAMFGLVAVAGVFLQAGNGANFSLVPHVHPFANGILSGLTGAGG +NFGGVVFSVIFRFMDGGTNYAKGFWVIGVVNLVVCLGLSWIPPLPKGQVGGH >Thiar1_767720 MGFKPSDLWRTPEVNPVNKKARSVPILNPIDRHGRVFFFSWMGFMLAFWAWYTFPPLLSVTIKKDLNLTS +EEVANSNIVSLVATLLVRFAAGPLCDLLGSRKVFSLILLVGSIPIGLAPLIKDATGLYIIRFFIGILGG +SFVPCQVWCTGWFDKNVVGTANALSGGWGNAGGGITYFIMPAVYDSLVHRHGHTSGEAWRITFIVPLVC +LITCGLGLLFLCDDTPMGKWSDRHENVQQNLETQGISGKVVAITGNIADREPPSSSTSPSRAPSDIEKA +DPEKPKLTGDLTVTEAIETAQGETVVKPTFRDSLPVVFSLHALFHTATYACSFGGELAVNSILGAYYLK +NFPHLGQTNASNYAALFGFLNFVTRPLGGVVGDMLYNYFGRNLWLKKIWIHVCGLLTGALLILIGMLDP +HDLGTMVGLIVLMAVFHEAGNGANFALVPHVYPHANGVLSGLTGAGGNLGGVVFAIIFRYMDNGTNYAK +GFWVIGIMHIILNLAVCWIPPIPKGQIGGR >Micmi1_478558 MPAIFDHFVGHYKLSPHDAWRRAFFVPFAIIVGTAILMVVLCPDTPVGKWSERHEAVEANLRVIHEQGRH +VPQSSVVYGESTGASPADSTEKVDKLAITKDVEFGKGEVTEIDAEYAHEVIEKPSAKEIFKVFISPQTV +ALMACYFNSFGSELAINSVLGAYYLKNFPKLGQTSSGRWAAMFGLLNVYGRPLGGIISDIIYKYTKGNL +WAKKIWIHFLGVTMGVFMLAIGLANSHNQHTMIGLVAGLAFFMDASNGANFALVPHVHPQANGIVSGFV +GAVGNFGGVIGAIIFRYNVTNYGKSIWILGVIAIVMNLSVAWIRPIPKGQIGGR >Micmi1_311120 MGFNPAVLFKAPQVNPITKKARSIPILNPFNVYGRVFFFSWWGFMVAFLSWYAWSPLIGETIKADLKLTQ +AQIANSNILALVATLLVRCIAGPLCDKFGPRLVFAGVLLAGAVPTAFAFAIKNAAGLIVLRFFVGILGG +SFVPCQVWSTGFFDKNIVGTANSITGGFGNAGGGITYFVMPAIFDTFVNHYGMTKHKAWRMAFFVPFGM 
+IVGTAILMLLLTPDTPVGKWKDRHAAVEANLRAEHEAGRIIPHTGLGEAHHAHGPPLVLDDKKNDSTSD +VEHGTGEVVAVDTEYSHEVVMSPTFKEIVQIALSPQTLTLMACYFCSFGAELAINSILGAYYLKNFPKL +GQSGSGDWAAMFGLLNVVFRPMGGMMSDALYKFTGGKVWSKKILVHVMGVLMGMFMIIIGATDSHNRST +MVGLIAGLAFFLEAGNGANFGLVPHVHPYANGVVSGFTGASGNLGGIIGAIIFRYNGLHYGKSIWIFGI +IAIVLNLAVCWIRPVPKGQIGGR >Aspnid1_6363 MKPTQVLRLAVAAPDVNPQTRKARSIPVLNPFDLYGRVFFFSWIGFLVAFLSWYAFPPLLSVTIKKDLHM +SQDDVANSNIVALLGTFVMRFIAGPLCDRFGPRLVFVGLLICGAVPTAMAGLVTTPQGLIALRFFVGIL +GATFVPCQVWCTGFFDKNIVGTANSLAGGFGNAGGGITYFVMPAIYDSFVHDRGLTPHKAWRVSYIVPF +IIIVSIALAMLFTCPDTPTGKWADREKTSGQSIVDLSSTPNASSANSINISSDEKKAVHPEVTDSEAQV +HVRAGQIESSDAVIEAPTIKRYLSIALDPSALAVAVPYACSFGAELAINSILGAYYLLNFPLLGQTQSG +RWASMFGLVNVVFRPMGGFIADLIYARTNSVWAKKMWLVVLGLAMSGMAILIGFLDPHRESVMFGLVVL +MAFFIAASNGANFAIVPHVHPSANGIVSGIVGGMGNFGGIIFAIVFRYNGTQYHRSLWIIGFIILGCTL +FFSWVRPVPKQNH >Aspnid1_5705 MDFAKLLVASPEVNPNNRKALTIPVLNPFNTYGRVFFFSWFGFMLAFLSWYAFPPLLTVTIRDDLDMSQT +QIANSNIIALLATLLVRLICGPLCDRFGPRLVFIGLLLVGSIPTAMAGLVTSPQGLIALRFFIGILGGT +FVPCQVWCTGFFDKSIVGTANSLAAGLGNAGGGITYFVMPAIFDSLIRDQGLPAHKAWRVAYIVPFILI +VAAALGMLFTCDDTPTGKWSERHIWMKEDTQTASKGNIVDLSSGAQSSRPSGPPSIIAYAIPDVEKKGT +ETPLEPQSQAIGQFDAFRANAVASPSRKEAFNVIFSLATMAVAVPYACSFGSELAINSILGDYYDKNFP +YMGQTQTGKWAAMFGFLNIVCRPAGGFLADFLYRKTNTPWAKKLLLSFLGVVMGAFMIAMGFSDPKSEA +TMFGLTAGLAFFLESCNGAIFSLVPHVHPYANGIVSGMVGGFGNLGGIIFAIIFRYSHHDYARGIWILG +VISMAVFISVSWVRPVPKSQMRE >PenroP1_04323 MAPGFFKRLYVSPEINPSTHKAKSIPVLNPFDKYGRVFFFSWLGFMVAFLSWYAFPPLLNVTIKKDLKMT +QEDVANSNIVALLATLLVRFVAGPLCDRYGPRLVFVGLLLCGAIPTAMAGLVTGPKGLIALRFFIGILG +GTFVPCQVWCTGFFDKSIVGAANSLSGGWGNAGGGITYFVMPAVYDSLVQSRGIPSHKAWRIAYVIPFI +IITAVALCMLVLCEDTPTGKWSERNLWAKDSNGTTSAPNANIVDINSCTSSSGTMTPHNAATIDSEKKG +TQSPHVIDDTPATGQIDIFRQETVVSPTRREALNVAMSLSTMALAIPYACSFGSELAINSMLGSYYTEQ +FPHMSQTKSGQWAAMFGLLNVVCRPAGGLFGDLVYLYTGTAWSKKILIAFLGIGMGAFQLAIGLSNPST +EATMFGLVAGLAFFIEASNGANFALVPHVYPFANGIVSGIVGGLGNLGGIIFAIIFRYNGSNYGRSLWI +IGVISLATNLAVSWIRPIPKSQTLS >Pyrtt1_5571 MPFAISMLWSAPELNPYNKKARSIPVLNPVNKYGRVFFFSWLGFFIAFWSWYAFPPLLSKSIKADMHLSQ 
+DQIANSNIVALCATLLVRFIAGPMCDHFGPRITFASLLFAGAIPTALAGTAHNATGLYFIRFFVGILGG +TFVPCQVWTTGFYDKNVVGSANALVGGWGNSGGGITYFVMPVIYDSLKSNQGLSSHVAWRVSFIVPFVL +ISACAVALLLLTEDTPTGKWSERGVTVVSGDQPNQAGHSIVPTTGALDDKPSTAASLSSNDEKKYENTA +ADVETANGDVQIMDEVQHEVVVKPSLKEGLKVMFSLQTGALCAGYFCSFGGELAINSILGAYYLKNFPY +LGQTQSGRWAAMFGLLNVITRPLGGFIADLLYQTTGHNLWAKKLWINFVGIMTGVMCIIIGKLDPHNLS +EMIGLIALMAIFLEAGNGANFALVPHVHPHANGVLSGIVGATGNFGGIIFAIIFRYHKTNYSQVFWIIG +IMIIALNCAFIWVRPIPKNQIGGR >Sodal1_324937 MGLDYLWKAPEVNPINLKACRRFETTRKIQGCPANKLQQARSVPVLNPFNKYGAAFFFSWMGFMIAFWAW +YTFPPLLTVTIRDDLNLTPAQVANSNIVSLSSTLLMRLLAGPACDKFGSRLVFGGLLLLGALPVGLAPL +VQDATGLYISRFFIGVLGATFVPCQVWCTGFFDKNIVGTANALAGGWGNAGGGITYFVMPAVFDSFRDR +GYSPAVAWRLTFIVPLICIIVCGVGLILCCEDTPMGKWSDRHLHIQENLRNQGVEDATLVNVVNVPGGI +TDRPEPSPAPASADEERNSSTKSRKDESHFDAQAIDLSRAEMLETAQGETVAKPSLRDSLRVAVSPQTI +FHVLTYACSFGGELAINAILSSYYLKNFPHLGQTGASNWAAMFGFLNFVTRPLGGIVGDLLYNYVGRDL +WWKKGWIVLCGVATGVLLVLIGQLDPHHEPTMFGLIFLMAVFHEAGNGANFALVPHVHPAANGVLSGLT +GAGGNLGGVVFAIIFRFMDGGTDYAKGFWVIGCMHIGLNLLVSWIPPLPKGQIGGH >Aspfl1_27006 MDSVKLLFLSPEVNPSNRKARSIPILNPFDKYGRVYFFSWLGFMVAFLSWYAFPPLLTVTIRKDLKMTQP +EVANSNIVALLATLLVRFVAGPLCDRFGPRLVFIGLLLCGSIPTAMAGLVTNAQGLIALRFFVGILGGT +FVPCQVWCTGFFDKKIVGTANSLAAGWGNAGGGITYFVMPAIFDSLVHNQGLPAHKAWRVAYIVPFIII +VVIAVAMFFTCEDTPTGKWSERHLWAEETSRFEGNIVNINSGISSSHPSSPPSTTNIVADLEKKGNPSP +PESIAPMPGQLESLRTDTVVAPTFKEAMNVLLSLSTAAVAIPYACSFGAELAINSILGDFYAENFPYMG +QTKTGQWAAMFGLLNVICRPAGGFIADLLYRHTQSVWSKKILLSFLGVGMGAFQLALGFSNPKSEATMF +GLTAGLAFFLEACNGANFAVVPHVHPFANGIVSGAVGGMGNLGGIIFAIIFRYNGSHYARSLWIIGIIA +IAANLAVSWIRPVPRPQMV >Necha2_90170 MGFQIAHMWKAPEVNPISRKARSVPVLNPVDIYGRVFFFSWMGFMLAFWAWYTFPPLLTVTIKKDLHLTP +AQIANSNIVSLSATFFLRFITGPLCDQFGPRRVFAYLILLGCFPIGLAPLVKNATGLYISRFFIGILGA +TFVPCQVWCTGFFDKNIVGTANALAGGWGNAGGGITYFIMPAVFDSLVAHQGLTPSKAWRVTFIVPLIC +LIVCGVGMLLLCPDSPMGDWDDQAQQVRKNMEEHGVSTSDEITPVGTPDRSGSDEENSAGCEKDVKVGD +HEHSITRNEAMEIAQGEIVVKPSLKEALPVLYSPQTFFHVATYACSFGGELAINAVLSAYLKKNFPHLD +QTKASNYAAIFGFLNFVNRPLGGVIADILYNKFGRNLWLKKGWITVCGLLTGALLILIGRVNPAESNGG 
+TVGTFVGLIVLMSVFHEAGNGANFALVPHVHPFANGILSGLTGGGGNLGGVIFAIIFRFMNQGKDFAMG +FWVIGIIHIALNLAVCWIPPLPKGQVGGH >Conli1_4928 MGVNIKFTDLYKAPDVNPVNRKAHSIPALNPINMYGRVFFFSWFGFMIAFWAWYTFPPLLTVTIKADLHL +TPAQVANSNIVSLVATLFIRFVAGPLCDMYGPRRVFAGTLLVGALPLGLAPLIHNATGLYVSRFFIGIL +GGSFVPCQVWSTGFFDKNIVGTANALTGGFGNAGGGITYFIMPAVYDSFVHMGHTPHQSWRLTFIVPLI +MAIATGLSLLLLCPDTPMGKWSERHLHVQENLQLHGVAERVVDIPGGITDKATPSESHVSDGEEKKPVT +YDHEAALSKSEMIETAQGETIQKPTLREALPVIFSQQTAFHFLTYFCSFGGELAINAILSSYYLKNFPT +LGQTKASNWAAMFGFLNFVTRPLGGVVSDLLYNLAGRNLWLKKGWITTCGVATGALLILIGQLNSHHQS +TMYGLVALMAFFLEAGNGANFALVPHVHPFANGILSGVTGAGGNLGGVVFAVIFRFMGGGTNYAKAFWV +IGVIHIAMNLSVCWIRPLPKGQIGGH ...ext

Replies are listed 'Best First'.
Re^5: Counting matches
by hippo (Bishop) on May 29, 2017 at 16:04 UTC

    There is absolutely nothing preventing you from expanding the @hant and @counts arrays. Why do you suppose I set them up as arrays in the first place even though you had only given us one data point in your sample?

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use Test::More;
      use File::Slurp;
      #Counter.pl

      open my $handle, '<', "/home/nic/Desktop/5-27-17/Genomes_used_HANT.txt";
      chomp(my @HANT = <$handle>);
      close $handle;

      my @counts = (0,1,2,3,4,5,6,7,8,9,10);
      plan tests => scalar @hant;

      my $nrt2 = read_file("/home/nic/Desktop/5-27-17/NRT2.txt");
      for my $hant (@HANT) {
          my $matches = () = my $nrt2 =~ /$hant/g;
          is ($matches, shift @counts, "Number of matches found for $hant");
      }

      OK, I tried this script and got the following error: "You said to perform 0 tests at Counter.pl line 13". What should I change? Sorry, like I said, I am new at writing Perl scripts. My script might not make sense. lol

        This:

        chomp(my @HANT = <$handle>);
        ...
        plan tests => scalar @hant;

        Perl is case sensitive: @HANT and @hant are different arrays.
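        For anyone following along, here is a minimal sketch of the counting idea with one consistent variable name and no Test::More plan. The helper name count_matches and the inline sample text are made up for illustration; they are not from the scripts posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each name occurs in $text.
sub count_matches {
    my ($text, @names) = @_;
    my %count;
    for my $name (@names) {
        # \Q...\E quotes any regex metacharacters in the name.
        # Assigning to the empty list forces list context, so the
        # scalar on the left receives the number of matches.
        my $matches = () = $text =~ /\Q$name\E/g;
        $count{$name} = $matches;
    }
    return \%count;
}

# Made-up sample data standing in for the contents of NRT2.txt.
my $text   = ">Thiar1_121068\nMGLK\n>Thiar1_767720\nMGFK\n>Micmi1_478558\nMPAI\n";
my $counts = count_matches($text, qw(Thiar1 Micmi1 Gloin1));
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

        Note that this counts every occurrence of the name anywhere in the text, not only in header lines, so a name that also happens to appear inside a sequence would be counted too.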

        perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re^5: Counting matches
by poj (Abbot) on May 29, 2017 at 17:38 UTC

    How big are the two files? Try:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %count = ();
    my $dir = '/home/nic/Desktop/5-27-17/';

    my $file1 = $dir.'Genomes_used_hant.txt';
    open IN,'<',$file1 or die "Could not open $file1 : $!";
    while (<IN>){
      chomp;
      $count{$_} = 0;
    }
    close IN;

    my $file2 = $dir.'NRT2.txt';
    my $n = 0;
    open IN,'<',$file2 or die "Could not open $file2 : $!";
    while (my $line = <IN>){
      if ($line =~ /^>(.*)_/){
        ++$count{$1} if exists $count{$1};
      }
      ++$n;
    }
    close IN;

    print "$n lines read from $file2\n\n";
    for (sort keys %count){
      if ($count{$_} > 0 ) {
        printf "%-15s count %d\n",$_,$count{$_};
      }
    }
    poj
      What does this line do/mean?
      printf "%-15s count %d\n",$_,$count{$_};

        it formats the print output, see printf and sprintf

         %-15s  a string 15 characters minimum width left justified
         %d     signed integer
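        A small sketch of that format in isolation may help. The wrapper format_count is a hypothetical name used only here; sprintf takes the same format string as printf but returns the string instead of printing it.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the format string from the script above.
sub format_count {
    my ($name, $count) = @_;
    # %-15s pads the name with trailing spaces to at least 15 characters;
    # %d prints the count as a signed integer.
    return sprintf "%-15s count %d\n", $name, $count;
}

print format_count('Thiar1', 2);
print format_count('MagorUS71', 1);
```

        Because the width is a minimum, a name longer than 15 characters is printed in full and simply pushes the rest of the line to the right.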
        
      Perfect!
Re^5: Counting matches
by AnomalousMonk (Archbishop) on May 29, 2017 at 19:19 UTC

    For those monks following along at home, please note that the sample  NRT2.txt Fasta file given here has none of the genomes in the sample  Genomes_used_hant.txt file! You'll have to roll your own sample data. Thank you, Nicpetbio23!.
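    One way to check this for your own data, as a rough sketch: pull the prefixes out of the header lines with the same /^>(.*)_/ pattern used above and compare them with the wanted names. The sub name header_prefixes and the short sample arrays are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted, unique prefixes found in
# FASTA header lines, using the same pattern as the script above.
sub header_prefixes {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        $seen{$1} = 1 if $line =~ /^>(.*)_/;
    }
    return sort keys %seen;
}

# Made-up sample lines; headers without an underscore are skipped.
my @fasta  = ('>Thiar1_121068', 'MGLKFHHLYASPEVNP', '>Thiar1_767720',
              '>Micmi1_478558', '>ANRT2');
my @wanted = qw(Gloin1 Thiar1 Laesu1);

my %present = map { $_ => 1 } header_prefixes(@fasta);
for my $name (@wanted) {
    printf "%-15s %s\n", $name, $present{$name} ? 'present' : 'missing';
}
```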


    Give a man a fish:  <%-{-{-{-<
