Re^2: Searching array against hash

by BrowserUk (Patriarch)
on Aug 22, 2013 at 01:55 UTC


in reply to Re: Searching array against hash
in thread Searching array against hash

Without prejudicing the OP's response in any way, can I ask what advantage you think using Bio::DB::Fasta would have over the OP's 50-line module?

Given that to get it, he would have to try and install it and its 897 codependants -- not to mention AnyDBM_File and at least one of DB_File GDBM_File NDBM_File SDBM_File; all of which are a known problem on my platform -- and somehow resolve the 95 "cannot resolve"s (whatever they are?):

C:\perl64\packages\BioPerl-1.6.901>Build.bat
Building BioPerl
Build.bat: blib\lib\Bio\Tools\Run\WrapperBase\CommandExts.pm: cannot resolve L<bioperl-l@bioperl.org> in paragraph 94.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\ace.pm: cannot resolve L<bioperl> in paragraph 8.
Build.bat: blib\lib\Bio\Search\Hit\hmmer3Hit.pm: cannot resolve L<bioperl-l@bioperl.org> in paragraph 15.
Build.bat: blib\lib\Bio\Search\Hit\hmmer3Hit.pm: cannot resolve L<score()|score> in paragraph 45.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\mysqlace.pm: cannot resolve L<bioperl> in paragraph 8.
Build.bat: blib\lib\Bio\Variation\IO.pm: cannot resolve L<perltie> in paragraph 66.
Build.bat: blib\lib\Bio\SeqIO.pm: cannot resolve L<perltie> in paragraph 74.
Build.bat: blib\lib\Bio\SearchIO\hmmer3.pm: cannot resolve L<bioperl-l@bioperl.org> in paragraph 15.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\mysql.pm: cannot resolve L<bioperl> in paragraph 165.
Build.bat: blib\lib\Bio\DB\SeqFeature.pm: cannot resolve L<bioperl> in paragraph 107.
Build.bat: blib\lib\Bio\Restriction\Enzyme.pm: cannot resolve L<non_ambiguous_length> in paragraph 214.
Build.bat: blib\lib\Bio\DB\SeqFeature\NormalizedTableFeatureI.pm: cannot resolve L<bioperl> in paragraph 12.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\pg_fts.pm: cannot resolve L<mailto:bioperl-l@lists.open-bio.org> in paragraph 44.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\pg_fts.pm: cannot resolve L<mailto:gmod-gbrowse@lists.sourceforge.net> in paragraph 44.
Build.bat: blib\lib\Bio\DB\GFF\Featname.pm: cannot resolve L<bioperl> in paragraph 44.
Build.bat: blib\lib\Bio\SearchIO\Writer\HTMLResultWriter.pm: cannot resolve L<remote_database> in paragraph 77.
Build.bat: blib\lib\Bio\SearchIO\Writer\HTMLResultWriter.pm: cannot resolve L<remote_database> in paragraph 86.
Build.bat: blib\lib\Bio\SearchIO\Writer\HTMLResultWriter.pm: cannot resolve L<remote_database> in paragraph 97.
Build.bat: blib\lib\Bio\SeqIO\embldriver.pm: cannot resolve L<annotation()|annotation> in paragraph 23.
Build.bat: blib\lib\Bio\Tools\Alignment\Consed.pm: cannot resolve L<get_quality_scalar()|get_quality_scalar> in paragraph 83.
Build.bat: blib\lib\Bio\Tools\Alignment\Consed.pm: cannot resolve L<get_quality_array()|get_quality_array> in paragraph 89.
Build.bat: blib\lib\Bio\Tools\Alignment\Consed.pm: cannot resolve L<get_contigs()|get_contigs> in paragraph 100.
Build.bat: blib\lib\Bio\Tools\Alignment\Consed.pm: cannot resolve L<get_contigs()|get_contigs> in paragraph 105.
Build.bat: blib\lib\Bio\Tools\Alignment\Consed.pm: cannot resolve L<get_contigs()|get_contigs> in paragraph 110.
Build.bat: blib\lib\Bio\Tools\Alignment\Consed.pm: cannot resolve L<get_contigs()|get_contigs> in paragraph 115.
Build.bat: blib\lib\Bio\Tools\Alignment\Consed.pm: cannot resolve L<sum_lets()|sum_lets> in paragraph 222.
Build.bat: blib\lib\Bio\SeqIO\embl.pm: cannot resolve L<annotation()|annotation> in paragraph 23.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\mysqlcmap.pm: cannot resolve L<bioperl> in paragraph 183.
Build.bat: blib\lib\Bio\ClusterIO.pm: cannot resolve L<perltie> in paragraph 29.
Build.bat: blib\lib\Bio\Search\Hit\HMMERHit.pm: cannot resolve L<score()|score> in paragraph 70.
Build.bat: blib\lib\Bio\Search\Hit\HMMERHit.pm: cannot resolve L<hsp()|hsp> in paragraph 78.
Build.bat: blib\lib\Bio\Search\Hit\PullHitI.pm: cannot resolve L<expect()|expect> in paragraph 142.
Build.bat: blib\lib\Bio\Search\Hit\PullHitI.pm: cannot resolve L<signif()|signif> in paragraph 142.
Build.bat: blib\lib\Bio\Seq\Quality.pm: cannot resolve L<force_flush> in paragraph 26.
Build.bat: blib\lib\Bio\Search\Tiling\MapTiling.pm: cannot resolve L<ALIGNMENTS/get_tiled_alns> in paragraph 20.
Build.bat: blib\lib\Bio\Align\Graphics.pm: cannot resolve L<GD> in paragraph 205.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\memory.pm: cannot resolve L<bioperl> in paragraph 22.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\caching_handle.pm: cannot resolve L<DBI> in paragraph 56.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\caching_handle.pm: cannot resolve L<bioperl> in paragraph 56.
Build.bat: blib\lib\Bio\Search\Hit\HmmpfamHit.pm: cannot resolve L<num_hsps> in paragraph 76.
Build.bat: blib\lib\Bio\DB\SeqFeature\Store\Loader.pm: cannot resolve L<bioperl> in paragraph 187.
Build.bat: blib\lib\Bio\Search\Hit\HitI.pm: cannot resolve L<expect()|expect> in paragraph 126.
Build.bat: blib\lib\Bio\Search\Hit\HitI.pm: cannot resolve L<signif()|signif> in paragraph 126.
Build.bat: blib\lib\Bio\Search\Hit\HitI.pm: cannot resolve L<frac_aligned_query()|frac_aligned_query> in paragraph 136.
Build.bat: blib\lib\Bio\Search\Hit\HitI.pm: cannot resolve L<frac_aligned_hit()|frac_aligned_hit> in paragraph 136.
Build.bat: blib\lib\Bio\FeatureIO.pm: cannot resolve L<perltie> in paragraph 55.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\oracleace.pm: cannot resolve L<bioperl> in paragraph 8.
Build.bat: blib\lib\Bio\Tools\Run\RemoteBlast.pm: cannot resolve L<FEEDBACK> in paragraph 36.
Build.bat: blib\lib\Bio\Search\Result\HMMERResult.pm: cannot resolve L<next_models> in paragraph 9.
Build.bat: blib\lib\Bio\Search\Hit\GenericHit.pm: cannot resolve L<expect()|expect> in paragraph 134.
Build.bat: blib\lib\Bio\SeqIO\genbank.pm: cannot resolve L<annotation()|annotation> in paragraph 22.
Build.bat: blib\lib\Bio\Structure\IO.pm: cannot resolve L<perltie> in paragraph 56.
Build.bat: blib\lib\Bio\Search\Hit\ModelHit.pm: cannot resolve L<expect()|expect> in paragraph 104.
Build.bat: blib\lib\Bio\Search\Hit\ModelHit.pm: cannot resolve L<signif()|signif> in paragraph 104.
Build.bat: blib\lib\Bio\Search\Result\hmmer3Result.pm: cannot resolve L<bioperl-l@bioperl.org> in paragraph 15.
Build.bat: blib\lib\Bio\DB\SeqFeature\Store\berkeleydb3.pm: cannot resolve L<bioperl> in paragraph 15.
Build.bat: blib\lib\Bio\DB\Fasta.pm: cannot resolve L<bioperl> in paragraph 123.
Build.bat: blib\lib\Bio\DB\SeqFeature\Store\berkeleydb.pm: cannot resolve L<bioperl> in paragraph 264.
Build.bat: blib\lib\Bio\DB\SeqFeature\Store\LoadHelper.pm: cannot resolve L<bioperl> in paragraph 9.
Build.bat: blib\lib\Bio\DB\DBFetch.pm: cannot resolve L<http:E<sol>E<sol>www.ebi.ac.ukE<sol>cgi-binE<sol>dbfetch> in paragraph 8.
Build.bat: blib\lib\Bio\DB\SeqFeature\NormalizedFeature.pm: cannot resolve L<bioperl> in paragraph 159.
Build.bat: blib\lib\Bio\Search\Hit\BlastPullHit.pm: cannot resolve L<num_hsps> in paragraph 74.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi.pm: cannot resolve L<bioperl> in paragraph 496.
Build.bat: blib\lib\Bio\Root\Root.pm: cannot resolve L<Error> in paragraph 21.
Build.bat: blib\lib\Bio\Root\Root.pm: cannot resolve L<Error> in paragraph 22.
Build.bat: blib\lib\Bio\Root\Root.pm: cannot resolve L<Error> in paragraph 22.
Build.bat: blib\lib\Bio\Root\Root.pm: cannot resolve L<Error> in paragraph 36.
Build.bat: blib\lib\Bio\Search\SearchUtils.pm: cannot resolve L<_adjust_contigs> in paragraph 16.
Build.bat: blib\lib\Bio\DB\SeqFeature\Store\memory.pm: cannot resolve L<bioperl> in paragraph 125.
Build.bat: blib\lib\Bio\DB\GFF\Homol.pm: cannot resolve L<bioperl> in paragraph 28.
Build.bat: blib\lib\Bio\DB\GFF\RelSegment.pm: cannot resolve L<bioperl> in paragraph 251.
Build.bat: blib\lib\Bio\DB\GFF\Segment.pm: cannot resolve L<bioperl> in paragraph 229.
Build.bat: blib\lib\Bio\Seq\MetaI.pm: cannot resolve L<names_submeta> in paragraph 14.
Build.bat: blib\lib\Bio\SeqEvolution\DNAPoint.pm: cannot resolve L<reset_sequence_counter> in paragraph 9.
Build.bat: blib\lib\Bio\Assembly\IO\sam.pm: cannot resolve L<bioperl-l@bioperl.org> in paragraph 44.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\dbi\mysqlopt.pm: cannot resolve L<bioperl> in paragraph 8.
Build.bat: blib\lib\Bio\AlignIO.pm: cannot resolve L<print> in paragraph 51.
Build.bat: blib\lib\Bio\AlignIO.pm: cannot resolve L<perltie> in paragraph 66.
Build.bat: blib\lib\Bio\Search\HSP\ModelHSP.pm: cannot resolve L<seq_str()|seq_str> in paragraph 52.
Build.bat: blib\lib\Bio\DB\GFF\Typename.pm: cannot resolve L<bioperl> in paragraph 49.
Build.bat: blib\lib\Bio\DB\GFF\Adaptor\berkeleydb.pm: cannot resolve L<bioperl> in paragraph 33.
Build.bat: blib\lib\Bio\DB\SeqFeature\NormalizedFeatureI.pm: cannot resolve L<bioperl> in paragraph 12.
Build.bat: blib\lib\Bio\DB\GFF\Feature.pm: cannot resolve L<bioperl> in paragraph 295.
Build.bat: blib\lib\Bio\Assembly\Contig.pm: cannot resolve L<Coordinate_Systems> in paragraph 120.
Build.bat: blib\lib\Bio\Root\Exception.pm: cannot resolve L<Error.pm try|Error/try> in paragraph .
Build.bat: blib\lib\Bio\Root\Exception.pm: cannot resolve L<Error.pm try|Error/try> in paragraph 12.
Build.bat: blib\lib\Bio\Root\Exception.pm: cannot resolve L<Error> in paragraph 40.
Build.bat: blib\lib\Bio\Root\Exception.pm: cannot resolve L<Error> in paragraph 46.
Build.bat: blib\lib\Bio\Root\Exception.pm: cannot resolve L<pretty_format()|pretty_format> in paragraph 91.
Build.bat: blib\lib\Bio\Search\HSP\hmmer3HSP.pm: cannot resolve L<bioperl-l@bioperl.org> in paragraph 15.
Build.bat: blib\lib\Bio\DB\SeqFeature\Segment.pm: cannot resolve L<bioperl> in paragraph 121.
Build.bat: blib\lib\Bio\SeqIO\swiss.pm: cannot resolve L<annotation()|annotation> in paragraph 45.
Build.bat: blib\lib\Bio\DB\SeqFeature\Store\GFF2Loader.pm: cannot resolve L<bioperl> in paragraph 181.

all to install an indexing module that he doesn't need, and won't benefit from, in order to complete his task in "a couple seconds".


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: Searching array against hash
by abualiga (Scribe) on Aug 22, 2013 at 02:41 UTC
    Without prejudicing the OP's response in any way, can I ask what advantage you think using Bio::DB::Fasta would have over the OP's 50-line module?

    Not having a CS background, I cannot analyze an algorithm for efficiency. However, Bio::DB::Fasta is an existing and, among biologists, widely used solution to the common task presented by the OP. So, like you, I exercise the privilege to offer my solution.

    Given that to get it, he would have to try and install it and its 897 codependants -- not to mention AnyDBM_File and at least one of DB_File GDBM_File NDBM_File SDBM_File; all of which are a known problem on my platform -- and somehow resolve the 95 "cannot resolve"s (whatever they are?):

    So far, I've installed BioPerl on Ubuntu, RedHat, and Mac OS X without problems. If the OP happens to be in a biomedical field, it may be worthwhile to install and use BioPerl as it does solve many genomic tasks.

    all to install an indexing module that he doesn't need, and won't benefit from, in order to complete his task in "a couple seconds".

    "Without prejudicing the OP's response in any way"

      "Without prejudicing the OP's response in any way"

      Meaning: he will speak for himself if he so chooses. My inquiry was for my benefit.

      Having helped a growing list of bio-guys to avoid the byzantine Bio::Empire, I'm always looking to maintain my knowledge of the reasons for its highly complicated, intricate and involved nature.


Re^3: Searching array against hash
by bioinformatics (Friar) on Aug 22, 2013 at 02:25 UTC

    It will help if you are looking to retrieve a subsequence from the human genome, the FASTA file of which is about 5 Gb; in the case of parsing a smaller FASTA file with just a few entries, it makes less of a difference.

    Bioinformatics
      It will help if you are looking to retrieve a subsequence from the human genome, the FASTA file of which is about 5 Gb;

      I guess things have moved on. The version I have is just under 3GB and came in 25 files chr(1-22, M, X, Y).

      That said, if his 3 posted sequences are representative of his 900,000, that means his file is a tad under 900MB.

      If he can process that in "a few seconds", he could process your 5GB file in 5-and-a-bit times "a few seconds".

      But here is the point: it will take Bio::DB::Fasta at least that same 5-and-a-bit times "a few seconds" to construct an index before he can start processing anything. So for a one-off process, there is a net loss.

      Now the real crux. Given all the additional layers and overheads, how many times does he have to redo the process in order to obtain a net gain? (If ever.)
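
      The break-even question can be put as simple arithmetic. The sketch below is only an illustration: the sub name break_even_runs and all timings are hypothetical, not measurements from this thread. It assumes a full scan costs the same every run, the index is built once, and each indexed run has a fixed cost.

```perl
use strict;
use warnings;

# Hypothetical helper: how many runs before "build index once, then use it"
# is cheaper in total than "scan the whole file every time"?
# All arguments are in seconds and are assumed, not measured.
sub break_even_runs {
    my ($scan, $build, $lookup) = @_;

    # If an indexed run is no faster than a plain scan, the index never pays off.
    return if $lookup >= $scan;

    # Smallest whole n satisfying: build + n * lookup < n * scan
    return int( $build / ($scan - $lookup) ) + 1;
}

# Assumed figures: a 10 s scan, a 10 s index build (at least one full pass),
# and 1 s per indexed run.
my $n = break_even_runs(10, 10, 1);
print "Net gain from run $n onwards\n";    # prints: Net gain from run 2 onwards
```

      With these assumed numbers a one-off job loses and the second run already wins; if the indexed run is not faster than the scan (for example because every record is wanted anyway), the sub returns nothing and there is never a gain.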

      Then add to that the (potential) problems with installation, the learning curve of finding your way around the documentation for 897 modules to find the one that you want, and then learning how to use it to do what you want; and suddenly the reason becomes clear why so many bioinformaticians are looking for Lite alternatives to the Bio::Behemoth and simple procedures in order to get their work done, rather than becoming technical debt slaves to the byzantine Bio::Empire.



        You're not quite reading my response right. What the OP wants is to retrieve the DNA sequence that corresponds to a specific ID. That's simple and fast enough, especially when using a hash. When I want to retrieve a subsequence, chr5:1234567-1234798 for example, which is only a portion of the sequence associated with a specific record in the FASTA file, then using Bio::DB::Fasta is far faster. The module has its uses, which is why someone implemented a similar thing in Python as Pygr (the indexing approach, not the parser per se). You're not wrong, Bio::DB::Fasta is overkill for this specific purpose; I just don't see the point of bashing a tool that has been helpful to bioinformaticists for something close to a decade. Also, it installs just fine on Linux, where most of the users will be using it ;)
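
        To make the indexing idea concrete, here is a minimal plain-Perl sketch. It is not Bio::DB::Fasta's code or API; the subs build_index and fetch_subseq are hypothetical names. It assumes every sequence line within a record (except possibly the last) has the same length, and it does no bounds checking on the requested range.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# One pass over the file: remember where each record's sequence starts
# and how long its lines are (with and without the line ending).
sub build_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my (%index, $id);
    while (defined(my $line = <$fh>)) {
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $index{$id} = { start => tell($fh) };    # first byte after the header
        }
        elsif (defined $id && !exists $index{$id}{line_bytes}) {
            $index{$id}{line_bytes} = length $line;  # includes the line ending
            (my $bases = $line) =~ s/[\r\n]+\z//;
            $index{$id}{line_bases} = length $bases;
        }
    }
    close $fh;
    return \%index;
}

# Fetch bases $start..$stop (1-based, inclusive) by seeking, not scanning.
sub fetch_subseq {
    my ($path, $index, $id, $start, $stop) = @_;
    my $e = $index->{$id} or die "Unknown ID: $id";
    my ($lb, $ly) = @{$e}{qw(line_bases line_bytes)};

    # Map a 0-based base position to a byte offset, skipping line endings.
    my $byte_of = sub {
        my ($p) = @_;
        return $e->{start} + int($p / $lb) * $ly + $p % $lb;
    };
    my $from = $byte_of->($start - 1);
    my $to   = $byte_of->($stop - 1);

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    seek $fh, $from, 0 or die "Cannot seek in $path: $!";
    my $buf = '';
    read $fh, $buf, $to - $from + 1;
    close $fh;

    $buf =~ tr/\r\n//d;    # drop line endings that fell inside the range
    return $buf;
}

# Tiny made-up FASTA file for demonstration only.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} ">seq1 demo\nACGTAC\nGTTTGA\nCC\n>seq2\nGGGG\nAAAA\n";
close $tmp;

my $index = build_index($path);
print fetch_subseq($path, $index, 'seq1', 5, 9), "\n";    # prints ACGTT
```

        The index pass still reads the whole file once, which is the cost discussed above; the payoff only comes when many small ranges are pulled from large records afterwards.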

        Bioinformatics
