PerlMonks  

accessing files

by arunmep (Beadle)
on Jul 14, 2005 at 12:17 UTC ( [id://474844]=perlquestion )

arunmep has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody. I am basically a biologist, and I am developing a database for a particular plant. My problem is that I have a lot of text and PDF files, and I need to create a search tool that will search all the files and return the ones that contain a given keyword. I want to know whether it would be efficient, in terms of speed, to develop the tool in Perl (searching hundreds of files). Please reply.

Re: accessing files
by mpeters (Chaplain) on Jul 14, 2005 at 13:28 UTC
    I would look at swish-e. It's an extremely fast and flexible tool for indexing and searching various kinds of documents (HTML, XML, text, PDF, DOC, etc.) and has a nice Perl interface.

    -- More people are killed every year by pigs than by sharks, which shows you how good we are at evaluating risk. -- Bruce Schneier
Re: accessing files
by blazar (Canon) on Jul 14, 2005 at 12:28 UTC
    100's of files doesn't sound much like a thing that should scare perl. As far as text files are concerned, perl's own functions and operators are all that you need. For pdf files, you can search CPAN for pdf tools.
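
    For the text-file half of the question, a minimal sketch using only core Perl might look like the following. The helper name search_files and the command-line usage are assumptions made for illustration, not something from this thread; PDF files would first need their text extracted with a separate tool.

```perl
use strict;
use warnings;

# Return the names of the files that contain $keyword (case-insensitive).
# search_files is a hypothetical helper name chosen for this sketch.
sub search_files {
    my ($keyword, @files) = @_;
    my @hits;
    FILE: for my $file (@files) {
        open my $fh, '<', $file or do {
            warn "Cannot open $file: $!";
            next FILE;
        };
        while (my $line = <$fh>) {
            # \Q...\E treats the keyword as literal text, not as a regex
            if ($line =~ /\Q$keyword\E/i) {
                push @hits, $file;
                last;    # one match is enough; move on to the next file
            }
        }
        close $fh;
    }
    return @hits;
}

# Assumed usage: perl search.pl keyword file1.txt file2.txt ...
if (@ARGV) {
    my ($keyword, @files) = @ARGV;
    print "$_\n" for search_files($keyword, @files);
}
```

    Each file is read line by line and abandoned at the first match, so memory use stays small even when individual files are large.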
Re: accessing files
by straywalrus (Friar) on Jul 14, 2005 at 14:49 UTC
    You may also want to take a look at 'Beginning Perl for Bioinformatics' by James Tisdall, from O'Reilly. Although this does not specifically answer your question, it may help with future problems you have as a biologist.
Re: accessing files
by garrison (Scribe) on Jul 14, 2005 at 14:24 UTC
    I use Perl to process thousands of PDF files and have no complaints about speed, although for maximal performance we store everything in a database and search that.
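
    The idea of indexing once and then searching the index can be sketched with an in-memory hash standing in for the database. This is only an illustration under that assumption, not the setup described above; build_index and lookup are hypothetical names, and the index matches whole words only.

```perl
use strict;
use warnings;

# Build an index of the form word => { file => 1 } from plain-text files.
# An in-memory hash stands in here for a real database table.
sub build_index {
    my (@files) = @_;
    my %index;
    for my $file (@files) {
        open my $fh, '<', $file or do {
            warn "Cannot open $file: $!";
            next;
        };
        while (my $line = <$fh>) {
            # Record every word of the line, lowercased, against this file
            $index{ lc $1 }{$file} = 1 while $line =~ /(\w+)/g;
        }
        close $fh;
    }
    return \%index;
}

# Return the sorted list of files in which $word occurs as a whole word.
sub lookup {
    my ($index, $word) = @_;
    return sort keys %{ $index->{ lc $word } || {} };
}
```

    The files are read once when the index is built; after that, each keyword query is a single hash lookup rather than a rescan of every file.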
