PerlMonks  

Balancing number of files against size of files in optimizing access speed

by di (Acolyte)
on Jan 09, 2010 at 21:28 UTC (#816546)

di has asked for the wisdom of the Perl Monks concerning the following question:

I am working with a large text of about 6.5 MB, whose words will be indexed to the paragraphs in which they occur. Searching for a word through a browser interface will return the matching paragraphs. My question is: what is the optimum number of files in which to store the text from which the paragraphs will be extracted? The text lends itself naturally to storage in 1, 4, 197, or 1628 files.

A search could return a few paragraphs, or hundreds, even thousands. My guess is that a few results would be extracted most quickly from a few small files, whereas a large number of results would be better extracted from one large file. Is this correct? What are the relative impacts of the number and size of files on access speed? What are the criteria for balancing them? Should I simply seek the middle way? Are there other factors I should consider?
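To make the single-file end of that trade-off concrete, here is a minimal sketch in Python (standard library only), not anything from the original post. It assumes paragraphs are separated by one blank line and that the whole index fits in memory; `build_index` and `lookup` are hypothetical names. Each word maps to the byte offsets of its paragraphs, so a search costs one open() plus one seek() per hit, however many paragraphs come back.

```python
import re
from collections import defaultdict


def build_index(path):
    """Map each lowercased word to (offset, length) spans of its paragraphs."""
    with open(path, "rb") as fh:
        data = fh.read()
    index = defaultdict(list)
    pos = 0
    # Assumption: paragraphs are separated by exactly one blank line.
    for chunk in data.split(b"\n\n"):
        span = (pos, len(chunk))
        words = set(re.findall(r"[a-z']+", chunk.decode("utf-8").lower()))
        for word in words:
            index[word].append(span)
        pos += len(chunk) + 2  # step over the two-byte separator
    return index


def lookup(path, index, word):
    """Return matching paragraphs in document order using seek() and read()."""
    hits = []
    with open(path, "rb") as fh:
        for offset, length in index.get(word.lower(), []):
            fh.seek(offset)
            hits.append(fh.read(length).decode("utf-8"))
    return hits
```

With this layout the number of files stops being a tuning knob: the index says exactly which bytes to read. Whether that beats 197 or 1628 small files on a given server is something to measure rather than assume.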

Replies are listed 'Best First'.
Re: Balancing number of files against size of files in optimizing access speed
by GrandFather (Saint) on Jan 09, 2010 at 21:46 UTC

    I suspect a single file with an .sqlite extension will work best. See DBI and DBD::SQLite.
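    As a rough illustration of the single-database idea, here is a sketch using Python's standard sqlite3 module rather than the DBI and DBD::SQLite route named above. The table names (`paragraph`, `word_index`) and the functions `build_db` and `search` are assumptions made for this example, not part of the reply.

```python
import re
import sqlite3


def build_db(paragraphs, db_path=":memory:"):
    """Store paragraphs plus a word-to-paragraph index in one SQLite database."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE paragraph (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE word_index (
            word TEXT NOT NULL,
            paragraph_id INTEGER NOT NULL REFERENCES paragraph(id),
            PRIMARY KEY (word, paragraph_id)
        );
    """)
    with conn:  # commit all inserts as a single transaction
        for body in paragraphs:
            cur = conn.execute("INSERT INTO paragraph (body) VALUES (?)", (body,))
            words = set(re.findall(r"[a-z']+", body.lower()))
            conn.executemany(
                "INSERT INTO word_index (word, paragraph_id) VALUES (?, ?)",
                [(w, cur.lastrowid) for w in words],
            )
    return conn


def search(conn, word):
    """Return every paragraph containing the word, in document order."""
    rows = conn.execute(
        "SELECT p.body FROM word_index AS w "
        "JOIN paragraph AS p ON p.id = w.paragraph_id "
        "WHERE w.word = ? ORDER BY p.id",
        (word.lower(),),
    )
    return [body for (body,) in rows]
```

    The composite primary key doubles as the lookup index, so a search for one word is an index range scan followed by primary-key fetches, whether it returns three paragraphs or three thousand.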


    True laziness is hard work
Re: Balancing number of files against size of files in optimizing access speed
by sflitman (Hermit) on Jan 09, 2010 at 22:18 UTC
    Life's too short to reinvent the wheel. Use KinoSearch for indexing a corpus of 1628 files, one per paragraph.

    HTH,
    SSF
