Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
This is probably a stupid question, but what's the best way to store text from forum posts? Assume that there's a large number of posts, that text size may vary from a few lines to as much as 10K or so, that the maximum number of files allowed inside a folder is 1024, that efficient searching is a consideration. I realize that with a double-layered folder system and 16-bit hashing, I can store probably around 33 million files without seriously affecting lookup time (256 * 256 * 512 - I'm allowing a 50% margin), but backing up or transferring these files would be a major pain, and I'd have to have at least one merged file for searching purposes anyhow. But what about database searches? Seems to me that there I'd also have to have a marged copy of the text there in a processed (punctuation and extra spaces removed) format, and just searching the text as it resides in the database won't be much help. So it's basically a matter of storage efficiency. Will storing the posts in the database instead of as individual files take up a lot more space, if so how much, and which is more efficient for lookup / edit? Which would you pick and why?

I know the following node asks pretty much the same question, but there isn't enough information, specifically on the issue of storage space for database vs filesystem, for me to make a decision.

Incidently, what if I stored all the post text in a single file outside of the database, using fixed size sectors and just storing the sector start and number of sectors used for each post in the database? Given a reasonably well-designed sector size (factor or multiple of disk sector?) I could reuse sectors from "decayed" posts (new post needs 3 sectors, look for decayed post with 3 sectors) without wasting large amounts of disk space or significantly increasing lookup times. Is this worth the trouble, or does mySQL internally do the same sort of thing for large chunks of text? In other words, if the max size of the text is 64K, and the post is actually only 2K, is the extra 62K wasted? Would it be better to use, say, 8K sectors and only waste 6K?

In reply to Large chunks of text - database or filesystem? by TedPride

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?

What's my password?
Create A New User
Domain Nodelet?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2022-06-25 04:13 GMT
Find Nodes?
    Voting Booth?
    My most frequent journeys are powered by:

    Results (81 votes). Check out past polls.