PerlMonks  

Re: Large chunks of text - database or filesystem?

by TedPride (Priest)
on Mar 19, 2005 at 23:47 UTC (#440961=note)


in reply to Large chunks of text - database or filesystem?

1. What is the maximum posts per minute you expect?

I honestly don't know. The forum isn't currently that busy, but it's an old modified WWWBoard system and probably isn't getting much use for that reason. The site as a whole can get 1400+ visits daily, and I'd hope to improve that by restructuring the site.
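As a rough baseline for questions 1 and 2, that visit figure averages out to about one visit per minute. This assumes traffic is spread evenly over 24 hours, which it won't be: peaks will be higher, and most visits are reads rather than posts.

```latex
\frac{1400\ \text{visits/day}}{1440\ \text{min/day}} \approx 0.97\ \text{visits/min}
```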

2. What is the maximum reads per minute?
3. How many overall posts per day? (estimated growth rate)

Again, there's no way to estimate at this point how many reads or writes there will be.

4. How large do you expect the messages to be?

Most are under 2K, some as large as 5K. I doubt we'll need more than that, though the capability should be there.

5. Is it a write once system, or will there be re-editing of messages?

Re-editing.

6. What is your hardware budget for the project, or is there fixed hardware?

Remote hosting account, with fixed hardware. If we had to, we could upgrade to a dedicated server, but only if the site in general got a good deal more popular than it is now.

7. What is the required uptime?

I wouldn't want it down for more than 5 minutes a day, at most.
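Hosting uptime guarantees are usually quoted as a percentage, so for comparison: a ceiling of 5 minutes of downtime per day works out to roughly 99.65% uptime, or about 30 hours of downtime per year.

```latex
\frac{1440 - 5}{1440} = \frac{1435}{1440} \approx 0.9965 \quad (99.65\%),
\qquad 5\ \text{min} \times 365 = 1825\ \text{min} \approx 30.4\ \text{h/year}
```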

8. Are you going to have an internal search engine?

Yes.

9. If so, what sort of information are you going to search on? (metadata, or the message itself?)

Message itself. Ideally, the messages would be preprocessed to lowercase everything and remove unnecessary punctuation, and there'd be a small index for the most popular keywords. The search data doesn't have to be real-time - it can be generated every day or two, if necessary.
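The preprocessing described above (lowercase everything, strip punctuation, index only the most popular keywords, rebuild every day or two) could look something like this minimal sketch. It assumes the posts can be loaded as a dict of post id to message text; the stopword list, the `top_n` cutoff and the function names are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(posts, top_n=1000):
    """Build an inverted index {word: set(post_ids)}.

    posts: mapping of post_id -> message text.
    Only the top_n words that appear in the most posts are kept.
    """
    doc_freq = Counter()          # number of posts each word appears in
    postings = defaultdict(set)   # word -> ids of posts containing it
    for post_id, text in posts.items():
        for word in set(normalize(text)) - STOPWORDS:
            doc_freq[word] += 1
            postings[word].add(post_id)
    keep = {word for word, _ in doc_freq.most_common(top_n)}
    return {word: postings[word] for word in keep}

def search(index, query):
    """Return the ids of posts containing every non-stopword query word."""
    words = [w for w in normalize(query) if w not in STOPWORDS]
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index isn't mutated
    for w in words[1:]:
        result &= index.get(w, set())
    return result
```

Since words outside the top `top_n` are never indexed, a query on a rare word returns nothing; a fallback (for example a slower scan of the raw messages) would be needed for those.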

10. What are your disaster recovery requirements?

The site has regular daily backups as part of the hosting service, and we can get a restore if we destroy something by mistake.

11. Do you need to support transactional concurrency?

I'm assuming there will only ever be one person editing or deleting a specific post. We might want to generate the threads as web pages, however (from accumulated post data), and these would need some form of locking so that two people posting to or editing the same thread wouldn't conflict.
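One way to get that locking on a Unix host is an advisory `flock` on a per-thread lock file, combined with writing the new page to a temporary file and renaming it into place. This is a sketch, not a drop-in: `regenerate_thread_page` and its `render` callback are hypothetical names, and Python's `fcntl` module is only available on Unix-like systems. Perl's built-in `flock` works the same way if the forum stays in Perl.

```python
import fcntl
import os
import tempfile

def regenerate_thread_page(page_path, render):
    """Rebuild one static thread page while holding an exclusive lock.

    page_path: where the generated HTML lives (hypothetical layout).
    render:    hypothetical callable that returns the page HTML as a string.
    """
    lock_path = page_path + ".lock"
    with open(lock_path, "w") as lock_file:
        # Block until no other process holds the lock for this thread.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            html = render()
            # Write to a temp file in the same directory, then atomically
            # rename it over the old page so readers never see a partial file.
            directory = os.path.dirname(os.path.abspath(page_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(html)
            os.chmod(tmp_path, 0o644)  # mkstemp creates files as 0600
            os.replace(tmp_path, page_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The lock is advisory, so it only helps if every code path that posts to or edits a thread goes through the same function.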

12. What are your time constraints?

At this point, none. I'm willing to spend a lot of time if necessary to get an efficient system going that will last a long time.

13. Do you already have a database to use for this purpose?

We have a MySQL database. I don't know which version, however.

14. Do you already have experience with databases?

I've used MySQL a fair amount, though not much with Perl and never to store large amounts of text.

Re^2: Large chunks of text - database or filesystem?
by Cap'n Steve (Friar) on Mar 20, 2005 at 03:37 UTC
    No one has asked the obvious question: Why are you doing this? I once tried to write some customized forum software and quickly realized that it'd be more trouble than it was worth. Why not go with one of the many available solutions?
      Not all off-the-shelf software integrates well or has good internals. If this is an integrated piece of software, then doing it by hand isn't hard. If it isn't, it's usually a hack to get them separated. Heck, forums aren't hard to write: user accounts, posts, replies and a listing. Depending on what inefficiencies you run into, you address them, for example by caching certain info.
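      The "user accounts, posts, replies and a listing" outline maps onto a small schema. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere; the table, column and function names are hypothetical, and replies are modeled with a nullable `parent_id`. A `TEXT` column (up to 64 KB in MySQL) comfortably holds the 2K to 5K messages described earlier, and re-editing is a plain `UPDATE`.

```python
import sqlite3

# SQLite stands in for MySQL here; all names are hypothetical.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(id),
    parent_id INTEGER REFERENCES posts(id),  -- NULL for a thread's first post
    subject   TEXT NOT NULL,
    body      TEXT NOT NULL,
    created   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX posts_parent ON posts(parent_id);
"""

def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_post(conn, user_id, subject, body, parent_id=None):
    """Insert a post (or a reply, if parent_id is given) and return its id."""
    cur = conn.execute(
        "INSERT INTO posts (user_id, parent_id, subject, body) VALUES (?, ?, ?, ?)",
        (user_id, parent_id, subject, body),
    )
    conn.commit()
    return cur.lastrowid

def edit_post(conn, post_id, body):
    """Overwrite the body of an existing post."""
    conn.execute("UPDATE posts SET body = ? WHERE id = ?", (body, post_id))
    conn.commit()

def list_threads(conn):
    """Return (id, subject, direct_reply_count) per top-level post, newest first."""
    return conn.execute(
        """SELECT p.id, p.subject, COUNT(r.id)
           FROM posts p LEFT JOIN posts r ON r.parent_id = p.id
           WHERE p.parent_id IS NULL
           GROUP BY p.id
           ORDER BY p.id DESC"""
    ).fetchall()
```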

Re^2: Large chunks of text - database or filesystem?
by jhourcle (Prior) on Mar 20, 2005 at 04:57 UTC

    Well, I guess it's making sense why you're trying to optimize for space if you're running from a hosting account... but I'd still have to ask why that's your main consideration. I'm guessing that if you're trying to get more people to use your site, you're going to need to add stuff for the users, not for you. (And your users probably aren't going to be discussing the benefits of data storage techniques... well, they might be; after all, I'm doing it right now.)

    Given what you've mentioned, I'd probably store the metadata in a database, just because I would feel comfortable coding with that backend. However, I would try to make sure that I/O is encapsulated, so that I could change the storage system at some later time, if I wished. (or I could recycle it to be used elsewhere)
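    Encapsulating the I/O can be as simple as a small interface that the rest of the forum code calls, with one implementation per backend. Below is a hedged sketch with a hypothetical `MessageStore` interface and two interchangeable backends: one file per message, or a database table via `sqlite3` standing in for MySQL. Switching storage later then means swapping the object, not rewriting the callers.

```python
import os
import sqlite3
from abc import ABC, abstractmethod

class MessageStore(ABC):
    """Hypothetical interface; forum code talks only to this."""

    @abstractmethod
    def get(self, post_id: int) -> str:
        """Return the body of a post, or raise KeyError if it doesn't exist."""

    @abstractmethod
    def put(self, post_id: int, body: str) -> None:
        """Create or overwrite (re-edit) the body of a post."""

class FileStore(MessageStore):
    """One file per message under base_dir."""

    def __init__(self, base_dir: str):
        self.base_dir = base_dir
        os.makedirs(base_dir, exist_ok=True)

    def _path(self, post_id: int) -> str:
        # int() rejects non-numeric ids, so only numbers become file names.
        return os.path.join(self.base_dir, f"{int(post_id)}.txt")

    def get(self, post_id: int) -> str:
        try:
            with open(self._path(post_id), encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(post_id) from None

    def put(self, post_id: int, body: str) -> None:
        with open(self._path(post_id), "w", encoding="utf-8") as f:
            f.write(body)

class SqliteStore(MessageStore):
    """Message bodies in a database table (SQLite standing in for MySQL)."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS bodies (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def get(self, post_id: int) -> str:
        row = self.conn.execute(
            "SELECT body FROM bodies WHERE id = ?", (post_id,)
        ).fetchone()
        if row is None:
            raise KeyError(post_id)
        return row[0]

    def put(self, post_id: int, body: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO bodies (id, body) VALUES (?, ?)", (post_id, body)
        )
        self.conn.commit()
```

Both backends raise `KeyError` for a missing post, so callers don't need to know which one they were handed.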

    But I'm going to have to agree with your decision to stop using WWWBoard. I mean, it's 19105, and you're running a program which tells you to chmod a directory to 0777. It was 'last modified' in 1995, and is still in 'alpha'. (Although the copyright is marked 2002... I guess he knows what the important things are to update.) Hmm... okay, I should probably stop flaming Matt's Script Archive, or this post will go on for pages. I will give him credit that it was very nice of him to want to share with the public... and that's about the only nice thing I can say about it.
