Re^2: Integrated non-relational databases ?

by rootcho (Pilgrim)
on Sep 26, 2007 at 17:45 UTC ( [id://641206] )


in reply to Re: Integrated non-relational databases ?
in thread Integrated non-relational databases ?

But from what I read in the recent news, most of the big sites are more and more abandoning RDBMS systems, in most cases in favor of hand-made solutions. Sometimes completely RDBMS-less, sometimes a mix.
What I'm saying is that current RDBMSes can't handle very large data sets in a real-time environment.
For example, I was recently doing experiments with a very simple table of 10_000_000 records which fit entirely in memory. The moment I used something other than a plain lookup, say a GROUP BY, execution time went from milliseconds to minutes.
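
For illustration, a minimal sketch of that kind of measurement, assuming MySQL via DBI; the DSN, the "events" table, and its "id"/"category" columns are made up here:

    use strict;
    use warnings;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    # Hypothetical connection details and schema -- adjust to taste.
    my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                           { RaiseError => 1 });

    # A single indexed lookup: typically milliseconds.
    my $t0 = [gettimeofday];
    my ($category) = $dbh->selectrow_array(
        'SELECT category FROM events WHERE id = ?', undef, 12_345);
    printf "point lookup: %.1f ms\n", 1000 * tv_interval($t0);

    # An aggregate over the whole 10_000_000-row table.
    $t0 = [gettimeofday];
    my $counts = $dbh->selectall_arrayref(
        'SELECT category, COUNT(*) AS n FROM events GROUP BY category');
    printf "GROUP BY:     %.1f ms\n", 1000 * tv_interval($t0);

    $dbh->disconnect;

On an unindexed grouping column the second query has to scan and sort every row, which is where a jump from milliseconds to minutes usually comes from.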

That is why I was thinking that if you do this uplifting in a domain's "language structures", it would be easier, I think, to come up with more efficient caching schemes, sharding and similar techniques, so you can stay in the "millisecond range" more easily even for very large datasets.
Mind you, this is just a thought, not a conclusion about which is best :). It is very hard to test such things at large scale, and of course the requirements of every app are different.
http://radar.oreilly.com/archives/2006/04/database_war_stories_5_craigsl.html
Look at the links at the end of the article too

Replies are listed 'Best First'.
Re^3: Integrated non-relational databases ?
by perrin (Chancellor) on Sep 26, 2007 at 19:09 UTC
    Yeah, I read that series when it came out. I don't see how it came to the conclusion that sites are not using RDBMSes. Most people interviewed said they use MySQL and have figured out how to scale it. Google wrote something custom for some of their data, and one guy used Berkeley DB, but most of them use RDBMSes for most things. Even Google makes heavy use of MySQL.
Re^3: Integrated non-relational databases ?
by mr_mischief (Monsignor) on Sep 27, 2007 at 15:00 UTC
    RDBMSes can and do handle huge data sets. Was your GROUP BY grouping by an indexed column? Which RDBMS were you using for that? What kind of hardware were you running it on? That really seems too drastic a change, although I rarely have tables with more than 100,000 records. Is that a single GROUP BY, or is that the effect when you change a whole class of overlapping queries to use it? More servers and database replication are often the answer. Was it memory bound, processor bound, or I/O bound?

    The problem with handmade solutions, and with anything tied closely to a certain language, is that you're giving up large amounts of flexibility. SQL was designed specifically so that different programs in different languages could communicate with the same database and use the same data manipulation routines on the same data. You lose that if you're building it in some specialized database language that has no other support. While in some cases it's worthwhile to forgo convention and flexibility for performance, you have to be sure of what you're losing and what you're gaining. To be sure requires a lot more than a bit of ad-hoc testing on one example without accounting for possible machine deficiencies.
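
    For what it's worth, a rough sketch of the indexing question above, reusing the hypothetical "events" table from earlier and assuming MySQL; whether it helps depends on the real schema and data distribution:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                               { RaiseError => 1 });

        # With an index on the grouped column, the server can read the
        # groups in index order instead of scanning and sorting all rows.
        $dbh->do('CREATE INDEX idx_events_category ON events (category)');

        my $counts = $dbh->selectall_arrayref(
            'SELECT category, COUNT(*) AS n FROM events GROUP BY category');

        $dbh->disconnect;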

      I agree with many of the points you and the others mentioned.
      What I was glad to see in Mnesia was that you just add the next server and it uses it. Yes, you may still need to think about techniques to partition your data, but you don't have to worry so much about which replication scheme to use, whether to cluster, or whether you even can. From what I read, you can in fact play the role of the query planner with your own code. In general this is hard to impossible to do in an RDBMS.
      I'm not saying Mnesia is better than, say, MySQL, PostgreSQL, etc. In fact I don't know how scalable Mnesia is in the first place ;)


      As a side question, I need to implement what I might call "slow/lazy queries". What I mean is:
      A query that takes a long time to execute, say from 5 minutes to an hour, but doesn't hog CPU and I/O resources, so that the server continues to work as if nothing else were happening.
      Do you have any idea how such a thing can be achieved, or whether it is doable at all with today's databases?
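
      One way this is sometimes approximated at the application level rather than inside the database is to walk the data in small chunks and sleep between them, so the work spreads over a long wall-clock time without saturating CPU or I/O. A rough sketch, again assuming DBI, MySQL, and the made-up "events" table with a numeric primary key "id":

          use strict;
          use warnings;
          use DBI;
          use Time::HiRes qw(sleep);

          my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                                 { RaiseError => 1 });

          my $chunk   = 10_000;   # rows per batch
          my $last_id = 0;
          my %count;              # aggregate accumulated across batches

          while (1) {
              # Walk the table in primary-key order, one small slice at a time.
              my $rows = $dbh->selectall_arrayref(
                  'SELECT id, category FROM events WHERE id > ? ORDER BY id LIMIT ?',
                  undef, $last_id, $chunk);
              last unless @$rows;

              $count{ $_->[1] }++ for @$rows;
              $last_id = $rows->[-1][0];

              sleep 0.25;   # yield CPU and I/O so other queries stay responsive
          }

          $dbh->disconnect;

      The result is of course not a consistent snapshot, since the table can change between batches.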
        The problem is that if your query takes more than 5 seconds, the snapshot of data it's looking at is out-of-date. So, you'll need to provide a solution to that problem.

        I've worked with queries that looked at millions of rows crossed with millions of rows and the longest I've ever had a query take was 15 seconds - that was ok because it was looking at archived data. Normally, queries shouldn't take more than 1 second. Taking longer usually means you've written the query wrong. Have you looked at the execution plan?


        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
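
        As a footnote to the execution-plan question above, one minimal way to pull the plan from Perl, assuming MySQL's EXPLAIN and the hypothetical "events" table used in the sketches above:

            use strict;
            use warnings;
            use DBI;

            my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                                   { RaiseError => 1 });

            # EXPLAIN reports, per table, whether an index is used ('key')
            # and roughly how many rows the server expects to examine.
            my $plan = $dbh->selectall_arrayref(
                'EXPLAIN SELECT category, COUNT(*) FROM events GROUP BY category',
                { Slice => {} });

            for my $step (@$plan) {
                printf "table=%s type=%s key=%s rows=%s Extra=%s\n",
                    map { defined $_ ? $_ : 'NULL' }
                        @{$step}{qw(table type key rows Extra)};
            }

            $dbh->disconnect;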
