Re: ETL in Perl

by erix (Vicar)
on Sep 04, 2010 at 22:28 UTC


in reply to ETL in Perl

ETL [...] typically involves reading large volumes of data from a database [...]

I don't think that's right. It seems to me that ETL first and foremost means (and always has meant) getting data from outside a database to inside it. The TRANSFORM step is only necessary if the database (or its constraints) demands it, and there, obviously, Perl can come in handy.
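A minimal sketch of that idea, assuming a made-up input format and a hypothetical `accounts` table; the sub names `transform_row` and `load_rows` are illustrative only. The transform is there purely because the table's constraints would reject the raw values.

```perl
use strict;
use warnings;

# Hypothetical input: "id;DD/MM/YYYY;amount" lines from some outside source.
# Hypothetical target: accounts(id INTEGER NOT NULL, booked DATE, amount NUMERIC).

# Transform one raw line into a row the table will accept,
# or return nothing if the line cannot be repaired.
sub transform_row {
    my ($line) = @_;
    chomp $line;
    my @f = split /;/, $line, -1;
    return unless @f == 3;
    s/^\s+|\s+$//g for @f;              # trim each field in place
    my ($id, $date, $amount) = @f;

    # NOT NULL integer key: reject rows without a usable id.
    return unless $id =~ /^\d+$/;

    # DATE column: rewrite DD/MM/YYYY as YYYY-MM-DD.
    return unless $date =~ m{^(\d{2})/(\d{2})/(\d{4})$};
    $date = "$3-$2-$1";

    # NUMERIC column: accept a decimal comma, as in "1234,50".
    $amount =~ tr/,/./;
    return unless $amount =~ /^-?\d+(?:\.\d+)?$/;

    return [$id, $date, $amount];
}

# Load: $dbh is assumed to be an already connected DBI handle.
sub load_rows {
    my ($dbh, @rows) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO accounts (id, booked, amount) VALUES (?, ?, ?)');
    $sth->execute(@$_) for @rows;
    return scalar @rows;
}

# Usage, assuming DBI is installed and $dsn, $user, $pass, $fh are set up:
# my $dbh  = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
# my @rows = map { transform_row($_) } <$fh>;   # rejected lines drop out
# load_rows($dbh, @rows);
```

If the table had no such constraints, `transform_row` could shrink to a plain `split` and the job would be little more than extract and load.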

... and processing it in massively parallel fashion

"Massively parallel"? That doesn't have much to do with ETL, does it? If the database can slurp data in parallel (multicore, or multi-whatever), that's nice, but that doesn't seem very related to any ETL job.

(BTW, your first link, 'Extract, transform, and load...', links to the node itself; I suppose that was a mistake? You had the Wikipedia ETL article in mind, perhaps?)


Re^2: ETL in Perl
by metaperl (Curate) on Sep 07, 2010 at 13:47 UTC
    "Massively parallel"? That doesn't have much to do with ETL, does it? If the database can slurp data in parallel (multicore, or multi-whatever), that's nice, but that doesn't seem very related to any ETL job.
    Massively parallel is very important: if you have independent data-crunching tasks, the ability to send them off to different heavyweight machines easily, without a bunch of fiddling with program source code, is a huge boon.
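A rough sketch of the "independent tasks" part, using only core Perl. Note the swap: this forks child processes on one machine instead of dispatching to different machines, and `run_parallel` is a hypothetical helper, not an existing module. The point is only that the tasks themselves need no changes to run side by side.

```perl
use strict;
use warnings;

# Run each independent task in its own child process and collect
# one line of output per task. Tasks are code refs returning a string.
sub run_parallel {
    my (%tasks) = @_;
    my %reader;

    for my $name (sort keys %tasks) {
        pipe(my $r, my $w) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;

        if ($pid == 0) {              # child: do the work, report, exit
            close $r;
            print {$w} $tasks{$name}->(), "\n";
            close $w;
            exit 0;
        }
        close $w;                     # parent keeps only the read end
        $reader{$name} = $r;
    }

    my %result;
    for my $name (keys %reader) {
        my $fh   = $reader{$name};
        my $line = <$fh>;             # blocks until that child has reported
        chomp $line if defined $line;
        $result{$name} = $line;
        close $fh;
    }
    1 while wait() != -1;             # reap all children
    return %result;
}

# Two made-up crunching tasks that do not depend on each other.
my %sum = run_parallel(
    north => sub { my $t = 0; $t += $_ for 1 .. 100;   $t },
    south => sub { my $t = 0; $t += $_ for 101 .. 200; $t },
);
print "$_: $sum{$_}\n" for sort keys %sum;
```

Each task returns a single line here; anything larger would need a real serialization format, and spreading the work over several machines would need a job dispatcher on top.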

    Most of my ETL work was for a bank: analysing database data and creating summaries of it to go right back into the database. So I extracted from a database, analysed the data, and loaded it back into the DB. But you are right: sometimes the initial source is not a DB.

    thanks for the link update



    The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

    -- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"
