Re: ETL in Perl

by Corion (Patriarch) on Sep 03, 2010 at 16:35 UTC [id://858746]


in reply to ETL in Perl

If you've enjoyed doing ETL using a graphical dataflow environment, you haven't done large or complex data transformations. In my experience, these data graphs always become too large to print and document. They are also hard to diff against each other.

The Perl module that comes closest to ETL is Workflow, and it wouldn't be too hard to create the appropriate "transformation" nodes for it. You would then only need to render the workflow to SVG and let the user modify it. Writing such a node editor shouldn't be too hard either, as there is existing knowledge embodied in Inkscape.
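
To make the idea concrete, here is a minimal sketch in plain Perl. It does not use Workflow's actual API; the node list, `run_pipeline` and `render_svg` are hypothetical names made up for illustration, and the records are invented sample data. It only shows that a chain of "transformation" nodes and a rough SVG rendering of that chain are each a few lines of code.

```perl
use strict;
use warnings;

# Hypothetical transformation nodes (not Workflow's API): each has a name
# and a code ref that takes one record (hash ref) and returns the
# transformed record, or undef to drop the record.
my @nodes = (
    { name => 'trim',
      code => sub { my %r = %{ $_[0] }; s/^\s+|\s+$//g for values %r; \%r } },
    { name => 'drop_empty',
      code => sub { length $_[0]{name} ? $_[0] : undef } },
    { name => 'upcase_country',
      code => sub { my %r = %{ $_[0] }; $r{country} = uc $r{country}; \%r } },
);

# Push every record through the nodes in order, discarding dropped records.
sub run_pipeline {
    my ($nodes, @records) = @_;
    for my $node (@$nodes) {
        @records = grep { defined } map { $node->{code}->($_) } @records;
    }
    return @records;
}

# Draw the nodes as a left-to-right row of labelled boxes joined by lines.
# Node names are assumed to contain no XML special characters.
sub render_svg {
    my ($nodes) = @_;
    my ($w, $h, $gap) = (140, 40, 30);
    my $width = @$nodes * ($w + $gap) + $gap;
    my $svg = qq{<svg xmlns="http://www.w3.org/2000/svg" width="$width" height="80">\n};
    for my $i (0 .. $#$nodes) {
        my $x  = $gap + $i * ($w + $gap);
        my $tx = $x + $w / 2;
        $svg .= qq{  <rect x="$x" y="20" width="$w" height="$h" fill="none" stroke="black"/>\n};
        $svg .= qq{  <text x="$tx" y="45" text-anchor="middle">$nodes->[$i]{name}</text>\n};
        next if $i == $#$nodes;
        my ($x1, $x2) = ($x + $w, $x + $w + $gap);
        $svg .= qq{  <line x1="$x1" y1="40" x2="$x2" y2="40" stroke="black"/>\n};
    }
    return $svg . "</svg>\n";
}

my @out = run_pipeline(\@nodes,
    { name => '  Alice ', country => 'de' },
    { name => '   ',      country => 'us' },   # dropped by drop_empty
);
print "$_->{name} ($_->{country})\n" for @out;
print render_svg(\@nodes);
```

The resulting SVG can be opened in Inkscape as-is. Letting the user edit the picture and reading the changes back into the node list is the part that still needs a real editor.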

Replies are listed 'Best First'.
Re^2: ETL in Perl
by metaperl (Curate) on Sep 03, 2010 at 18:43 UTC
Re^2: ETL in Perl
by metaperl (Curate) on Sep 03, 2010 at 17:11 UTC
    If you've enjoyed doing ETL using a graphical dataflow environment, you haven't done large or complex data transformations. In my experience, these data graphs always become too large to print and document. They are also hard to diff against each other.
    • I have greatly enjoyed doing ETL using Ab Initio for professional work
    • "large or complex" is qualitative. If things became large or complex, I was able to highlight a section of processes and tuck them into a box, so printing was not an issue. And I found the large graphs to be largely self-documenting.
    • What dataflow tools does your experience encompass?
    Workflow is interesting, but I don't see it offering what I got out of Ab Initio and what I listed as points for graphical dataflow: parallelism, the big-picture view, etc.



    The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

    -- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"

      I have worked with ARIS Toolset and IBM SPSS Modeler (previously PASW Modeler, née SPSS Clementine). I don't know what your requirements for documentation are, but I found that these tools hid far too much information within the nodes and did not print it out. And in the end, processes need to be documented in print. "Tucking things into boxes" only means that instead of one huge A2 or A0 sheet, you get to deal with binders of A4 sheets, which is no improvement.

      If you want a graphical ETL tool, I pointed you to the tools that you need to stick together to create such a thing with Perl as the backend. If you want to play buzzword bingo and tell us that you're now getting paid for playing with ETL tools, maybe you want to visit http://etlmonks.org instead?

        If you want to play buzzword bingo and tell us that you're now getting paid for playing with ETL tools, maybe you want to visit http://etlmonks.org instead?

        No need to get snippy.
