PerlMonks  

Re: Data Mining with Perl :: Use the right tool for the job!

by jeroenes (Priest)
on Nov 23, 2001 at 13:29 UTC (#127075)


in reply to Data Mining with Perl

Be warned. As a neuroscientist, I'm in the data analysis business rather than the mining variant. However, the enormous data flow my experiments generate, and the nature of those experiments, make my analysis resemble mining a bit, IMHO. With this disclaimer in mind:

Perl is indeed not an analysis tool per se. It is, however, indispensable, both for its ability to handle various formats and for the short development time of your scripts. You will end up using several different tools side by side:

Use the right tool for the job!

This is essential. Always think it through properly before you do something with a certain tool. Can this tool do the job? How much time will I have to spend learning the tool? How much time will I spend coding? How much time will I spend crunching numbers (or swapping memory space ;-)? Of course, don't spend more time than appropriate figuring this out.

Which tools you want to use really depends on what you're going to do. For web grabbing, text manipulation, file manipulation and reporting, Perl is the tool you need. If you really have to work with matrices of data (that is, more variables per item or more items per variable than you can easily handle), I would seriously stay away from spreadsheets. They are pretty inflexible when it comes to restating your computations or recalculating your reports and graphs. Believe me, I started out that way. I couldn't drop Excel fast enough in favour of Turbo Pascal, which is itself a pale toolkit compared to Perl.
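As a rough illustration of the "text manipulation and reporting" kind of job, here is a minimal sketch in core Perl. The tab-separated "condition, value" record layout, the sample numbers and the summarize() helper are all made up for the example; real data would come from a file rather than an inline string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one "condition<TAB>value" record per line.
# Inline data keeps the sketch self-contained; normally you would read a file.
my $data = join "\n",
    "control\t1.0",
    "control\t3.0",
    "drug\t4.0",
    "drug\t6.0",
    "drug\t8.0";

# Collect count, sum and mean per condition (hypothetical helper).
sub summarize {
    my ($text) = @_;
    my %stats;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($cond, $value) = split /\t/, $line;
        $stats{$cond}{n}++;
        $stats{$cond}{sum} += $value;
    }
    $_->{mean} = $_->{sum} / $_->{n} for values %stats;
    return \%stats;
}

# Print a small fixed-width report, one row per condition.
my $stats = summarize($data);
printf "%-10s %3s %8s\n", 'condition', 'n', 'mean';
printf "%-10s %3d %8.2f\n", $_, $stats->{$_}{n}, $stats->{$_}{mean}
    for sort keys %$stats;
```

A dozen lines of parsing plus two printf calls is about all such a report takes, which is the development-time argument in a nutshell. Once each item carries many variables, a matrix-oriented tool becomes the better fit.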

Perl has PDL for basic matrix manipulation. If you want to go further, you will end up with either Matlab (www.matlab.com) or S-Plus. Both have a very nice computation language, with extensive statistical tools. Moreover, it's really easy to write your own statistics and to plot the results. I'm a Matlab user myself, but S-Plus is equally fit as far as I have heard.

They both have open-source equivalents, Octave and R. I can't speak for R, but Octave is a decent clone when it comes to basic Matlab stuff; for many toolboxes and for nice graphs, though, you'll have to stick with Matlab. On the other hand, someone was posting an Inline::Octave proposal on the Inline mailing list. This could be very interesting. When I start R I get a nice window, but I can't tell you anything about its functionality.

While I was writing the second paragraph, I got an idea, ran it through a Perl/Matlab/Origin cycle and was pretty excited about the results. (Origin is my favourite graphing program.) You see, I use quite a few tools in parallel myself.

Feel free to /msg me if you want to know some more details.

HTH,

Jeroen
"We are not alone"(FZ)


Re: Re: Data Mining with Perl :: Use the right tool for the job!
by jepri (Parson) on Nov 23, 2001 at 21:23 UTC
    I gotta throw in a mention of Scilab at this point. It's another free clone of Matlab, to the point where the code you write is almost completely compatible (a quick dose of Perl makes it completely compatible). Debian-compatible license, and a built-in tutorial. Lovely. Runs under X11, etc.

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.
