http://www.perlmonks.org?node_id=638705


in reply to RFC: Machine Learning Development with Perl

This looks like a nice introductory talk for programmers who want to dip their toes into business intelligence and data mining techniques using Perl. Thanks for sharing it with us.

Coming from a machine learning background, I have a few comments that might be useful to add to your presentation.

First, developing a machine learning approach to a problem doesn't really stop at the implementation phase. Once a model is fit to the data, you need to test how well it fits. That means testing not only how well the model's output matches the correct answers on the data used to fit it, but also the model's generalizability. Here, generalizability means "How well does this model perform on new data, i.e., what is its predictive power?" Typically, cross-validation, the bootstrap, or Bayesian methods are used to test predictive power. I have seen many machine learning implementations fail miserably because the programmers didn't realize that testing model fits on new data was also needed.
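To make the cross-validation idea concrete, here is a minimal sketch of k-fold cross-validation. It is written in plain Python with no libraries so the logic is easy to port to Perl or PDL. The straight-line model, the synthetic data, and every function name (fit_line, mse, k_fold_mse) are my own assumptions for illustration, not anything taken from your talk.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared error of a fitted (a, b) model on the given points."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out error: fit on k-1 folds, score on the remaining one."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)           # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]      # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model,
                          [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return sum(scores) / k

# Hypothetical synthetic data: y = 2 + 3x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5) for x in xs]

train_mse = mse(fit_line(xs, ys), xs, ys)  # error on the data used for fitting
cv_mse = k_fold_mse(xs, ys)                # estimate of error on unseen data
print(f"training MSE: {train_mse:.3f}   5-fold CV MSE: {cv_mse:.3f}")
```

The number to report is the cross-validated error, since the training error is computed on the same points the model was tuned to and so tends to look too optimistic.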

Second, you give a nice exposition on clustering, but regression problems are nearly as important in the machine learning literature. Regression differs from classification in that models are created to predict "how much?" rather than "what class?". Giving a separate regression example would probably be too much for an introductory lecture, but mentioning that there are also machine learning approaches to regression problems would be useful.
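A small sketch may help show how close the two problem types are. Below, the same nearest-neighbour lookup answers "what class?" by majority vote and "how much?" by averaging. The toy house data and the function names are hypothetical, chosen only for illustration, and nearest neighbours is just one of many methods that work for both tasks.

```python
from collections import Counter

def nearest(train, x, k=3):
    """Targets of the k training points whose (1-D) feature is closest to x."""
    return [t for _, t in sorted(train, key=lambda p: abs(p[0] - x))[:k]]

def knn_classify(train, x, k=3):
    """Classification: majority vote among the neighbours' labels."""
    return Counter(nearest(train, x, k)).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Regression: average of the neighbours' numeric targets."""
    targets = nearest(train, x, k)
    return sum(targets) / len(targets)

# Hypothetical toy data: house size in square metres.
sizes  = [50, 60, 70, 120, 130, 140]
labels = ["small", "small", "small", "large", "large", "large"]
prices = [100.0, 120.0, 140.0, 240.0, 260.0, 280.0]

label_65 = knn_classify(list(zip(sizes, labels)), 65)  # "what class?"
price_65 = knn_regress(list(zip(sizes, prices)), 65)   # "how much?"
print(label_65, price_65)
```

The only difference between the two functions is how the neighbours' targets are combined, which is why a one-line mention of regression would fit into the talk without a separate example.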

Third, PDL is a wonderful tool for numerical computation, but it is also a language within a language. If you have time, it would help new users if you explained the few PDL constructs you use in your modeling program. Otherwise the operator overloading will just be confusing.

-Mark