I work in machine learning and use Perl for most of my scripting, but I have never bothered with CPAN's machine learning modules, for two reasons. First, you often need to do some additional linear algebra on your data (e.g. centering, finding eigenvalues, SVD), and these modules don't share a common matrix representation. Without a common format for compact storage and a rich library of numerical algorithms, it is hard to do things quickly in pure Perl. Second, many of the CPAN modules I've looked at seem to have been written either for their authors' edification or without large datasets in mind (e.g. Algorithm::SVMLight requires you to add your datapoints one at a time in bulky hash-refs), while most of the problems I care about involve huge amounts of data.
I think the PDL statistics paper someone else mentioned is the best "Perl for statistics" resource I've seen. Depending on your problems and how familiar you are with the field, some of the articles on Perl.com may also be of interest. As much as I loathe Java, I would actually recommend Weka as a collection of many machine learning algorithms that work well together. But unless PDL does what you want, I'd suggest using something other than Perl, CPAN modules included, for your core algorithms.