Re: RFC: Presentation on Machine Learning with Perl
by toma (Vicar) on Jul 10, 2007 at 08:07 UTC
This sounds like an interesting talk and I hope to see it. It will be a great talk if I leave excited about what I can do with the tools, how I can get started, how to scale up from my starting point, and what to do if I get stuck.
Thanks for this opportunity to make requests about your talk. Here are some ideas, both general and specific. Please feel free to use or ignore any of them.
When I see a talk like this, I want to learn things that will help me get started more quickly if I decide to do something similar. I like to learn the approach for a minimal working example (like the synopsis). Then I am ready to hear a story about what you did to scale it up to solve a real-world problem. Typically I wouldn't really care about your exact example problem; I just want to explore the boundaries. How well does it scale?
I want inside information that isn't in the documentation. For example, when you have a problem, how do you get good help? Is the code problematic on some platforms? What tools, libraries and skills are needed to make the system work? Do APIs break at each release, or are they stable?
For example, I think that PGPlot is great and I use it, but it can be hard to build and get working. If you really need the features, use it, but if you don't, there are much easier alternatives. It has the classic build problems of a large number of options and many dependencies, and I haven't figured out a simple way to use it on simple problems. It would be of great value to me if you could explain an easy way to build and use PGPlot on Windows, Linux and OSX ;-). This is the kind of inside information that makes attending the conference a good investment.
Perl is great for gathering huge amounts of data. The challenge quickly becomes fixing problems in the dataset, for example gaps caused by network and server outages or other annoyances.
It would be good to hear about SVM and what sort of problems it is good for. For example, how much data do you need? How much more data is needed for each new feature? What if your data isn't perfectly clean? What types of data are usually used? When would I want to use Perl with LIBSVM instead of R or Matlab?
Thanks - I hope to attend your talk.
It should work perfectly the first time! - toma