http://www.perlmonks.org?node_id=1010408


in reply to Machine learning pattern matching...

So you want the web page you are scraping to act as a kind of configuration file that defines which content to retain. I doubt that anyone has already written such a program. I think it is a few levels above the state of the art in AI technology.

But perhaps you are thinking of something more specific: real estate listings, catalogues, ...

If you can narrow down the scope of your research, there may be some hope yet.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Re^2: Machine learning pattern matching...
by cLive ;-) (Prior) on Dec 31, 2012 at 16:39 UTC

    Yes, it's going to be user-suggested search results from shopping sites (honoring any robots.txt restrictions, obviously).

    Point is, I won't know what they're going to suggest until they do and, ideally, I'd like to automate additions where possible to minimize manual review.

    I was thinking of grabbing any possible matches on the page and presenting them to the user adding the link as a first step, but wondered what was out there already. Short of looking for patterns in the DOM, I'm not sure what else to do.
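
    To make "looking for patterns in the DOM" a bit more concrete, here is a rough sketch in Python, using only the standard library. It assumes that search results show up as elements sharing the same tag/class path, and it simply counts how often each path occurs. The names PathCounter, repeated_paths and min_count are made up for illustration, and the threshold of 3 is an arbitrary guess, not a tested value.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCounter(HTMLParser):
    """Count how often each tag/class path occurs in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []          # signatures of the currently open elements
        self.counts = Counter()  # path -> number of elements with that path

    def handle_starttag(self, tag, attrs):
        # Signature = tag name plus its sorted class names, e.g. "li.item".
        classes = dict(attrs).get("class") or ""
        sig = tag + "".join("." + c for c in sorted(classes.split()))
        path = " > ".join(self.stack + [sig])
        self.counts[path] += 1
        if tag not in VOID_TAGS:
            self.stack.append(sig)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break


def repeated_paths(html, min_count=3):
    """Return (path, count) pairs for paths seen at least min_count times."""
    parser = PathCounter()
    parser.feed(html)
    parser.close()
    return [(p, n) for p, n in parser.counts.most_common() if n >= min_count]
```

    Paths that repeat many times (say, a list item with a link and a price inside it) would be the candidates to show the user for confirmation. Navigation menus and footers repeat too, so this only narrows the manual review rather than replacing it.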