Re: Machine learning pattern matching...

by CountZero (Bishop)
on Dec 26, 2012 at 16:41 UTC


in reply to Machine learning pattern matching...

So you want the web page you are scraping to act as a kind of configuration file that defines which content to retain. I doubt that anyone has already written such a program. I think it is a few levels above the current state of the art in AI technology.

But perhaps you are thinking of something more specific: real estate listings, catalogues, ...

If you can narrow down the scope of your research, there may be some hope yet.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics


Re^2: Machine learning pattern matching...
by cLive ;-) (Parson) on Dec 31, 2012 at 16:39 UTC

    Yes, it's going to be user-suggested search results from shopping sites (honoring any robots.txt restrictions, obviously).
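For illustration, the robots.txt check could look something like the sketch below. It is in Python rather than Perl purely as an example, and it uses only the standard library's `urllib.robotparser`. The helper name `allowed_by_robots`, the user agent string, and the sample rules are all made up for this sketch. It parses robots.txt text that has already been downloaded, so nothing here touches the network.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: the robots.txt body is passed in as a string so the
    check can run on content fetched (and cached) elsewhere.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for a made-up shop, only to exercise the helper.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
"""
```

A user-suggested URL would be run through a check like this before it is ever fetched, and rejected up front if the site disallows it.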

    Point is, I won't know what they're going to suggest until they do and, ideally, I'd like to automate additions where possible to minimize manual review.

    I was thinking of grabbing any possible matches on the page and presenting them to the user who added the link as a first step, but wondered what was out there already. Short of looking for patterns in the DOM, I'm not sure what else to do.
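One rough way to "look for patterns in the DOM" is to group links by the chain of tags and classes that leads to them, and treat any chain that repeats several times as a candidate result list to show the user. The sketch below is a heuristic under that assumption, not an established technique from this thread. It is in Python for illustration and uses only the standard library's `html.parser`. The names `LinkPathCollector` and `candidate_links`, the `min_repeats` threshold, and the sample page are all hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkPathCollector(HTMLParser):
    """Collect hrefs keyed by their tag/class path from the document root."""

    def __init__(self):
        super().__init__()
        self.stack = []                        # labels of currently open elements
        self.links_by_path = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = tag
        if attrs.get("class"):
            # Sort class names so "item new" and "new item" compare equal.
            label += "." + ".".join(sorted(attrs["class"].split()))
        if tag == "a" and attrs.get("href"):
            path = "/".join(self.stack + [label])
            self.links_by_path[path].append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append(label)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break


def candidate_links(html, min_repeats=3):
    """Return (path, hrefs) groups seen at least `min_repeats` times, largest first."""
    collector = LinkPathCollector()
    collector.feed(html)
    collector.close()
    groups = [(path, hrefs) for path, hrefs in collector.links_by_path.items()
              if len(hrefs) >= min_repeats]
    return sorted(groups, key=lambda group: len(group[1]), reverse=True)


# Made-up page: two navigation links and three result links.
SAMPLE_HTML = """
<html><body>
<div class="nav"><a href="/home">Home</a><a href="/about">About</a></div>
<ul class="results">
<li class="item"><a href="/p/1">One</a></li>
<li class="item"><a href="/p/2">Two</a></li>
<li class="item"><a href="/p/3">Three</a></li>
</ul>
</body></html>
"""
```

On the sample page only the three links under `ul.results/li.item` pass the threshold. Real shopping pages are far messier, so the groups are best treated as suggestions for the user to confirm, not as a final answer.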
