PerlMonks  

Re: Machine learning pattern matching...

by LanX (Canon)
on Dec 26, 2012 at 17:14 UTC


in reply to Machine learning pattern matching...

> What I'm thinking of is along the lines of an algorithm that looks for repetition in the HTML structure of the page, and then examines them for the relevant data - could be table rows, divs, paragraphs, lists - trying to be as generic as possible...

Sounds to me like a combination of web mining and cluster analysis! (?)
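A crude first step toward "looking for repetition in the HTML structure" is to count how often each tag path (e.g. `table/tr/td`) occurs. Paths that recur many times are candidates for record-like blocks (table rows, list items, repeated divs) that a later clustering step could analyse. This is a minimal sketch, not clustering itself. It assumes roughly well-formed markup, and the regex tokenizer and the helper name `repeated_paths` are purely illustrative. For real pages you would use a proper parser such as HTML::TreeBuilder instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tags that never get a closing tag, so they must not be pushed on the stack.
my %void = map { $_ => 1 } qw(br hr img input meta link area base col embed source wbr);

# Hypothetical helper: return [path, count] pairs for every tag path that
# occurs at least $min_count times, most frequent first.
# NOTE: naive regex tokenizer, NOT a real HTML parser.
sub repeated_paths {
    my ($html, $min_count) = @_;
    my (@stack, %count);

    while ($html =~ m{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}g) {
        my ($closing, $tag, $selfclose) = ($1, lc $2, $3);

        if ($closing) {
            # Pop back to the nearest matching open tag, if there is one.
            for my $i (reverse 0 .. $#stack) {
                if ($stack[$i] eq $tag) { splice @stack, $i; last }
            }
            next;
        }

        # Record the full path from the root to this element.
        my $path = join '/', @stack, $tag;
        $count{$path}++;
        push @stack, $tag unless $selfclose || $void{$tag};
    }

    return map  { [ $_, $count{$_} ] }
           sort { $count{$b} <=> $count{$a} || $a cmp $b }
           grep { $count{$_} >= $min_count } keys %count;
}

my $html = '<table><tr><td>a</td><td>1</td></tr>'
         . '<tr><td>b</td><td>2</td></tr>'
         . '<tr><td>c</td><td>3</td></tr></table>';

# Prints "table/tr/td 6" and then "table/tr 3" (tab-separated).
print "$_->[0]\t$_->[1]\n" for repeated_paths($html, 3);
```

The path count only tells you *where* repetition lives. Deciding which repeated blocks actually hold the "relevant data" is where the web-mining and clustering ideas would come in.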

I doubt you'll find any ready-to-use modules combining both¹, because this is a core technology for some big players in the web business.

Cheers Rolf

¹) Especially not ones as generic as you asked for.


Re^2: Machine learning pattern matching...
by cLive ;-) (Parson) on Dec 31, 2012 at 16:42 UTC

    I'm not looking for a full solution, mainly for ideas on what I should read up on to build it myself.

    This idea's been floating around in my brain for a while, so I'm giving it some room to see if it grows :)
