PerlMonks  

Re^3: Scrape a blog: a statistical approach

by soonix (Canon)
on Apr 13, 2014 at 22:18 UTC


in reply to Re^2: Scrape a blog: a statistical approach
in thread Scrape a blog: a statistical approach

You could take a diff between consecutive pages instead of counting lines. You'd have to experiment with different modules, e.g. HTML::Diff or Text::Diff, but this approach could also help you cope with style/layout changes.
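
To make the idea a bit more concrete, here is a minimal sketch using Text::Diff (a CPAN module, not core Perl). The two page strings and the helper name lines_unique_to_second are made up for illustration. Real pages would be fetched first and would probably need some normalizing before the diff is useful.

```perl
use strict;
use warnings;
use Text::Diff;    # CPAN module; install it first, it does not ship with Perl

# Hypothetical stand-ins for two consecutive blog pages sharing one template.
my $page_a = join "\n", '<html>', '<div id="nav">Home</div>',
    '<p>First post text</p>', '</html>', '';
my $page_b = join "\n", '<html>', '<div id="nav">Home</div>',
    '<p>Second post text</p>', '</html>', '';

# Hypothetical helper: return the lines that appear only in the second page.
sub lines_unique_to_second {
    my ($first, $second) = @_;

    # diff() accepts string references and returns the diff as one string.
    my $diff = diff \$first, \$second, { STYLE => 'Unified' };

    # Assumption: with string references no "---"/"+++" file header should
    # be emitted, so each line starting with "+" is an added line.
    return map { substr $_, 1 } grep { /^\+/ } split /\n/, $diff;
}

my @content = lines_unique_to_second($page_a, $page_b);
print "$_\n" for @content;
```

Lines common to both pages (the template) drop out, and what remains is a candidate for the actual post content. Whether a line-based diff is good enough, or whether something HTML-aware like HTML::Diff works better, is exactly the part you'd have to experiment with.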