PerlMonks  

Re: Scrape a blog: a statistical approach

by epimenidecretese (Acolyte)
on Apr 15, 2014 at 14:10 UTC (#1082345)


in reply to Scrape a blog: a statistical approach

OK, I did some further research and found that this problem is too complicated to be solved "in a few lines of code".

For anyone who has to handle the same problem, I am posting the link below, which contains up-to-date information and useful libraries. I am now using justext from Python. There is also an NCleaner Perl module, but I have not been able to use it.
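To give a rough idea of what boilerplate-removal tools of this kind do, here is a minimal sketch of one common heuristic: split the page into blocks, then keep a block only if it is reasonably long and only a small share of its text sits inside links. This is not the actual algorithm of justext or NCleaner, which are considerably more elaborate. The names `BlockCollector` and `extract_content` and the thresholds `min_length` and `max_link_density` are invented for illustration, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption; real tools use longer lists).
BLOCK_TAGS = {"p", "div", "li", "td", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose text content is never shown to the reader.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Split HTML into text blocks and count the characters inside links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # list of (text, link_chars) tuples
        self._parts = []       # text fragments of the current block
        self._link_chars = 0   # characters of the current block inside <a>
        self._in_link = 0
        self._skip = 0

    def flush(self):
        """Close the current block and store it if it has any text."""
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_content(html, min_length=40, max_link_density=0.3):
    """Return the blocks that look like content rather than boilerplate."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # store any trailing text after the last block tag
    kept = []
    for text, link_chars in parser.blocks:
        link_density = link_chars / len(text)
        if len(text) >= min_length and link_density <= max_link_density:
            kept.append(text)
    return kept
```

Calling `extract_content(page_source)` on a typical blog page should drop navigation bars and short footers and keep the longer paragraphs. The two thresholds are arbitrary starting points and would need tuning per site, which is part of why the dedicated libraries are the better choice for real work.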

As always, thanks guys for your support.

https://sites.google.com/a/morganclaypool.com/wcc/home/software
