PerlMonks  

Re: Scrape a blog: a statistical approach

by epimenidecretese (Acolyte)
on Apr 15, 2014 at 14:10 UTC


in reply to Scrape a blog: a statistical approach

Ok, I did some further research and found that this problem is too complicated to be solved "in a few lines of code".

For those who have to handle the same problem, I am posting the following link, which contains up-to-date information and useful libraries. I am now using jusText within Python. There is also an NCleaner Perl module, but I have not been able to use it.

As always, thanks guys for your support.

https://sites.google.com/a/morganclaypool.com/wcc/home/software
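
For readers wondering what a "statistical" cleaner does in practice, here is a minimal sketch of the general idea, using only the Python standard library. It is not jusText's actual algorithm: the stopword list, the thresholds, and the names `BlockCollector` and `extract_main_text` are all assumptions made up for illustration. It splits a page into text blocks and keeps only those that are long enough, contain few link characters, and contain a reasonable share of common words.

```python
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption; real tools ship full lists).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "was", "i"}
# Tags treated as block boundaries (a simplification).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "blockquote"}


class BlockCollector(HTMLParser):
    """Split HTML into text blocks and count the characters inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (text, link_chars)
        self._text = []
        self._link_chars = 0
        self._in_link = 0      # nesting depth of <a>
        self._skip = 0         # nesting depth of <script>/<style>

    def _flush(self):
        # Normalize whitespace and store the block if it has any text.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_words=10, max_link_density=0.3,
                      min_stopword_ratio=0.2):
    """Return the blocks that look like prose rather than navigation."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        words = text.lower().split()
        link_density = link_chars / len(text)
        stop_hits = sum(w.strip(".,;:!?\"'()") in STOPWORDS for w in words)
        stop_ratio = stop_hits / len(words)
        if (len(words) >= min_words
                and link_density <= max_link_density
                and stop_ratio >= min_stopword_ratio):
            kept.append(text)
    return kept


sample = """
<html><body>
<div><a href="/">Home</a> <a href="/about">About</a> <a href="/archive">Archive</a></div>
<p>This is the first paragraph of the post, and it is written in plain prose
that a reader would want to keep for later analysis.</p>
<div><a href="/login">Log in</a> | <a href="/register">Create a new user</a></div>
</body></html>
"""

for block in extract_main_text(sample):
    print(block)
```

On the sample page, the two link-heavy navigation blocks are dropped and only the prose paragraph is kept. With jusText itself installed, the comparable call is `justext.justext(html, justext.get_stoplist("English"))`, which returns paragraphs carrying an `is_boilerplate` flag; real pages are messy enough that a dedicated library is the better choice over a hand-rolled heuristic like this one.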
