david2008 has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I want to write an application that crawls a given page and all of its child pages (just one level deep).
I have the following requirements:
- JavaScript handling. For example, some links run JavaScript code that opens a new window, and I want to parse the page that opens.
- PDF, Word and PPT parsing.
- Authentication by cookies. Some pages require you to log in first, and after that the cookie authenticates you on every subsequent click.
Do you know of a CPAN module that provides this functionality?
I saw on Google that some of these questions have been asked in the past, but I want to know whether there is a single module that combines all of these features.
Thanks, David
Replies are listed 'Best First'.
Re: web crawler infrastructure
by marto (Cardinal) on Jan 07, 2013 at 09:57 UTC
Re: web crawler infrastructure
by Anonymous Monk on Jan 07, 2013 at 09:55 UTC
Re: web crawler infrastructure
by space_monk (Chaplain) on Jan 07, 2013 at 11:32 UTC