punkish has asked for the wisdom of the Perl Monks concerning the following question:
My program grabs web pages and stores two versions of each in a database:
- A version with all HTML tags removed, for which I use HTML::Strip. The remaining text content of the page is used to build a full-text index for later searches;
- A version of the page exactly as it was when downloaded. This one is used to show the user the page as it looked at that date and time.
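For illustration, the two versions described above could be produced roughly like this. It is only a sketch: `page_versions` is a hypothetical helper name, the sample HTML is made up, and the database step is left out entirely. The only HTML::Strip calls used are `new`, `parse`, and `eof`.

```perl
use strict;
use warnings;
use HTML::Strip;

# Hypothetical helper (not the actual program): given the raw HTML of a
# downloaded page, return both versions that would go into the database.
sub page_versions {
    my ($raw_html) = @_;

    my $hs   = HTML::Strip->new();
    my $text = $hs->parse($raw_html);   # tags removed, text content kept
    $hs->eof;                           # reset parser state between documents

    # Tidy up the whitespace left behind where tags used to be.
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;

    return {
        text     => $text,       # input for the full-text index
        snapshot => $raw_html,   # untouched copy, shown to the user later
    };
}

# Made-up sample page, for demonstration only.
my $page = page_versions('<html><body><h1>Hello</h1><p>World</p></body></html>');
print "index text: $page->{text}\n";
```

The snapshot is stored unmodified here, which is why any JavaScript in the original page is still present when it is shown back to the user.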
So, I am seeking two kinds of advice -- one, how to strip out only the JavaScript from a web page; and two, how to generally better accomplish the above.
when small people start casting long shadows, it is time to go to bed