PerlMonks  

Wrapping and modifying web pages

by Sixtease (Friar)
on Nov 23, 2007 at 16:53 UTC (node #652610)

Sixtease has asked for the wisdom of the Perl Monks concerning the following question:

Friends,

I'm developing a web application that takes a URL, fetches the document, and adds some tags to the HTML. Since sites that do similar things already exist (www.pornolize.com), I thought the task might be common enough for a module to exist. For the fetched site to keep working, all relative links will have to be rewritten, along with who knows what else. Do you know of such a tool?
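One cheap way to keep relative links working, assuming the wrapper serves the rewritten page from its own host, is to inject an HTML base element pointing at the original URL, so the browser resolves relative links against the original site. Below is a rough sketch in Python (standard library only; the thread itself is about Perl modules). The function name add_base_tag is hypothetical, and the regex-based insertion is a simplifying assumption, not robust HTML handling.

```python
import html
import re

# Opening <head> tag, with or without attributes; "\b" keeps it from
# matching <header>.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_base_tag(page, page_url):
    """Return page with <base href="page_url"> inserted right after <head>.

    Browsers use the first <base> that has an href, so placing ours first
    makes relative links resolve against the original site instead of the
    wrapper. This is a regex shortcut, not a real HTML parse.
    """
    base = '<base href="%s">' % html.escape(page_url, quote=True)
    match = HEAD_OPEN.search(page)
    if match:
        return page[:match.end()] + base + page[match.end():]
    # <head> is optional in HTML; as a crude fallback, prepend the tag.
    return base + page
```

A base element fixes URL resolution only: links assembled by JavaScript are untouched, and clicking a link takes the visitor to the original site rather than keeping them inside the wrapper, so pages that must stay wrapped still need their links rewritten.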

Re: Wrapping and modifying web pages
by fenLisesi (Priest) on Nov 23, 2007 at 17:53 UTC
    I think WWW::Mechanize and HTML::TreeBuilder (or their equivalents) are what you want. Anything beyond those would be too specific to be of value as modules.
Re: Wrapping and modifying web pages
by NetWallah (Canon) on Nov 23, 2007 at 20:26 UTC
    Thanks (++) for that hilarious link to pornolize. I can't understand why people appear to be --ing your node (and are likely to do the same to THIS node). Lighten up, people! The OP's question is legitimate. Perhaps the link deserved a warning that it may not be suitable for minors or easily offended people, but with a name like PORNOLIZE, any reasonable person should realize its potential to offend the uber-sensitive. Personally, I was ROFLMAO!

    Anyway, I think a better candidate for what you want is HTML::Parser.
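HTML::Parser is event-driven: it fires callbacks for start tags, end tags, text, comments and so on while scanning the document. As an illustration of that style, here is a sketch using Python's standard html.parser module, which follows the same callback model. It is a stand-in, not HTML::Parser's API. It echoes the page back unchanged except that relative URLs in a few attributes are made absolute. The class name, the function name, and the attribute list are illustrative assumptions.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly hold URLs (illustrative, not exhaustive).
URL_ATTRS = {"href", "src", "action"}


class LinkAbsolutizer(HTMLParser):
    """Echo the document back, turning relative URLs in URL_ATTRS into
    absolute URLs resolved against base_url."""

    def __init__(self, base_url):
        # convert_charrefs=False delivers &name; and &#N; as separate events,
        # so text can be echoed back without re-escaping it.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _render_tag(self, tag, attrs, self_closing):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute, e.g. <input disabled>
                parts.append(name)
                continue
            # Fragment-only links (#top) stay local to the wrapped page.
            if name in URL_ATTRS and not value.startswith("#"):
                value = urljoin(self.base_url, value)
            # html.parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<%s%s>" % (" ".join(parts), " /" if self_closing else "")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render_tag(tag, attrs, False))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render_tag(tag, attrs, True))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def absolutize_links(page, base_url):
    parser = LinkAbsolutizer(base_url)
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

The output is not byte-for-byte identical to the input. Tag and attribute names come back lowercased, and attribute values are always double-quoted. For a rewriting proxy that is usually acceptable, and the same handlers are the natural place to inject the extra tags.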

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

Re: Wrapping and modifying web pages
by Sixtease (Friar) on Nov 23, 2007 at 23:01 UTC

    Thank you for the tips. I was looking for something that would perform the transformations a webpage needs to keep working. It seems I'll have to write that myself (most likely using the tools you suggested). I'm a little surprised by fenLisesi's comment that anything more specific than WWW::Mechanize and HTML::TreeBuilder would not be of value as a module. Do you think a tool that transforms webpages to work under an altered URL would not make a good and worthy CPAN module?
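The core of "working under an altered URL" is a mapping from each link on the fetched page to an address on the wrapper, so that following the link keeps the visitor inside the wrapper. The mapping also needs an inverse, so the wrapper can find the real target. Below is a minimal sketch in Python, assuming a hypothetical wrapper endpoint that takes the target in a url query parameter. The endpoint address and the names proxify and unproxify are illustrative, not anything from the thread.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit

# Hypothetical address of the wrapping application (an assumption).
WRAPPER_ENDPOINT = "http://wrapper.example/view"


def proxify(link, page_url, endpoint=WRAPPER_ENDPOINT):
    """Map a link found on page_url to a wrapper URL that will fetch,
    rewrite and serve the link's target."""
    if link.startswith("#"):
        return link  # in-page anchors keep working as-is
    target = urljoin(page_url, link)
    if urlsplit(target).scheme not in ("http", "https"):
        return link  # mailto:, javascript:, etc. are left alone
    return endpoint + "?" + urlencode({"url": target})


def unproxify(wrapper_url):
    """Recover the original target from a wrapper URL."""
    return parse_qs(urlsplit(wrapper_url).query)["url"][0]
```

In a complete tool, each request to the wrapper would read the url parameter, fetch that page, pass every URL-bearing attribute through proxify while parsing, and then add the extra tags. CSS url(...) references and links built by JavaScript are not covered by this sketch.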

    And about the pornolize thing: I never realized it might be inappropriate to post it. Sorry if I offended anybody; I found it hilarious and a good example of what I meant.

Approved by Corion