PerlMonks  

HTML tag search/replace.

by abultm74 (Initiate)
on Apr 20, 2001 at 06:51 UTC (#74059, perlquestion)

abultm74 has asked for the wisdom of the Perl Monks concerning the following question:

Help! I'm working on a class project in which we cache copies of HTML files. The problem: the 'href' and 'src' attributes need to be changed. Relative links must become hard (absolute) links, so that all our database has to hold is the HTML text, not images, etc. There are myriad ways of writing HTML href and img tags: no quotes, quotes, relative, relative with '..'s, leading slashes, trailing slashes, ones with 'http://', ones with only 'www', etc. I need to find all HTML 'href' and 'src' links and make them hard links. Any ideas? Is there a module that does this, or do I have to write a million regexps? I need some help... 'Mad Props' to anyone who can shed some light... Adam

Replies are listed 'Best First'.
Re: HTML tag search/replace.
by Maclir (Curate) on Apr 20, 2001 at 07:11 UTC

    HTML::Parser is your friend. A subset, HTML::TokeParser, may be sufficient. You are correct, Adam, when you identify the source of the difficulty: there are a variety of ways to code those HTML tags. (As a side issue, XHTML, with its much more rigid syntax, will make this easier. It is sort of like "use strict;" for HTML.)

    Now, if this is part of a class project, maybe they are wanting to see how you would tackle the problem, as an exercise in analysis and program design. At least HTML::Parser should be a good source of inspiration.
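As an illustration of the HTML::TokeParser suggestion above, here is a minimal sketch, not a definitive implementation. It assumes the URI module from CPAN is available for resolving relative references, that only 'href' and 'src' attributes matter, and that you know the URL each page was fetched from. The helper name absolutize_links and the example.com URLs are hypothetical.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN
use URI;                # CPAN; resolves relative references against a base

# Assumption: only href and src need rewriting; add more names if needed.
my %is_link = map { $_ => 1 } qw(href src);

# Hypothetical helper: returns $html with every href/src made absolute
# relative to $base (the URL the page was fetched from).
sub absolutize_links {
    my ($html, $base) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my $out = '';

    while (my $token = $parser->get_token) {
        my $type = $token->[0];

        if ($type eq 'S') {
            my ($tag, $attr, $attrseq, $text) = @{$token}[1 .. 4];

            # Tags without links are copied through exactly as written.
            unless (grep { $is_link{$_} } @$attrseq) {
                $out .= $text;
                next;
            }

            # Rebuild the tag with normalized, double-quoted attributes.
            my $close = '';
            $out .= "<$tag";
            for my $name (@$attrseq) {
                if ($name eq '/') { $close = ' /'; next }   # <img ... />
                my $value = $attr->{$name};
                $value = URI->new_abs($value, $base)->as_string
                    if $is_link{$name};
                # The parser decoded entities, so re-escape & and ".
                $value =~ s/&/&amp;/g;
                $value =~ s/"/&quot;/g;
                $out .= qq{ $name="$value"};
            }
            $out .= "$close>";
        }
        elsif ($type eq 'E' || $type eq 'PI') {
            $out .= $token->[2];    # original text of end tag / PI
        }
        else {
            $out .= $token->[1];    # text, comment, declaration
        }
    }
    return $out;
}

# Example: unquoted, single-quoted, and already-absolute links.
my $page = q{<a href=../index.html>Up</a> <img src='/img/logo.gif'> }
         . q{<a href="http://other.example/">Elsewhere</a>};
print absolutize_links($page, 'http://www.example.com/docs/page.html'), "\n";
```

The parser takes care of the quoting variants, and URI->new_abs handles '..' segments, leading slashes, and links that are already absolute. One case it cannot guess: a link written as just 'www.example.com/foo' has no scheme, so it is resolved as a relative path. If your pages contain such links, you would need your own rule for them.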

          
      XHTML ... sort of like "use strict;" for HTML

      You know, this is brilliant! It makes perfect sense both for Perl coders trying to understand XML and XHTML and for XMLers learning Perl.

      Sadly, it still allows both ' and " to be used around attributes and has no rule concerning URLs, so they will still be just as hard to parse ;--(

Re: HTML tag search/replace.
by merlyn (Sage) on Apr 20, 2001 at 16:25 UTC
