Re: Editing HTML files

by Tanktalus (Canon)
on Jul 08, 2008 at 21:15 UTC

in reply to Editing HTML files

From a quick perusal of HTML::Manipulator, it seems to presuppose that your HTML is in a certain format: either with id attributes (which by definition are unique) or with comment markers. If you have that, great. If not, and you can't get there, then you'll have to look for another way.

Generally, I can get away with making all my HTML XHTML-compliant, which means I can whip out my favourite swiss-army knife: XML::Twig. If your HTML is already XHTML, this becomes really easy. The complicated part will be coming up with the XPath, but that shouldn't be too hard ... something like //div[string()="nothing"] is my guess. The get_xpath method returns all the matching elements at once, and you can just loop through them, change the text of each one (set_text), and print it all back out.
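A minimal sketch of that XML::Twig loop follows. It assumes the input is well-formed XHTML and that the goal is to replace div elements whose text is exactly "nothing". The sample markup and the replacement string "something" are hypothetical. The text comparison is done in Perl with `text` instead of inside the XPath predicate, so the example doesn't depend on how much XPath XML::Twig's subset supports.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical XHTML input; the real files' markup is an assumption.
my $xhtml = '<html><body><div>nothing</div><div>keep me</div><div>nothing</div></body></html>';

my $twig = XML::Twig->new();
$twig->parse($xhtml);    # dies if the input is not well-formed XML

# get_xpath returns all matching elements at once; the text test is
# done in Perl here rather than in an XPath predicate.
for my $div ( grep { $_->text eq 'nothing' } $twig->get_xpath('//div') ) {
    $div->set_text('something');    # replaces the div's children with one text node
}

my $out = $twig->sprint;    # serialise the modified tree back to a string
print "$out\n";
```

If the predicate form works with your XML::Twig version, the grep can be folded into the XPath expression itself.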

Otherwise, you'll probably have to roll your own with HTML::Parser...
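Rolling your own with HTML::Parser roughly means echoing every event back out unchanged and rewriting only the text you care about. The sketch below assumes the same hypothetical target (text "nothing" inside a div) and does not require the input to be valid XHTML. Tracking div nesting with a simple counter is an illustrative simplification and can be thrown off by badly unbalanced markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3;

# Hypothetical input: plain HTML that need not be well-formed XML.
my $html = <<'HTML';
<html><body>
<div class="x">nothing</div>
<div>keep me</div>
<p>nothing</p>
</body></html>
HTML

my $out       = '';
my $div_depth = 0;    # how many <div> elements we are currently inside

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ( $tag, $text ) = @_;
        $div_depth++ if $tag eq 'div';
        $out .= $text;                    # echo the original start tag
    }, 'tagname, text' ],
    end_h => [ sub {
        my ( $tag, $text ) = @_;
        $div_depth-- if $tag eq 'div' && $div_depth > 0;
        $out .= $text;                    # echo the original end tag
    }, 'tagname, text' ],
    text_h => [ sub {
        my ( $dtext, $text ) = @_;
        # Rewrite only text that sits inside a <div> and is exactly "nothing".
        if ( $div_depth && $dtext =~ /^\s*nothing\s*$/ ) {
            $out .= 'something';
        }
        else {
            $out .= $text;                # pass everything else through untouched
        }
    }, 'dtext, text' ],
    default_h => [ sub { $out .= shift }, 'text' ],    # comments, doctype, etc.
);
$p->unbroken_text(1);    # deliver each run of text as a single chunk
$p->parse($html);
$p->eof;

print $out;
```

Because the handlers append the original `text` of each event, everything outside the targeted text comes back byte-for-byte. That matters when another process regenerates the files.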

Re^2: Editing HTML files
by spivey49 (Monk) on Jul 08, 2008 at 21:33 UTC
    Thanks Tanktalus. The problem with XHTML is that the files are generated as HTML and get overwritten by another process. The problem with regular expressions is that the tag values are unpredictable. I saw the same issue when glancing over the HTML::Manipulator docs. I was hoping I had missed something or that someone might have seen this tag format before. I'll give HTML::Parser a shot.
      The problem with reg expressions is the tag values are unpredictable.
      Well, in that case a proper parser is a better approach.

      If the tag values are unpredictable, use metacharacters. Then again, depending on the HTML files you're editing, it may not work.
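      As a rough illustration of the metacharacter approach, the substitution below lets `\w+` and `[^>]*` absorb whatever tag name and attributes appear, and uses a backreference to require the matching close tag. The sample input and the "nothing"/"something" strings are hypothetical. As noted above, this will break on nested tags, comments, or attribute values containing `>`, which is where a real parser wins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: tag names and attributes vary from file to file.
my $html = <<'HTML';
<div id="a17">nothing</div>
<span class="x y">nothing</span>
<td>keep me</td>
HTML

# (\w+) captures whatever tag name is there, [^>]* skips any attributes,
# and \2 insists on the matching closing tag.
( my $out = $html ) =~ s{(<(\w+)[^>]*>)\s*nothing\s*(</\2>)}{${1}something${3}}g;

print $out;
```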

