http://www.perlmonks.org?node_id=818393

PerlRob has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm trying to wrap div tags around anchor tags in an HTML document. I know the exact HTML that needs to be wrapped, but for some reason I'm having a hard time writing a script that can automate this.

Here's an example. I need to replace the first line with the second:

<a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm"><span>Attach</span></a>
<div class="main-button-wrapper"><a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm"><span>Attach</span></a></div>

How would you do this?

Replies are listed 'Best First'.
Re: Search and replace HTML
by wfsp (Abbot) on Jan 20, 2010 at 14:07 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TreeBuilder;

    my $t = HTML::TreeBuilder->new_from_file(*DATA)
      or die qq{can't build tree: $!\n};

    # Find the first <a class="main-button"> element in the tree.
    my $anchor = $t->look_down(
      _tag  => q{a},
      class => q{main-button},
    );

    # Replace it with a new <div> that contains the anchor itself.
    $anchor->replace_with(
      [
        q{div},
        {class => qq{main-button-wrapper}},
        $anchor,
      ],
    );

    print $t->as_HTML(undef, q{ });

    __DATA__
    <html>
    <head>
    <title>search and replace</title>
    </head>
    <body>
    <a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm">
    <span>Attach</span>
    </a>
    </body>
    </html>

    Output:

    <html>
    <head>
    <title>search and replace</title>
    </head>
    <body>
    <div class="main-button-wrapper"><a class="main-button" href="javascript:doUploadAtachment();" name="HrUpload" within="theForm"> <span>Attach</span> </a></div>
    </body>
    </html>
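If the document contains more than one such anchor, the same approach can be extended to all of them. The following is only a sketch under a few assumptions: the helper name `wrap_anchors` is made up for illustration, the input is passed as a string rather than read from `DATA`, and anchors that already sit inside a `main-button-wrapper` div are skipped so the script can be run twice without double wrapping.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: wrap every <a class="main-button"> in a
# <div class="main-button-wrapper"> and return the resulting HTML.
sub wrap_anchors {
    my ($html) = @_;
    my $t = HTML::TreeBuilder->new_from_content($html);

    # In list context, look_down returns all matching elements.
    for my $anchor ( $t->look_down( _tag => 'a', class => 'main-button' ) ) {
        my $parent = $anchor->parent;

        # Skip anchors that are already wrapped.
        next
          if $parent
          && $parent->tag eq 'div'
          && ( $parent->attr('class') // '' ) eq 'main-button-wrapper';

        my $div = HTML::Element->new( 'div', class => 'main-button-wrapper' );
        $anchor->replace_with($div);    # the anchor is now detached
        $div->push_content($anchor);    # re-attach it inside the div
    }

    my $out = $t->as_HTML( undef, ' ' );
    $t->delete;
    return $out;
}

print wrap_anchors(<<'HTML');
<html><body>
<a class="main-button" href="#one"><span>Attach</span></a>
<a class="main-button" href="#two"><span>Remove</span></a>
<a class="other" href="#three">untouched</a>
</body></html>
HTML
```

Building the div explicitly instead of passing an array ref is a matter of taste; it just makes the detach and re-attach steps visible.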
Re: Search and replace HTML
by Ratazong (Monsignor) on Jan 20, 2010 at 08:40 UTC

    Hi!

    What is your issue with the script?

    • reading the html-file line-by-line?
    • searching and replacing the tags?

    In the first case, you'll need some code like

    use FileHandle;
    my $htmlfile = FileHandle->new("XXX.html", "r")
      or die "cannot open XXX.html: $!";
    while (my $line = <$htmlfile>) {
        # process each line
    }

    In the second case, you need to identify what to replace and what the new text should be. According to your example, this could be

    <a class=  ---->  <div class="main-button-wrapper"><a class=
    or
    </a>  ---->  </a></div>

    Then you'll need to create the Perl regexes for the replacement, using =~ s/// and escaping all special characters. One line could be
    $line =~ s/<\/a>/<\/a><\/div>/;

    HTH, Rata

    PS: Be aware that defining suitable patterns for your anchor tags is crucial, as the solution above is just a text replacement and does not work on the HTML structure. If your HTML pages contain your anchors in "unexpected" places, you'll probably be much happier using a CPAN module for HTML parsing.
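To make the text-replace idea concrete, here is a minimal sketch that does both substitutions in one step by capturing the whole anchor. It assumes the anchor always carries `class="main-button"` in exactly that form, contains no nested anchors, and is not already wrapped; the sub name `wrap_main_buttons` is made up for illustration. It works on the whole file content as one string so that anchors spanning several lines are still matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap each <a ... class="main-button" ...>...</a>
# in a <div class="main-button-wrapper">. Plain text replacement only.
sub wrap_main_buttons {
    my ($html) = @_;
    $html =~ s{
        (                               # capture the complete anchor
          <a\b [^>]* \bclass="main-button" [^>]* >
          .*?                           # shortest possible content
          </a>
        )
    }{<div class="main-button-wrapper">$1</div>}gsx;
    return $html;
}

my $line = '<a class="main-button" name="HrUpload" '
         . 'href="javascript:doUploadAtachment();" within="theForm">'
         . '<span>Attach</span></a>';
print wrap_main_buttons($line), "\n";
```

Using `s{...}{...}` instead of `s/.../.../` avoids having to escape the slash in `</a>`. Running it a second time on its own output would wrap the anchors again, which is one more reason to prefer a parser for anything beyond a one-off fix.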

Re: Search and replace HTML
by leocharre (Priest) on Jan 20, 2010 at 13:51 UTC

    I'd look for a CPAN module to parse HTML, one that will give me a list of links. Consider HTML::LinkExtor (a subclass of HTML::Parser, so it has all of its methods).

    Then I would iterate through that list and find the ones that match my criteria.

    That's how I would *find* what I am changing. Regexing into html is hard.
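As a rough illustration of the "find" step, the sketch below collects the `href` of every anchor with HTML::LinkExtor and then filters the list. The helper name `anchor_hrefs` and the filter on `doUploadAtachment` are assumptions made for the example. Note that the callback only receives link attributes such as `href`, so attributes like `class` are not available for matching here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href value of every <a> in the HTML.
sub anchor_hrefs {
    my ($html) = @_;
    my @hrefs;

    # The callback gets the tag name followed by link attribute pairs.
    my $parser = HTML::LinkExtor->new(
        sub {
            my ( $tag, %attr ) = @_;
            push @hrefs, $attr{href}
              if $tag eq 'a' && defined $attr{href};
        }
    );
    $parser->parse($html);
    $parser->eof;
    return @hrefs;
}

my $html = <<'HTML';
<a class="main-button" href="javascript:doUploadAtachment();">Attach</a>
<a href="/somewhere/else">Other</a>
HTML

# Keep only the links that match the criteria.
my @matches = grep { /doUploadAtachment/ } anchor_hrefs($html);
print "$_\n" for @matches;
```

This only identifies the anchors of interest; changing the document still needs a tree-based module or a careful substitution.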

Re: Search and replace HTML
by Anonymous Monk on Jan 20, 2010 at 08:34 UTC