
Search and replace HTML

by PerlRob (Sexton)
on Jan 20, 2010 at 08:07 UTC (#818393=perlquestion)
PerlRob has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm trying to wrap div tags around anchor tags in an HTML document. I know the exact HTML that needs to be wrapped, but for some reason I'm having a hard time writing a script that can automate this.

Here's an example. I need to replace the first line with the second:

<a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm"><span>Attach</span></a>
<div class="main-button-wrapper"><a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm"><span>Attach</span></a></div>
How would you do this?

Re: Search and replace HTML
by wfsp (Abbot) on Jan 20, 2010 at 14:07 UTC
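    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TreeBuilder;

    # Build a parse tree from the HTML after __DATA__
    my $t = HTML::TreeBuilder->new_from_file(*DATA)
        or die qq{can't build tree: $!\n};

    # Find the first <a> whose class is "main-button"
    my $anchor = $t->look_down(
        _tag  => q{a},
        class => q{main-button},
    ) or die qq{no main-button anchor found\n};

    # Put a new <div class="main-button-wrapper"> where the anchor was,
    # with the anchor itself as the div's only child
    $anchor->replace_with(
        [ q{div}, {class => qq{main-button-wrapper}}, $anchor, ],
    );

    # Print the modified document, indented with a space per level
    print $t->as_HTML(undef, q{ });

    __DATA__
    <html>
    <head>
    <title>search and replace</title>
    </head>
    <body>
    <a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm">
    <span>Attach</span>
    </a>
    </body>
    </html>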
    <html>
     <head>
      <title>search and replace</title>
     </head>
     <body>
      <div class="main-button-wrapper"><a class="main-button" href="javascript:doUploadAtachment();" name="HrUpload" within="theForm">
        <span>Attach</span>
       </a></div>
     </body>
    </html>
Re: Search and replace HTML
by Ratazong (Monsignor) on Jan 20, 2010 at 08:40 UTC


    What is your issue with the script?

    • reading the HTML file line by line?
    • searching and replacing the tags?

    In the first case, you'll need some code like

    use strict;
    use warnings;
    use FileHandle;

    my $htmlfile = FileHandle->new("XXX.html", "r")
        or die "Cannot open XXX.html: $!\n";
    while (my $line = <$htmlfile>) {
        # process each line
    }
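    Reading line by line only works if each anchor sits on a single line; in the __DATA__ sample in wfsp's reply the anchor spans three lines, so a per-line s/// would never see it whole. A minimal sketch, assuming the file fits comfortably in memory, that slurps the whole file into one string instead. It swaps FileHandle for a lexical three-argument open, and the sub names slurp_file and write_file are hypothetical helpers, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string. Localizing
# $/ (the input record separator) to undef makes <$fh> return the
# entire file at once, so a pattern can match an anchor that spans
# several lines.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content;
}

# Hypothetical helper: write a string back to a file, replacing its
# previous contents.
sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!\n";
}
```

    You would then call my $html = slurp_file('XXX.html');, run your s/// substitutions on $html (with the /g flag if there can be several anchors), and pass the result to write_file('XXX.html', $html), ideally after keeping a backup of the original file.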
    In the second case, you need to identify what to replace and what to replace it with. According to your example, this could be
    <a class=   ---->   <div class="main-button-wrapper"><a class=
    or
    </a>   ---->   </a></div>
    Then you'll need to write Perl regexes that do the replacement with =~ s///, escaping any characters that are special in the pattern (here, the / in the closing tags). One line could be
    $line =~ s/<\/a>/<\/a><\/div>/;

    HTH, Rata

    PS: be aware that defining suitable patterns for your anchor tags is crucial, because the solution above is only a text replacement and does not work on the HTML structure. For example, s/<\/a>/<\/a><\/div>/ appends </div> to the first </a> on every line, whether or not that link is a main-button anchor. If your HTML pages contain your anchors in "unexpected" places, you'll probably be much happier using a CPAN module for HTML parsing.
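    Following the PS, one way to make the text replacement less blind is to match each complete main-button anchor, from its opening tag to the first </a> after it, and wrap that whole span in a single substitution instead of rewriting every </a>. A minimal sketch, assuming the HTML is already in one string, anchors are not nested, and the attribute is written exactly as class="main-button" in double quotes; the sub name wrap_main_buttons is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each <a ... class="main-button" ...>...</a>
# in <div class="main-button-wrapper">. Still plain text replacement:
# it assumes anchors are not nested and that the class attribute is
# written exactly as class="main-button".
sub wrap_main_buttons {
    my ($html) = @_;
    $html =~ s{
        (
          <a \b [^>]*? \s class="main-button" [^>]* >   # opening tag
          .*?                                           # anchor content
          </a>                                          # first closing tag
        )
    }{<div class="main-button-wrapper">$1</div>}gsix;
    return $html;
}

# Sample input: an ordinary link that must stay untouched, followed by
# a main-button anchor that spans several lines.
my $sample = <<'HTML';
<p><a href="/other">Other</a></p>
<a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm">
<span>Attach</span>
</a>
HTML

print wrap_main_buttons($sample);
```

    The /s flag lets .*? cross newlines, and /g wraps every matching anchor rather than only the first. Running it twice would wrap the anchors twice, and variations such as single-quoted attributes or class="main-button big" are not matched; those are exactly the cases where a parser such as HTML::TreeBuilder is the safer choice.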

Re: Search and replace HTML
by leocharre (Priest) on Jan 20, 2010 at 13:51 UTC

    I'd look for a CPAN module that parses HTML and gives me a list of links. Consider HTML::LinkExtor (a subclass of HTML::Parser, so it has all of that module's methods).

    Then I would iterate through that list and find the ones that match my criteria.

    That's how I would *find* what I am changing. Regexing into HTML is hard.

Re: Search and replace HTML
by Anonymous Monk on Jan 20, 2010 at 08:34 UTC
