Search and replace HTML

by PerlRob (Sexton)
on Jan 20, 2010 at 08:07 UTC (#818393)
PerlRob has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm trying to wrap div tags around anchor tags in an HTML document. I know the exact HTML that needs to be wrapped, but for some reason I'm having a hard time writing a script that can automate this.

Here's an example. I need to replace the first line with the second:

<a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm"><span>Attach</span></a>
<div class="main-button-wrapper"><a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm"><span>Attach</span></a></div>

How would you do this?

Re: Search and replace HTML
by wfsp (Abbot) on Jan 20, 2010 at 14:07 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::TreeBuilder;

    # Parse the sample document stored after __DATA__
    my $t = HTML::TreeBuilder->new_from_file(*DATA)
        or die qq{can't build tree: $!\n};

    # Find the first <a class="main-button"> element
    my $anchor = $t->look_down(
        _tag  => q{a},
        class => q{main-button},
    );

    # Replace the anchor with a div that contains the anchor itself
    $anchor->replace_with(
        [
            q{div},
            {class => qq{main-button-wrapper}},
            $anchor,
        ],
    );

    print $t->as_HTML(undef, q{ });

    __DATA__
    <html>
    <head>
    <title>search and replace</title>
    </head>
    <body>
    <a class="main-button" name="HrUpload" href="javascript:doUploadAtachment();" within="theForm">
    <span>Attach</span>
    </a>
    </body>
    </html>

    <html>
    <head>
    <title>search and replace</title>
    </head>
    <body>
    <div class="main-button-wrapper"><a class="main-button" href="javascript:doUploadAtachment();" name="HrUpload" within="theForm"> <span>Attach</span> </a></div>
    </body>
    </html>

Re: Search and replace HTML
by Ratazong (Prior) on Jan 20, 2010 at 08:40 UTC

    Hi!

    What is your issue with the script?

    • reading the html-file line-by-line?
    • searching and replacing the tags?

    In the first case, you'll need some code like

    use FileHandle;
    my $htmlfile = FileHandle->new("XXX.html", "r")
        or die "Cannot open XXX.html: $!";
    while (my $line = <$htmlfile>) {
        # process each line
    }

    In the second case, you need to identify what to replace and what to replace it with. According to your example, this could be

    <a class=  ---->  <div class="main-button-wrapper"><a class=
    or
    </a>  ---->  </a></div>

    Then you'll need to write the Perl regexes for the replacement, using =~ s/// and escaping all special characters. One line could be
    $line =~ s/<\/a>/<\/a><\/div>/;

    HTH, Rata

    PS: Be aware that defining suitable patterns for your anchor tags is crucial: the solution above is a plain text replacement and does not work on the HTML structure. If your HTML pages contain anchors in "unexpected" places, you'll probably be much happier using a CPAN module for HTML parsing.
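
The text-replace approach described in this reply can be sketched as a single substitution. This is only a minimal sketch under some assumptions: the anchors to wrap carry class="main-button" in double quotes, anchors are never nested, and each anchor's opening tag contains no stray ">" character. The name wrap_main_buttons is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every <a ... class="main-button" ...>...</a>
# in a <div class="main-button-wrapper">. This is pure text replacement;
# it does not parse the HTML.
sub wrap_main_buttons {
    my ($html) = @_;
    $html =~ s{
        (                                   # capture the whole anchor element
            <a\b [^>]* \sclass="main-button" [^>]* >
            .*?                             # anchor content, as short as possible
            </a>
        )
    }{<div class="main-button-wrapper">$1</div>}gisx;
    return $html;
}

my $line = '<a class="main-button" name="HrUpload" '
         . 'href="javascript:doUploadAtachment();" within="theForm">'
         . '<span>Attach</span></a>';
print wrap_main_buttons($line), "\n";
```

Using s{...}{...} instead of s/// avoids having to escape the slashes in the closing tags. Note that the substitution is not idempotent: running it over an already-wrapped file would wrap the anchors a second time.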

Re: Search and replace HTML
by leocharre (Priest) on Jan 20, 2010 at 13:51 UTC

    I'd look for a CPAN module to parse the HTML, one that will give me a list of the links. Consider HTML::LinkExtor (a subclass of HTML::Parser, so it has all of its methods).

    Then I would iterate through that list and find the ones that match my criteria.

    That's how I would *find* what I am changing. Regexing into HTML is hard.
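
The "extract the links, then filter" idea above might look roughly like the following. It is a sketch, not a complete solution: matching_hrefs is a hypothetical helper name, and it filters on the href value because, as far as I know, HTML::LinkExtor reports only link-bearing attributes such as href, not class. It locates candidates; it does not rewrite the document.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href of every <a> whose href matches $pattern.
sub matching_hrefs {
    my ($html, $pattern) = @_;

    # With no callback, the extractor accumulates the links it finds.
    my $extor = HTML::LinkExtor->new;
    $extor->parse($html);
    $extor->eof;

    my @hits;
    for my $link ($extor->links) {
        # Each entry is [ $tag, attr_name => value, ... ]
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && defined $attr{href};
        push @hits, $attr{href} if $attr{href} =~ $pattern;
    }
    return @hits;
}

my $html = '<a class="main-button" href="javascript:doUploadAtachment();">'
         . '<span>Attach</span></a> <a href="/other">Other</a>';
print "$_\n" for matching_hrefs($html, qr/^javascript:/);
```

To select anchors by class and change the tree, a parser that exposes every attribute, such as HTML::TreeBuilder, is the better fit.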

Re: Search and replace HTML
by Anonymous Monk on Jan 20, 2010 at 08:34 UTC
