
Substitution Problem

by rtlm (Novice)
on Jul 10, 2004 at 06:33 UTC ( [id://373317]=perlquestion )

rtlm has asked for the wisdom of the Perl Monks concerning the following question:

Gday Monks,

I am trying to do a simple substitution in my website. I need to replace everything between 2 form tags (including the tags themselves). This is the code I am trying, but it does not seem to work:

    s{<FORM(.*?)/FORM>}{replacement text};

The form HTML is split across multiple lines and there are some tabs in there as well. I think these spaces and tabs are causing the problem, because the above statement works if the form tags and everything between them are on one line. It would be fine, except the form is too big to be practically put on one line.

Any suggestions? Thanks.

Cheers for all the replies guys, all sorted.

Replies are listed 'Best First'.
Re: Substitution Problem
by meraxes (Friar) on Jul 10, 2004 at 06:54 UTC

    The '.' metacharacter doesn't match newlines by default. You'll need to add the s pattern modifier to make it do that. You may want to add the i modifier as well to make it case-insensitive if you don't know that the HTML tags are all uppercase:

    s{<FORM(.*?)/FORM>}{replacement text}is;

    It may also be worth noting that if the HTML is not well formed, you could end up removing a heck of a lot more than you intended with this regexp.
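A minimal sketch of the difference the s modifier makes, assuming the whole page is already in one scalar. The sample markup in `$html` is made up for illustration and is not the original poster's form:

```perl
use strict;
use warnings;

# Hypothetical sample page: the form spans several lines and contains a tab,
# as described in the question.
my $html = "<p>before</p>\n"
         . "<form action=\"/x\">\n"
         . "\t<input name=\"q\">\n"
         . "</form>\n"
         . "<p>after</p>\n";

# Without /s, '.' stops at the first newline, so the pattern cannot reach
# the closing tag and nothing is replaced.
(my $without_s = $html) =~ s{<FORM(.*?)/FORM>}{replacement text}i;

# With /s, '.' also matches newlines, so the whole form is replaced.
(my $with_s = $html) =~ s{<FORM(.*?)/FORM>}{replacement text}is;

print "without /s: ", ( $without_s eq $html ? "unchanged" : "changed" ), "\n";
print "with /s:\n$with_s";
```

The non-greedy `.*?` stops at the first closing form tag it can reach. If a closing tag is missing, the match would run on to the closing tag of a later form, which is the over-removal risk mentioned above.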

    Update: Whoops. Quite right davido. I assumed that everything was in a single scalar variable. Additionally, for a quickie list of regexp modifiers you can go to perlreref.

        Ummmmm... no... perlreref is the regex quick ref. If it were a typo then the link wouldn't have worked. ;)

        Update: Um... still no. perlreref (perl regex reference faq), perlre (perl regex faq) and perlref (perl references and nested datastructures faq) are quite distinct.

Re: Substitution Problem
by wfsp (Abbot) on Jul 10, 2004 at 09:30 UTC
    I agree. This uses HTML::TokeParser. I have found that it is easily adaptable to do any chore you may have parsing html. Since I've started using it I've never used a regex on html. It's never worth the effort.
    #!/bin/perl5
    use strict;
    use warnings;
    use HTML::TokeParser;

    open HTML_FILE, 'form.html' or die;
    my $tp = HTML::TokeParser->new( \*HTML_FILE ) or die;

    my $html;
    my $found_form = 0;
    while ( my $t = $tp->get_token ) {
        $found_form++, next if $t->[0] eq 'S' and $t->[1] eq 'form';
        $found_form--, next if $t->[0] eq 'E' and $t->[1] eq 'form';
        next if $found_form;
        $html .= $t->[4] if $t->[0] eq 'S';
        $html .= $t->[1] if $t->[0] eq 'T' or $t->[0] eq 'C';
        $html .= $t->[2] if $t->[0] eq 'E';
    }
    close HTML_FILE;
    print "$html\n";

    # ["S", $t, $attr, $attrseq, $text]
    # ["E", $t, $text]
    # ["T", $text, $is_data]
    # ["C", $text]
    # ["D", $text]
    # ["PI", $token0, $text]

Re: Substitution Problem
by davido (Cardinal) on Jul 10, 2004 at 06:58 UTC
    You didn't mention how you're reading in the document. In addition to using the /s modifier on your substitution, it may be necessary to slurp the entire file at once. Otherwise, you'll probably just be reading it one line at a time, and that could foul up your matching.
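One way to slurp, sketched under the assumption that the page lives in a file on disk. The helper names `slurp_file` and `replace_form` are made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into a single scalar.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                 # undefine the input record separator for this scope
    my $content = <$fh>;      # so one read returns the entire file
    close $fh;
    return $content;
}

# Hypothetical helper: apply the substitution from this thread to one string.
sub replace_form {
    my ($html) = @_;
    $html =~ s{<FORM(.*?)/FORM>}{replacement text}is;
    return $html;
}

# Only runs when a file name is given on the command line.
print replace_form( slurp_file( $ARGV[0] ) ) if @ARGV;
```

Reading with a plain `while (<$fh>)` loop instead would hand the substitution one line at a time, so the opening and closing tags of a multi-line form would never be in the same string.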


Re: Substitution Problem
by beable (Friar) on Jul 10, 2004 at 08:43 UTC
    You should really consider using a module like HTML::Parser if you can. It is very difficult to write a regex which matches arbitrary HTML. Consider the Perl Faq entry How do I remove HTML from a string?.
