(Dermot) Re: Big

by Dermot (Scribe)
on Sep 28, 2000 at 02:49 UTC (#34292)

in reply to Big, bad, ugly regex problem

A couple of comments. Use an HTML parser if at all possible; your brain will thank you in the long run. I think someone has already suggested that. Apart from that, so far I can see only one thing wrong with the regex. You specify an optional open quote captured in group 3 and an optional close quote matching \3, but in between you use \3 inside a negative lookahead, and for the times when there isn't a \3, bad stuff will happen. I'm not really sure what will happen. IIRC these capture variables are guaranteed to be undefined when you start a new match or substitution.
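To make the optional-quote concern concrete, here is a minimal sketch. The two patterns are hypothetical and much simpler than the regex in the original question. As far as I know, in Perl a backreference to a group that did not participate in the match simply fails, so an optional group such as `(["'])?` behaves differently from a group with an optional quote inside it, such as `(["']?)`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical patterns for illustration only; this is not the regex
# from the original question.

# The quote group itself is optional. When no quote is present the group
# never participates in the match, so \1 refers to an unset group.
my $optional_group = qr/^href=(["'])?([^"'>]+)\1$/;

# The quote is optional *inside* the group. The group always participates
# and captures the empty string when there is no quote.
my $optional_inside = qr/^href=(["']?)([^"'>]+)\1$/;

# Return 1 if $str matches $re, 0 otherwise.
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

for my $str ('href="x.html"', 'href=x.html') {
    printf "%-15s optional group: %d, optional inside: %d\n",
        $str, matches($optional_group, $str), matches($optional_inside, $str);
}
```

If the quotes really must be optional, putting the `?` inside the capturing parens is one way to keep the backreference usable in both cases.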

Replies are listed 'Best First'.
(Dermot) RE: Re: Big
by Dermot (Scribe) on Sep 28, 2000 at 04:36 UTC
    Ovid, this is what I came up with after messing with it for a while. It handles the two input strings you were having problems with (the quote characters are not optional), but I've no idea if it will work for all possible data. We can talk about making the quote characters optional tomorrow.
    #!/usr/bin/perl -w
    use strict;

    my ($data, $res);
    $data = '<a & href="somesite.html">test<\a>';
    print "Before substitution: $data\n";

    $res = $data =~ s/
        (               # Capture group 1: <a and
            <a\s        #   a space character
        )
        (?:             # Non-capturing parens
            [^>]*       #   stuff between a and href
        )
        (               # Capture group 2
            href\s*     #   href followed by spaces
        )
        (               # Capture group 3
            =\s*        #   equals followed by spaces
            (           #   Capture group 4
                ["']+   #     open quote character
            )
            (           #   Capture group 5
                [^"']+  #     non-quote characters
            )
            (?:
                \4      #     close quote character
            )
        )
        (               # Capture group 6
            >           #   not the final close angle bracket
        )
        (               # Capture group 7
            [^>]+       #   up to closing angle bracket
            >           #   final close angle bracket
        )
    /$1$2$3$6$7/x;

    print "no match\n" if ($res eq "");
    print "After substitution: $data\n";
(Dermot) & #61 not matching because of # character
by Dermot (Scribe) on Sep 28, 2000 at 20:56 UTC
    In an /x modified regex the # character is the comment character. The &#61, which represents an equals sign, isn't matching. Instead the & matches, and everything from #61 to the end of the line is seen as a comment. Consequently $3 doesn't match at all, due to its optionality and the fact that [^;]+ is greedy.
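A small sketch of the comment problem, using hypothetical patterns rather than the regex from the original question: under /x an unescaped # swallows the rest of the line, while \# keeps it literal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical patterns for illustration only.

# Under /x the unescaped # starts a comment, so this pattern is just "&".
my $unescaped = qr/
    &#61;       this text, including 61 and the semicolon, is a comment
/x;

# Escaping the # makes it a literal character again.
my $escaped = qr/
    & \# 61 ;   # matches the literal text &#61;
/x;

# Return 1 if $str matches $re, 0 otherwise.
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

for my $str ('a & b', 'href&#61;"x"') {
    printf "%-14s unescaped: %d, escaped: %d\n",
        $str, matches($unescaped, $str), matches($escaped, $str);
}
```

The unescaped version matches any string containing an ampersand, which is why the rest of the pattern ends up lining up against the wrong text.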
