PerlMonks  

Global regex giving up too soon

by Wassercrats (Initiate)
on Jan 20, 2004 at 10:48 UTC

Wassercrats has asked for the wisdom of the Perl Monks concerning the following question:

I found a weird webpage that has link targets that include quotes in the fragment part (after the #), such as <a href="/#A "dumb-ass" fragment">. I decided to try to handle links like that for my site mapper. Basically, when an anchor tag contains only one equal sign, I treat everything from the first quote to the last as the target. That condition isn't shown below. Below is the regex I've been using to delete the quotes within the fragment portion of a target. I've tried it with and without the c modifier. Why does it only delete the quote before dumb-ass when used without the "for" loop? It works perfectly, removing both inner quotes, when I include the loop, but as a standalone regex, even with the g modifier, only one quote gets deleted. How should I be doing this? Maybe I'm just tired, but I have the feeling I don't know regexes as well as I thought I did.

for ($a=0; $a<3; $a++) { $line[$count] =~ s/(href\s*=\s*(["'])[^"'#]*?#[^"']*?)\2(.*\2\s*>)/$1$3/isgc; }

Replies are listed 'Best First'.
Re: Global regex giving up too soon
by Abigail-II (Bishop) on Jan 20, 2004 at 10:57 UTC
    First of all, the /c modifier is meaningless with s///. Had you turned on warnings, Perl would have told you so. Furthermore, the regex matches from the "href" to the final ">", and you replace everything but the second quote. For the given example string, there's only one "href", so the /g is pointless as well.

    I can't figure out what your intention is, so I'm not offering a working regex.

    Abigail
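To make the point above concrete, here is a minimal sketch using the example tag from the question. The pattern is the one posted, written without /c and assuming it ends in `\2\s*>`. The variable names ($once, $n, $line) are only for illustration. A single s///g pass makes exactly one substitution, because that one match already runs to the final ">". Repeating the substitution until it fails is one possible alternative to the fixed three-pass loop, not necessarily what the poster should use.

```perl
use strict;
use warnings;

my $tag = q{<a href="/#A "dumb-ass" fragment">};

# One pass with /g: the match spans from "href" to the final ">",
# so nothing is left in the string for /g to find a second match in.
my $once = $tag;
my $n = ($once =~ s/(href\s*=\s*(["'])[^"'#]*?#[^"']*?)\2(.*\2\s*>)/$1$3/isg);
print "$n substitution: $once\n";

# Re-running the substitution restarts the search from the beginning
# of the string each time; the loop stops when nothing matches.
my $line = $tag;
1 while $line =~ s/(href\s*=\s*(["'])[^"'#]*?#[^"']*?)\2(.*\2\s*>)/$1$3/is;
print "looped: $line\n";
```

Each pass removes one inner quote, which is why the loop in the question appears to work while the lone s///g does not.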

      For the given example string, there's only one "href", so the /g is pointless as well.

      I figured that was my problem. That's why I tried using /c, but I couldn't find a good description of how it worked. So basically, I guess there is no way to get /g to look over the entire regex, so I'll have to use while $& or establish that there's an href earlier so I don't have to use it in the regex.

      Anyway, thanks for not telling me to use a module!

        So basically, I guess there is no way to get /g to look over the entire regex,
        What do you mean by "looking over the entire regex"? /g means "match repeatedly (without overlap)".

        Abigail
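A small sketch of "match repeatedly (without overlap)", using a made-up string rather than anything from the thread. With /g, each new match starts where the previous one ended, so a pattern that consumes most of the string in one match leaves nothing for later matches.

```perl
use strict;
use warnings;

my $s = q{x "a" y "b" z};

# Greedy .* runs from the first quote to the last one,
# so /g finds only a single match.
my @wide = $s =~ /".*"/g;

# A narrower pattern stops at the next quote, leaving text
# after each match where /g can resume.
my @narrow = $s =~ /"[^"]*"/g;

printf "%d wide, %d narrow\n", scalar @wide, scalar @narrow;
```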

Re: Global regex giving up too soon
by Wassercrats (Initiate) on Jan 20, 2004 at 19:50 UTC
    This is what confused me--from http://perldoc.com/perl5.8.0/pod/perlretut.html
    A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the //c, as in /regexp/gc. The current position in the string is associated with the string, not the regexp. This means that different strings have different positions and their respective positions can be set or read independently.
    I interpreted the third sentence to mean the opposite of what it meant.

    There doesn't seem to be anything on /c or /gc in perlre at all, except for a "see also" reference to perlretut. Perlre is the only Reference Manual page to cover regexes, and I think it should at least mention /c and not simply refer people to a Tutorials page.

      I think you are suffering a little confusion. perlre documents everything to do with compiling or matching regexes, including all the applicable flags (i, m, s, x). All the other flags have nothing to do with compiling or matching a regex; instead they affect the operator that is using the regex, and are documented in perlop, which says:
      In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see perlfunc/pos. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (e.g. m//gc). Modifying the target string also resets the search position.
      Note that //c is only documented to work with m//g, and even then only in scalar context (though it actually works even in list context).
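The perlop passage above can be checked with a short sketch. The string and variable names here are invented for illustration. It shows pos() after a successful m//g, the reset after a failed m//g, and how /c keeps the position on failure.

```perl
use strict;
use warnings;

my $str = 'aXbXc';

my $ok = $str =~ /X/g;            # scalar context: first X found
my $after_first = pos($str);      # position just past the first X

$ok = $str =~ /Q/g;               # fails, so the position is reset
my $after_fail = pos($str);       # undef

$ok = $str =~ /X/g;               # starts over, finds the first X again
$ok = $str =~ /Q/gc;              # fails, but /c keeps the position
my $after_fail_c = pos($str);     # still just past the first X

$ok = $str =~ /X/g;               # resumes and finds the second X
my $after_second = pos($str);

print join(', ', map { $_ // 'undef' }
    $after_first, $after_fail, $after_fail_c, $after_second), "\n";
```

None of this applies to s///, which is why /c on the substitution in the question had no effect.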

      Update: perlre says it better than I did. Almost at the very top:

      For reference on how regular expressions are used in matching operations, plus various examples of the same, see discussions of m//, s///, qr// and ?? in perlop/"Regexp Quote-Like Operators".

      Matching operations can have various modifiers. Modifiers that relate to the interpretation of the regular expression inside are listed below. Modifiers that alter the way a regular expression is used by Perl are detailed in perlop/"Regexp Quote-Like Operators" and perlop/"Gory details of parsing quoted constructs".

        I was looking for /g related stuff. If /c isn't explained in perlre near the /g information, then a referral to /c information should be given. I don't read any document in its entirety when I look this stuff up, especially long technical ones, and I like indexing and references that cater to the lazy. Whoever wrote perlretut had the right idea, though the explanation wasn't idiot-proof enough for me.
