Wassercrats has asked for the wisdom of the Perl Monks concerning the following question:
I found a weird webpage that has link targets that include quotes in the fragment part (after the #), such as <a href="/#A "dumb-ass" fragment">. I decided to try to handle links like that for my site mapper. Basically, when there is only one equal sign in an anchor tag, I'm considering everything from the first quote to the last to be the target. That condition isn't shown below. Below is the regex I've been using to delete the quotes within the fragment portion of a target. I've tried it with and without the c modifier. Why does it only delete the quote before dumb-ass when used without the "for" loop? It works perfectly, removing both inner quotes, when I include the loop, but as a standalone regex, even with the g modifier, only one quote gets deleted. How should I be doing this? Maybe I'm just tired, but I have the feeling I don't know regexes as well as I thought I did.
for ($a=0; $a<3; $a++)
{
    $line[$count] =~ s/(href\s*=\s*(["'])[^"'#]*?#[^"']*?)\2(.*\2\s*>)/$1$3/isgc;
}
Re: Global regex giving up too soon
by Abigail-II (Bishop) on Jan 20, 2004 at 10:57 UTC
First of all, the /c modifier is meaningless with s///. Had
you turned on warnings, Perl would have told you so.
Furthermore, the regex matches from the "href" to the final
">", and you replace everything except the second quote.
For the given example string, there's only one "href", so
the /g is pointless as well.
I can't figure out what your intention is, so I'm not offering
a working regex.
Abigail
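
To make the point above concrete, here is a minimal sketch that runs the pattern from the question against the example tag, once as a single s///g and once repeated until nothing changes. The variable names are made up for illustration, and /c is left out because s/// ignores it.

```perl
use strict;
use warnings;

my $original = '<a href="/#A "dumb-ass" fragment">';

# The pattern from the question, compiled once with the /i and /s flags.
my $pattern = qr/(href\s*=\s*(["'])[^"'#]*?#[^"']*?)\2(.*\2\s*>)/is;

# One substitution with /g: the single match spans "href" through the
# final ">", so there is nothing left in the string for /g to match again.
(my $single_pass = $original) =~ s/$pattern/$1$3/g;

# Repeating the substitution rescans the modified string from the start,
# which is why the "for" loop removed both inner quotes.
my $looped = $original;
1 while $looped =~ s/$pattern/$1$3/;

print "$single_pass\n";   # <a href="/#A dumb-ass" fragment">
print "$looped\n";        # <a href="/#A dumb-ass fragment">
```

Each pass removes exactly one inner quote, so a string with more inner quotes than loop iterations would still be left partly cleaned by the fixed-count loop in the question.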
For the given example string, there's only one "href", so the /g is pointless as well.
I figured that was my problem. That's why I tried using /c, but I couldn't find a good description of how it worked. So basically, I guess there is no way to get /g to look over the entire regex, so I'll have to use while $& or establish that there's an href earlier so I don't have to use it in the regex.
Anyway, thanks for not telling me to use a module!
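
One way to avoid relying on /g for this, sketched under the assumptions stated in the question (one anchor tag per string, target running from the first quote to the last one before the closing ">"): capture the whole target once and clean the fragment inside an /e replacement. The helper name strip_fragment_quotes is hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: removes the delimiting quote character from the
# fragment part (after the first "#") of the first href in $line.
# Assumes a single anchor tag per string.
sub strip_fragment_quotes {
    my ($line) = @_;
    $line =~ s{(href\s*=\s*)(["'])(.*)\2(\s*>)}{
        # Copy the captures first; the inner s/// below would clobber them.
        my ($prefix, $quote, $target, $tail) = ($1, $2, $3, $4);
        my ($path, $fragment) = split /#/, $target, 2;
        if (defined $fragment) {
            $fragment =~ s/\Q$quote\E//g;   # small pattern, so /g hits every quote
            $target = "$path#$fragment";
        }
        "$prefix$quote$target$quote$tail";
    }ise;
    return $line;
}

my $fixed = strip_fragment_quotes('<a href="/#A "dumb-ass" fragment">');
print "$fixed\n";   # <a href="/#A dumb-ass fragment">
```

The greedy `.*` is what pins the target to the last quote before ">"; with several tags in one string it would run across all of them, so this is only a sketch for the single-tag case.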
So basically, I guess there is no way to get /g to look over the entire regex,
What do you mean by "looking over the entire regex"?
/g means "match repeatedly (without overlap)".
Abigail
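
A short illustration of "repeatedly, without overlap", using made-up strings: each new match attempt starts where the previous match ended, so /g only helps when the individual matches are small enough to leave something behind to match.

```perl
use strict;
use warnings;

# Matches are found at offsets 0 and 2; the overlapping "aa" at offset 1
# is never considered because matching resumes after the previous match.
(my $pairs = 'aaaa') =~ s/aa/X/g;
my $match_count = () = 'aaaa' =~ /aa/g;

# A pattern that matches just one quote at a time lets /g reach all of them.
(my $no_quotes = 'A "dumb-ass" fragment') =~ s/"//g;

print "$pairs\n";         # XX
print "$match_count\n";   # 2
print "$no_quotes\n";     # A dumb-ass fragment
```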
thanks for not telling me to use a module
But he did tell you (or rather, nudge you) to use warnings, which you blithely ignored. At some point you should step up to the realization that perl (the interpreter) can help you write better, less buggy code.
thanks for not telling me to use a module
Then I'll do the honors:
Ah, here's an interesting thread: How to use Regular Expressions with HTML
Please read the whole thread, esp. the replies by Ovid.
Re: Global regex giving up too soon
by Wassercrats (Initiate) on Jan 20, 2004 at 19:50 UTC
This is what confused me--from http://perldoc.com/perl5.8.0/pod/perlretut.html
A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the //c, as in /regexp/gc. The current position in the string is associated with the string, not the regexp. This means that different strings have different positions and their respective positions can be set or read independently.
I interpreted the third sentence to mean the opposite of what it meant.
There doesn't seem to be anything on /c or /gc in perlre at all, except for a "see also" reference to perlretut. Perlre is the only Reference Manual page to cover regexes, and I think it should at least mention /c and not simply refer people to a Tutorials page.
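
The behaviour described in the quoted perlretut passage applies to matching with m//g in scalar context, not to s///. A minimal sketch with a made-up string, showing what pos() reports after a failed match with and without /c:

```perl
use strict;
use warnings;

my $text = 'aaa bbb';

# Without /c: a failed /g match resets the position.
my $ok = $text =~ /a+/g;            # matches "aaa"
my $pos_after_match = pos($text);   # position just past "aaa"
$ok = $text =~ /z/g;                # fails, so pos() is reset
my $pos_without_c = pos($text);

# With /c: a failed match leaves the position where it was.
pos($text) = 0;
$ok = $text =~ /a+/gc;              # matches "aaa" again
$ok = $text =~ /z/gc;               # fails, but pos() is kept
my $pos_with_c = pos($text);

printf "after match: %s\n", $pos_after_match;             # 3
printf "failed /g:   %s\n", $pos_without_c // 'undef';    # undef
printf "failed /gc:  %s\n", $pos_with_c;                  # 3
```

So /c only changes what happens to the match position after a failure; it does not make a pattern rescan text it has already consumed.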
I was looking for /g related stuff. If /c isn't explained in perlre near the /g information, then a referral to /c information should be given. I don't read any document in its entirety when I look this stuff up, especially long technical ones, and I like indexing and references that cater to the lazy. Whoever wrote perlretut had the right idea, though the explanation wasn't idiot-proof enough for me.