| [reply] |
Thanks,
I got this to work as expected using "negated" character classes.
My original question still remains a puzzle. Why aren't the results (i.e., the values of $&) of the two pattern matches different by the single character 'j' (taken from "joe")? It seems they should be. The zero-width lookahead doesn't seem to be "zero-width" at all.
Regards,
jroberts
| [reply] |
Ok, the problem with [^\z] can be easily seen when you use warnings - which is a good idea in general. Perl then complains about an unrecognized escape sequence in the character class. This means that \z is not the end of the string in a character class!! Most metacharacters lose their special meaning in a character class, and \z is one of them.
So [^\z] is equivalent to [^z], which matches a single character that is not a 'z'. Looking at your original regex
print $& if /^<a href.*>(?=[^\z])/x;
Consequently the .* in your regex eats up everything till the last '>' and checks if the next character is not a z. Which is true, as it is a newline. Ergo, match found, mystery solved :)
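A small sketch to check this yourself. The helper name matched and the test line are made up for illustration, and I'm assuming the original input line ended in a newline (e.g. it was read from a file without chomp). The 'regexp' warnings are switched off only so the demo runs quietly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'regexp';   # silence the "Unrecognized escape \z" warning for this demo

# Return the matched text ($&), or undef if the pattern does not match.
sub matched {
    my ($re, $string) = @_;
    return $string =~ $re ? $& : undef;
}

# Assumption: the line still carries its trailing newline.
my $line = '<a href= ?a=500011&w=2&r=1 >joe@blow.com</a>' . "\n";

my @patterns = (
    qr/^<a href.*>[^\z]/,        # \z inside a class is just a literal z
    qr/^<a href.*>[^z]/,         # ... so this behaves the same
    qr/^<a href.*>(?=[^\z])/,    # same check, but zero-width
);

for my $re (@patterns) {
    my $m = matched($re, $line);
    (my $shown = $m) =~ s/\n/\\n/g;   # make the newline visible
    print "$shown\n";
}
```

The first two patterns should give the same $&, ending in the newline, and the lookahead version should give the same text minus that newline. So the two results do differ by exactly one character; it just isn't the 'j'.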
-- Hofmator
| [reply] [d/l] [select] |
Hmm... you've confused me a bit here. You have a typo in the code, and I'm not sure why you have the /x on your second regex. The /x merely means "allow extraneous whitespace and comments."
#!/usr/bin/perl -l
$_ = '<a href= ?a=500011&w=2&r=1 >joe@blow.com</a>';
print $& if /^<a href.*>[^\z]/;
print $& if /^<a href.*>(?=[^\z])/;
This code prints:
<a href= ?a=500011&w=2&r=1 >j
<a href= ?a=500011&w=2&r=1 >
Perhaps you want to use:
/^<a [^>]+>([^<]+)/
That kinda matches a tag, followed by kinda the non-tag stuff after it.
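Here is a minimal sketch of that regex in use. The sub name link_text is just something I made up for the example, and this is still only a rough match, not a real HTML parser:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text that follows the opening <a ...> tag, up to the next '<',
# or undef when the string does not start with such a tag.
sub link_text {
    my ($html) = @_;
    return $html =~ /^<a [^>]+>([^<]+)/ ? $1 : undef;
}

print link_text('<a href= ?a=500011&w=2&r=1 >joe@blow.com</a>'), "\n";
```

Because [^>]+ cannot run past the first '>', there is no greedy .* to backtrack, and the capture in $1 stops at the '<' of the closing tag.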
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; | [reply] [d/l] [select] |
If what you want is to the right of the first >, can't you just print $' (or $POSTMATCH with use English) after the first print statement?
E.g.,
print $& if /^<a href.*>[^\z]/;
print $';
Update: Sorry, overlooked that somehow.
If the code and the comments disagree, then both are probably wrong. -- Norm Schryer
| [reply] [d/l] [select] |
And while we're at it, if you're planning on using this code in an application that needs to be robust, I'd recommend using HTML::Parser or a similar module to do the work of extracting information from HTML files. Of course that has little to do with understanding why the regex doesn't work as you thought it would, but it's worth noting.
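A rough sketch of what that could look like, assuming HTML::Parser is installed from CPAN (it is not a core module). The sub name link_texts is hypothetical, and this only collects the text inside <a> elements:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module, assumed to be installed

# Collect the decoded text found inside each <a> element.
sub link_texts {
    my ($html) = @_;
    my @texts;
    my $depth = 0;   # greater than 0 while we are inside an <a> element

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'a') { $depth++; push @texts, '' }
        }, 'tagname' ],
        # Text may arrive in several chunks, so append rather than assign.
        text_h  => [ sub { $texts[-1] .= $_[0] if $depth }, 'dtext' ],
        end_h   => [ sub { $depth-- if $_[0] eq 'a' && $depth }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;
    return @texts;
}

print "$_\n" for link_texts('<a href="mailto:joe@blow.com">joe@blow.com</a>');
```

The parser deals with quoting, attributes and entities for you, which is exactly the part that a hand-rolled regex tends to get wrong.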
perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'
| [reply] [d/l] |
to get everything in between...
/>(.*?)<\/a>/;
print $1;
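the first one doesn't get everything as you expect because [^\z] has to consume one character (inside a character class \z is just a literal z, so it matches any char except 'z'), so...
- .* gobbles everything to the end of the line
- the regex backtracks until > can match (the last char on the line)
- regex attempts to match a non-'z' char - FAIL - no chars left in string
- regex backtracks so that > matches the previous >
- regex matches a non-'z' char "j" - COMPLETE
the second one just _looks_ for a non-'z' char after the > but is zero-width, so the "j" is not included in $&.
(this trace assumes the string has no trailing newline; with one, [^\z] would match the newline right after the last > instead.)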
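the steps above can be checked with a quick sketch like this one. the helper name match_end is made up for the example, and the string has no trailing newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'regexp';   # [^\z] triggers an "Unrecognized escape" warning

# Return the offset just past the end of the match, or undef on failure.
sub match_end {
    my ($re, $string) = @_;
    return $string =~ $re ? $+[0] : undef;
}

my $s = '<a href= ?a=500011&w=2&r=1 >joe@blow.com</a>';   # no trailing newline
my $after_first_gt = index($s, '>') + 1;                  # offset just past the first '>'

printf "first '>' ends at offset    %d\n", $after_first_gt;
printf "lookahead match ends at     %d\n", match_end(qr/^<a href.*>(?=[^\z])/, $s);
printf "consuming class match ends at %d\n", match_end(qr/^<a href.*>[^\z]/, $s);
```

the lookahead match should end right after the first >, and the consuming version one character later, i.e. after the "j".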
hope this helps
larryk
perl -le "s,,reverse killer,e,y,rifle,lycra,,print" | [reply] [d/l] [select] |