Re: regular expression
by choroba (Cardinal) on Aug 03, 2012 at 14:51 UTC
|
/(.*)(\1.*)/
The desired string is stored in $2.
|
While this will work for this string, we're obviously dealing with a Perl novice here, and this solution is difficult to understand for someone still learning the language.
If you're ever uncertain why your regex is not working, put a capture '()' around each element of the regex and dump the captures out. You'll see exactly which regex elements are matching which pieces of your string, which gives you a lot of insight into what might be going wrong. Also be sure to check that the match itself returns true. You might not be matching at all.
As many previous posters have pointed out, the OP should research greedy and non-greedy matching. Also read up on anchoring regexes with '^' and '$'.
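As a rough sketch of that debugging technique, the snippet below wraps each element of the pattern /.*_?(.*)/ (discussed elsewhere in this thread) in its own capture and prints what each one matched. The sample string is borrowed from another post in this thread and is only an assumption about what the input looks like; substitute your own.

```perl
use strict;
use warnings;

# Sample string taken from elsewhere in this thread; substitute your own.
my $str = 'foo_bar_foo_bar_12345';

# The pattern /.*_?(.*)/ with every element wrapped in its own capture.
my @parts = $str =~ /(.*)(_?)(.*)/;

# Always check that the match succeeded before looking at the captures.
die "No match!\n" unless @parts;

# Dump each capture, bracketed so that empty strings are visible.
printf "\$%d = [%s]\n", $_ + 1, $parts[$_] for 0 .. $#parts;
```

The brackets make it easy to see that the first capture swallowed the whole string and left the other two empty.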
|
#! /usr/bin/perl -w -T
use strict;
my $str = 'foo_bar_foo_bar_12345';
print "$str\n";
$str =~ /(.*)(\1.*)/ || die "Failed!\n";
print "$2\n";
Can anyone explain why that is?
|
From Backtracking:
For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part—that's why it's called backtracking.
The regex engine begins as you say, by matching the entire string with the first .*, but when the rest of the pattern then fails to match, it backtracks one character and tries again. Eventually it has backtracked to the point at which $1 contains foo_bar_ and the remainder of the string is foo_bar_12345. The regex engine then verifies that this remainder satisfies the condition \1.*, so it is captured in $2, the entire match succeeds, and the regex engine stops looking and returns.
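The backtracking described above can be imitated by hand. The following is only a simplified sketch: first_repeated_prefix is a hypothetical helper (not part of Perl) that tries ever-shorter candidates for $1 at the start of the string until a copy of the candidate follows it, and the result is then compared with what the real regex engine captures. The real engine does considerably more than this loop.

```perl
use strict;
use warnings;

my $str = 'foo_bar_foo_bar_12345';

# Hypothetical helper: try ever-shorter candidates for $1, the way the greedy
# first .* gives characters back, until a copy of the candidate follows it.
sub first_repeated_prefix {
    my ($s) = @_;
    for my $len (reverse 0 .. length $s) {
        my $candidate = substr $s, 0, $len;
        # \1 must match immediately after the candidate.
        return $candidate
            if substr($s, $len, $len) eq $candidate;
    }
    return;    # not reached: the empty prefix always succeeds
}

my $prefix = first_repeated_prefix($str);
print "hand-rolled candidate for \$1: [$prefix]\n";

# Compare with what the real regex engine captures.
if ($str =~ /(.*)(\1.*)/) {
    print "\$1 = [$1]\n\$2 = [$2]\n";
}
```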
HTH,
Athanasius <°(((>< contra mundum
|
|
Thanks. It worked for me :-)
Re: regular expression
by Narveson (Chaplain) on Aug 03, 2012 at 14:45 UTC
|
Your regular expression /.*_?(.*)/ starts with .*, which matches any string of any length. The _?(.*) matches the empty string that remains after your first piece has matched everything. So you captured the empty string.
When you review the manual, pay special attention to the meaning of the quantifiers * and ?.
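To illustrate the difference the quantifiers make, here is a small sketch comparing the greedy pattern with a non-greedy, anchored variant. The sample string is borrowed from another post in this thread, and the second pattern is just one possible rewrite, not necessarily what the original poster needs.

```perl
use strict;
use warnings;

my $str = 'foo_bar_foo_bar_12345';

# Greedy: the leading .* takes the whole string, _? matches nothing,
# and the capture is left with the empty string.
my ($greedy) = $str =~ /.*_?(.*)/;

# Non-greedy: .*? takes as little as possible, so it stops before the first
# underscore (made mandatory here), and the capture gets the rest.
my ($lazy) = $str =~ /^.*?_(.*)$/;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```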
Re: regular expression
by ww (Archbishop) on Aug 03, 2012 at 14:51 UTC
|
The initial deathstar (.*) is greedy, q.v.
IOW, there's nothing left for your capture after the leading .* has matched everything in your string.
Read, instead, about look-aheads, or try capturing the first Sanity_001_ and then the balance of the string (in a second capture group), and printing or saving only $2. Either can help you, but learning about look-aheads (and the general category they belong to, "look-arounds") will help you greatly, even in the short term.
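A minimal sketch of both suggestions follows. The input string is hypothetical, since the original poster's actual data is not shown here, and the look-around variant uses a look-behind rather than a look-ahead because that is what fits this particular task.

```perl
use strict;
use warnings;

# Hypothetical input; the original poster's actual string is not shown here.
my $str = 'Sanity_001_Sanity_001_12345';

# Two captures: the leading prefix in the first, the balance in the second.
my ($prefix, $balance) = $str =~ /^(Sanity_001_)(.*)$/
    or die "No match!\n";
print "two captures: $balance\n";

# Look-around variant: the look-behind asserts that the prefix comes just
# before the match without making it part of the match.
my ($rest) = $str =~ /(?<=Sanity_001_)(.*)$/
    or die "No match!\n";
print "look-behind:  $rest\n";
```

Both approaches leave the prefix out of the saved value; the look-around version does so without needing a capture that is then thrown away.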