Re: Simple regular expression problem
by polypompholyx (Chaplain) on Oct 03, 2005 at 13:59 UTC
|
Apart from the typos, the reason is that \w+ will gobble up all the word characters in $str, which includes the number at the end. Since your match specifies 'zero or more' digits at the end, $num gets an empty string. You need to modify the regex to make the \w+ non-greedy, using the ? modifier:
my $str = "abdbdr23";
my ( $name, $num ) = ( $str =~ /^(\w+?)(\d*)$/ );
* is a much less ghastly way of writing {0,}.
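To see the difference side by side, here is a minimal sketch (the helper name split_tail is made up for illustration, it is not from the original post). It runs the greedy and the non-greedy pattern against the same string:

```perl
use strict;
use warnings;

# Hypothetical helper: return the captures of $re applied to $str.
sub split_tail {
    my ($re, $str) = @_;
    return $str =~ $re;
}

my $str = "abdbdr23";

# Greedy: \w+ swallows the digits too, so \d* is left with nothing.
my ($g_name, $g_num) = split_tail(qr/^(\w+)(\d*)$/, $str);

# Non-greedy: \w+? takes as little as the rest of the pattern allows.
my ($l_name, $l_num) = split_tail(qr/^(\w+?)(\d*)$/, $str);

print "greedy: name='$g_name' num='$g_num'\n";   # name='abdbdr23' num=''
print "lazy:   name='$l_name' num='$l_num'\n";   # name='abdbdr' num='23'
```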
|
It's better to avoid the ? modifier in most cases, as it's less efficient than the alternatives. Here's a benchmark:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use Test::More tests => 2;
our @data = qw 'foo123 abdbdr23 abcd2 abc 1234 foo!123';
our (@plain, @sticky);
my @expected = ([qw 'foo 123'], [qw 'abdbdr 23'], [qw 'abcd 2'],
['abc', ''], ['', 1234], []);
cmpthese -1, {
plain => '@plain = map {[/^([a-z]*)(\d*)$/]} @data',
sticky => '@sticky = map {[/^(\w*?)(\d*)$/]} @data',
};
is_deeply \@plain, \@expected;
is_deeply \@sticky, \@expected;
__END__
1..2
Rate sticky plain
sticky 32582/s -- -17%
plain 39385/s 21% --
ok 1
ok 2
|
1..2
Rate plain sticky
plain 23682/s -- -1%
sticky 23904/s 1% --
ok 1
ok 2
Oddly, the difference dropped to a mere couple of hundred iterations per second.
|
The OP didn't make it clear whether the string before the number could contain digits. However, it's certainly better to be specific in a regex: if you know (for some value of 'know') something will only contain [A-Za-z], not \w, then the former is probably preferable. On the other hand, too often [A-Za-z] means "I cannot think of any other letters", and then your script barfs on something perfectly valid, but unexpected, like "Ångström".
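Whether digits are allowed before the trailing number changes which pattern is right. A rough sketch, with made-up sample strings and helper names (lazy_split, alpha_split), of how the two candidates behave:

```perl
use strict;
use warnings;

# \w+? accepts digits in the leading part; only the final run of digits
# ends up in the second capture.
sub lazy_split  { my ($s) = @_; return $s =~ /^(\w+?)(\d*)$/; }

# [A-Za-z]+ allows ASCII letters only, so a digit inside the name
# makes the whole match fail (empty list).
sub alpha_split { my ($s) = @_; return $s =~ /^([A-Za-z]+)(\d*)$/; }

for my $str ("abdbdr23", "abc12def34") {
    my @lazy  = lazy_split($str);
    my @alpha = alpha_split($str);
    printf "%-12s lazy=(%s) alpha=(%s)\n",
        $str,
        join(",", @lazy),
        @alpha ? join(",", @alpha) : "no match";
}
```

For "abc12def34" the non-greedy version yields "abc12def" and "34", while the letters-only version does not match at all. Which of those is correct depends on what the input is allowed to contain.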
|
Thanks for all the suggestions. I fixed it with \w+?, or maybe I'll use the alpha example!!
And now that I understand my mistake, I see that it was described in the perldoc manual all along!!
All your replies are really helpful,
Thanks a lot!!
Luca
|
That won't work if the string contains no digits (which the OP said was a possible input). For example, $name will be just "a" for "abdbdr".
Update: Just plain wrong.
|
That is why he anchored the regex with $: the match has to run to the end of the string, regardless of whether a digit is present or not.
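A quick sketch of what the anchor buys you when the input has no digits (the helper name name_part is invented for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: return only the first capture (the "name" part).
sub name_part {
    my ($re, $str) = @_;
    my ($name) = $str =~ $re;
    return $name;
}

my $str = "abdbdr";    # no trailing digits

# With $, \w+? is forced to keep growing until the end of the string.
my $anchored   = name_part(qr/^(\w+?)(\d*)$/, $str);

# Without $, \w+? is free to stop after a single character.
my $unanchored = name_part(qr/^(\w+?)(\d*)/, $str);

print "anchored:   '$anchored'\n";     # 'abdbdr'
print "unanchored: '$unanchored'\n";   # 'a'
```

So the "a" result only happens when the trailing $ is left off.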
|
Re: Simple regular expression problem
by prasadbabu (Prior) on Oct 03, 2005 at 13:49 UTC
|
\w will match alphanumeric characters [0-9a-zA-Z_]. So in your regex \w matches digits too. Change it as shown.
Also, you are missing a closing parenthesis in the second group.
$str = "abdbdr23";
($name,$num) = ($str =~ /^([a-zA-Z]+)(\d{0,})$/) ;
print "$name\t$num\n";
|
<shudder> That works fine in English. But not so good in pretty much any other language. e.g., accented characters and the like, or non-roman languages such as Arabic, Hebrew, Hindi, or pretty much any Asian language. Ok, maybe today you don't support them, but maybe tomorrow? Besides that, this regexp is not self-documenting if you mean to say you want to match "letters". Better to use the POSIX classes documented in perlre:
($name,$num) = $str =~ /^([[:alpha:]]+)(\d*)$/;
This does a full Unicode match against "alphabet", which has a very well-defined and globalised meaning.
I'm also unsure why you use "{0,}" - this has precisely the same meaning as "*". Especially when you used "+" instead of "{1,}". Over everything else, be consistent!
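A small sketch of the difference, with some assumptions spelled out: the sample string is written with \x{...} escapes so the file's encoding doesn't matter, and the /u flag (Perl 5.14 or later, so not available when this thread was written) is added to force Unicode rules. Without it, whether [[:alpha:]] treats "Å" as a letter depends on how the string is stored internally. The helper names are made up:

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# "Ångström42", spelled with escapes to stay independent of source encoding.
my $str = "\x{C5}ngstr\x{F6}m42";

# ASCII-only class: Å and ö are outside a-z, so the match fails.
sub ascii_split { my ($s) = @_; return $s =~ /^([a-zA-Z]+)(\d*)$/; }

# POSIX class under /u (assumption: Perl 5.14+): any Unicode letter counts.
sub alpha_split { my ($s) = @_; return $s =~ /^([[:alpha:]]+)(\d*)$/u; }

my @ascii = ascii_split($str);
my @alpha = alpha_split($str);

print "ascii: ", (@ascii ? join(",", @ascii) : "no match"), "\n";
print "alpha: ", (@alpha ? join(",", @alpha) : "no match"), "\n";
```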
Re: Simple regular expression problem
by muba (Priest) on Oct 03, 2005 at 13:56 UTC
|
($name, $num) = # assign string parts to variables
($str =~ # we gonna do regexes!
/^ # beginning of the string
( # begin of group
\w+ # \w a couple'o times
) # end of group
( # begin of group
\d{0,} # \d a couple'o times
# why not just \d* ?
$ # end of string
/ # slash after end of string
) # end of group
; # a semicolon after end of string
# unexpected end of line?
I challenge you to find the mistake :)
Update: and also see the first reply :)
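For comparison, here is a sketch of how the same annotated layout could actually run, using the /x flag so that whitespace and comments inside the pattern are ignored. It closes the second group and borrows the non-greedy \w+? from the other reply. The sub name split_name_num is invented for the example:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the commented pattern.
sub split_name_num {
    my ($str) = @_;
    return $str =~ /
        ^           # beginning of the string
        ( \w+? )    # group 1: word characters, as few as possible
        ( \d*  )    # group 2: zero or more digits
        $           # end of the string
    /x;
}

my ($name, $num) = split_name_num("abdbdr23");
print "$name\t$num\n";    # abdbdr, then a tab, then 23
```

One caveat with /x: the comments must not contain the pattern delimiter, or the regex ends early.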
Re: Simple regular expression problem
by sauoq (Abbot) on Oct 03, 2005 at 17:42 UTC
|
I need to split a string
Usually when a person says that, he really wants to use split. I don't see why this would be an exception...
($name, $num) = split /(?=\d+$)/, $str, 2;
This avoids all the uproar about non-ASCII characters and meets your specification exactly in that it makes no assumptions about the string prior to the digits at the end. It uses a zero-width look-ahead assertion to split without losing characters and it uses the 3 argument version of split to limit our split to two parts (otherwise a string like "abc123" would split into 4 parts).
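A short sketch of the effect of the limit (the wrapper name tail_split is made up, and it relies on a LIMIT of 0 behaving the same as leaving the limit out):

```perl
use strict;
use warnings;

# Hypothetical wrapper: split before the trailing digits, with an optional limit.
sub tail_split {
    my ($str, $limit) = @_;
    return split /(?=\d+$)/, $str, $limit || 0;
}

my @limited   = tail_split("abc123", 2);   # ("abc", "123")
my @unlimited = tail_split("abc123");      # ("abc", "1", "2", "3")

print scalar(@limited),   " parts: @limited\n";
print scalar(@unlimited), " parts: @unlimited\n";
```

The look-ahead succeeds before every digit of the trailing run, which is why the unlimited call keeps cutting. Note that when the string has no digits at all, split returns a single field, so $num ends up undef rather than an empty string.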
-sauoq
"My two cents aren't worth a dime.";