Simple regular expression problem

jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I thought I understood regexp but today this problem proved me wrong :(
I need to split a string, something like: abcd2 or abc. That is, the length is unknown, and so is whether there is a number at the end!
I'd like to put the characters and the number in variables, like:

  $str = "abdbdr23"
($name,$num) = ($str =~ /^(\w+)(\d{0,}$/) ;

Any suggestions why this is not working ?

Thanks
Luca


Replies are listed 'Best First'.
Re: Simple regular expression problem by polypompholyx (Chaplain) on Oct 03, 2005 at 13:59 UTC
Apart from the typos, the reason is that `\w+` will gobble up all the word characters in `$str`, which includes the number at the end. Since your match specifies 'zero or more' digits at the end, `$num` gets an empty string. You need to modify the regex to make the `\w+` non-greedy, using the `?` modifier: `my $str = "abdbdr23"; my ( $name, $num ) = ( $str =~ /^(\w+?)(\d*)$/ );` `*` is a much less ghastly way of writing `{0,}`.
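A minimal sketch of the greedy versus non-greedy behaviour described in this reply, assuming the quantifier on `\d` is `*` (zero or more digits, as the reply says). The helper `split_name_num` is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply a pattern and
# return its two captures, or an empty list when it does not match.
sub split_name_num {
    my ($str, $re) = @_;
    return $str =~ $re;
}

my $greedy = qr/^(\w+)(\d*)$/;     # \w+ swallows the trailing digits too
my $lazy   = qr/^(\w+?)(\d*)$/;    # \w+? leaves the trailing digits for \d*

for my $str (qw(abdbdr23 abcd2 abc)) {
    my ($g_name, $g_num) = split_name_num($str, $greedy);
    my ($l_name, $l_num) = split_name_num($str, $lazy);
    print "$str: greedy=[$g_name][$g_num] lazy=[$l_name][$l_num]\n";
}
```

For `abdbdr23` the greedy pattern yields the whole string and an empty number, while the non-greedy one yields `abdbdr` and `23`.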
Re^2: Simple regular expression problem by Perl Mouse (Chaplain) on Oct 03, 2005 at 14:14 UTC
It's better to avoid the `?` modifier in most cases, as it's less efficient than the alternatives. Here's a benchmark:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Benchmark 'cmpthese';
  use Test::More tests => 2;

  our @data = qw 'foo123 abdbdr23 abcd2 abc 1234 foo!123';
  our (@plain, @sticky);
  my @expected = ([qw 'foo 123'], [qw 'abdbdr 23'], [qw 'abcd 2'],
                  ['abc', ''], ['', 1234], []);

  cmpthese -1, {
      plain  => '@plain  = map {[/^([a-z]*)(\d*)$/]} @data',
      sticky => '@sticky = map {[/^(\w*?)(\d*)$/]} @data',
  };

  is_deeply \@plain,  \@expected;
  is_deeply \@sticky, \@expected;

  __END__
  1..2
            Rate sticky  plain
  sticky 32582/s     --   -17%
  plain  39385/s    21%     --
  ok 1
  ok 2

`Perl --((8:>*`
Re^3: Simple regular expression problem by eric256 (Parson) on Oct 03, 2005 at 14:57 UTC
Benchmarking is fun. However, you should consider your results a little more carefully before making recommendations based on them. This would definitely count as a minor optimization at best, since we are talking about 32k instead of 40k per second. That means that unless you are doing hundreds of thousands of these compares, you are never going to notice the difference. Also interesting is the result of that benchmark on my machine:

  1..2
            Rate  plain sticky
  plain  23682/s     --    -1%
  sticky 23904/s     1%     --
  ok 1
  ok 2

Oddly, the difference dropped to a mere few hundred per second.
___________
Eric Hodges
Re^4: Simple regular expression problem by Perl Mouse (Chaplain) on Oct 03, 2005 at 15:30 UTC
Re^5: Simple regular expression problem by eric256 (Parson) on Oct 03, 2005 at 16:31 UTC
Re^3: Simple regular expression problem by polypompholyx (Chaplain) on Oct 03, 2005 at 14:31 UTC
The OP didn't make it clear whether the string before the number could contain digits. However, it's certainly better to be specific in a regex: if you know (for some value of 'know') something will only contain `[A-Za-z]`, not `\w`, then the former is probably preferable. On the other hand, `[A-Za-z]` too often means "I cannot think of any other letters", and then your script barfs on something perfectly valid, but unexpected, like "Ångström".
Re^3: Simple regular expression problem by jeanluca (Deacon) on Oct 03, 2005 at 14:59 UTC
Thanks for all the suggestions. I fixed it with `\w+?`, or maybe I'll use the alpha example!! And now that I understand my mistake, I see that it was described in the perldoc manual all along!! All your replies are really helpful, thanks a lot!! Luca
Re^2: Simple regular expression problem by ikegami (Patriarch) on Oct 03, 2005 at 14:40 UTC
~~That won't work if the string contains no digits (which the OP said was a possible input). For example, $name will be just "a" for "abdbdr".~~ Update: Just plain wrong.
Re^3: Simple regular expression problem by Dietz (Curate) on Oct 03, 2005 at 14:45 UTC
That is why he anchored the regex with `$`: the match has to run to the end of the string, whether or not a digit is present.
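A small sketch of the point about anchoring, using the non-greedy pattern discussed above (written here with `\d*`, assuming "zero or more digits" is intended). The variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "abdbdr";    # no trailing digits

# With the $ anchor, the lazy \w+? has to keep growing until \d* and $
# can both match, so it ends up holding the whole string.
my ($name, $num) = $str =~ /^(\w+?)(\d*)$/;
print "anchored:   [$name][$num]\n";

# Without the anchor, the match succeeds as early as possible,
# after a single word character.
my ($short, $rest) = $str =~ /^(\w+?)(\d*)/;
print "unanchored: [$short][$rest]\n";
```

The anchored match gives `abdbdr` and an empty number; only the unanchored variant stops at `a`.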
Re^4: Simple regular expression problem by ikegami (Patriarch) on Oct 03, 2005 at 14:47 UTC
Re: Simple regular expression problem by prasadbabu (Prior) on Oct 03, 2005 at 13:49 UTC
`\w` matches alphanumeric characters and the underscore, `[0-9a-zA-Z_]`, so in your regex `\w` matches digits as well. Change it as shown. You are also missing a closing parenthesis on the second group. `$str = "abdbdr23"; ($name,$num) = ($str =~ /^([a-zA-Z]+)(\d{0,})$/) ; print "$name\t$num\n";` Prasad
Re^2: Simple regular expression problem by Tanktalus (Canon) on Oct 03, 2005 at 14:29 UTC
<shudder> That works fine in English, but not so well in pretty much any other language: accented characters and the like, or non-Roman scripts such as Arabic, Hebrew, Hindi, or pretty much any Asian language. OK, maybe today you don't support them, but maybe tomorrow? Besides that, this regexp is not self-documenting if you mean to say you want to match "letters". Better to use the POSIX classes documented in perlre: `($name,$num) = $str =~ /^([[:alpha:]]+)(\d*)$/;` This does a full Unicode match against "alphabet", which has a very well-defined and globalised meaning. I'm also unsure why you use "{0,}": it has precisely the same meaning as "*". Especially when you used "+" instead of "{1,}". Above all else, be consistent!
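A hedged sketch of the difference between the ASCII-only class and the POSIX class. It assumes Perl 5.14 or later so that the `/u` modifier is available to request Unicode semantics explicitly, and it writes "Ångström42" with hex escapes so the result does not depend on the encoding of the source file.

```perl
use v5.14;    # assumed here, for the /u (Unicode semantics) regex modifier
use strict;
use warnings;

# "Ångström42", written with escapes for the two accented letters.
my $str = "\x{C5}ngstr\x{F6}m42";

# ASCII-only class: fails, because \x{C5} and \x{F6} are not in [a-zA-Z].
my @ascii = $str =~ /^([a-zA-Z]+)(\d*)$/;

# POSIX class under Unicode semantics: accepts the accented letters.
my @alpha = $str =~ /^([[:alpha:]]+)(\d*)$/u;

printf "ascii class matched: %s\n", @ascii ? "yes" : "no";
printf "alpha class matched: %s, number=%s\n",
    @alpha ? "yes" : "no", $alpha[1] // '';
```

Whether `[[:alpha:]]` sees such characters as letters without `/u` depends on how the string is stored and which features are enabled, which is why the modifier is spelled out in this sketch.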
Re: Simple regular expression problem by muba (Priest) on Oct 03, 2005 at 13:56 UTC
`($name,$num) = ($str =~ /^(\w+)(\d{0,}$/) ;`

As far as I can see, you forgot something:

  ($name, $num) =     # assign string parts to variables
  ($str =~            # we gonna do regexes!
      /^              # beginning of the string
      (               # begin of group
          \w+         # \w a couple'o times
      )               # end of group
      (               # begin of group
          \d{0,}      # \d a couple'o times
                      # why not just \d* ?
          $           # end of string
      /               # slash after end of string
  )                   # end of group
  ;                   # a semicolon after end of string
                      # unexpected end of line?

I challenge you to find the mistake :) Update: and also see the first reply :)
Re: Simple regular expression problem by sauoq (Abbot) on Oct 03, 2005 at 17:42 UTC
"I need to split a string"

Usually when a person says that, he really wants to use `split`. I don't see why this would be an exception... `($name, $num) = split /(?=\d+$)/, $str, 2;` This avoids all the uproar about non-ASCII characters and meets your specification exactly, in that it makes no assumptions about the string prior to the digits at the end. It uses a zero-width look-ahead assertion to split without losing characters, and it uses the 3-argument version of split to limit our split to two parts (otherwise a string like "abc123" would split into 4 parts).

-sauoq "My two cents aren't worth a dime.";
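A short sketch of how the limit argument changes the result of this `split`. The wrapper `split_trailing_digits` is a hypothetical name used only for illustration; passing a limit of 0 behaves like omitting the limit.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split shown above.
sub split_trailing_digits {
    my ($str, $limit) = @_;
    return split /(?=\d+$)/, $str, $limit;
}

my @two  = split_trailing_digits("abc123", 2);    # at most two fields
my @many = split_trailing_digits("abc123", 0);    # no limit
my @none = split_trailing_digits("abc", 2);       # no trailing digits

print scalar(@two),  " parts: @two\n";
print scalar(@many), " parts: @many\n";
print scalar(@none), " parts: @none\n";
```

Note that for a string without trailing digits the split returns a single field, so in `($name, $num) = split ...` the variable `$num` would be undefined rather than an empty string.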
