blackadder has asked for the
wisdom of the Perl Monks concerning the following question:
Hello Monks,
I have strings like these:
uk1sxve01205.gfjgjf5.fdhd5
usasxve513.gfdhf4.hgfd4
I am trying to capture the first 3 chars and all the digits
up to the first dot from the left. So I put this bit of code together
my ($Site_Code, @RS) = ($String=~ /^(...)....(\d+)/g);
But @RS contains only one element! I expected (\d+) with /g to place each digit in @RS as a separate array element.
Enlightenment, please.
Re: using regex to capture a string and an array by jettero (Prior) on Nov 01, 2009 at 13:33 UTC |
Re: using regex to capture a string and an array by BioLion (Friar) on Nov 01, 2009 at 13:53 UTC |
Check out perlre and the bit about greedy matching (in fact the whole thing is worth a read). The (\d+) will capture one *or more* digits, and so will capture all the digits in one slurp. Also, using the dot character is nasty, as it will match anything (except a newline, by default) and can lead to unexpected nasties...
Capturing an unknown number of things with a regex is difficult (cf. a known number of elements of unknown length), so I would suggest keeping it simple and adding an intermediate step:
use strict;
use warnings;

while ( my $input = <DATA> ) {
    chomp $input;    ## drop the newline so the messages print cleanly
    if (    ## are you sure the format is correct?
        $input =~ m/^(\w{3})  ## capture 3 word characters at the start
                    \D*       ## non-digits in the middle
                    (\d+)     ## capture all the digits before
                    \.        ## an actual dot
                   /x
      )
    {
        my $site_code = $1;
        my @rs = split //, $2;    ## split the digits up into an array
        print "Input : $input\n\$site_code : '$site_code'\n\@rs :\n\t",
            join( "\n\t", @rs ), "\n";
    }
    else {
        ## ... process alternately?
        print "input '$input' cannot be processed.\n";
    }
}
__DATA__
uk1sxve01205.gfjgjf5.fdhd5
usasxve513.gfdhf4.hgfd4
how_did_this_get_here?
Just a something something...
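To make the point about greedy matching concrete, here is a minimal sketch (not part of the original reply) that contrasts one greedy group with a one-digit-per-match pattern. The variable names are illustrative, and it assumes the string is at least three characters long.

```perl
use strict;
use warnings;

my $string = 'uk1sxve01205.gfjgjf5.fdhd5';

# One greedy group: \d+ takes the whole run of digits as a single capture.
my @slurp = $string =~ /^\w{3}\D*(\d+)\./;

# One digit per match: isolate the text before the first dot, skip the
# 3-character site code, then let /g return every single-digit match.
my ($head) = $string =~ /^([^.]*)/;
my @digits = substr( $head, 3 ) =~ /\d/g;

print scalar(@slurp),  " element:  @slurp\n";
print scalar(@digits), " elements: @digits\n";
```

The first pattern yields the single element 01205. The second yields five elements, because a /g match without capture groups returns each whole match when used in list context.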
Re: using regex to capture a string and an array by ikegami (Saint) on Nov 01, 2009 at 18:36 UTC |
/g indicates the match should be performed repeatedly, so
/^(...)....(\d+)/g
is basically
/
^(...)....(\d+)
(?: (?s:.*?) ^(...)....(\d+)
(?: (?s:.*?) ^(...)....(\d+)
(?: (?s:.*?) ^(...)....(\d+)
etc
)?)?)?
/xg
(But with less ability to backtrack)
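A small sketch of what that means in practice, using the sample strings from the question. The two-line string and the /m modifier are added here only for illustration. Because ^ can match just once in a single-line string, /g has nothing more to find after the first match.

```perl
use strict;
use warnings;

my $string = 'uk1sxve01205.gfjgjf5.fdhd5';

# Anchored at ^, the pattern can match only once, so /g in list context
# returns the captures of that single match: two elements.
my @once = $string =~ /^(...)....(\d+)/g;
print scalar(@once), " captures: @once\n";

# With /m, ^ also matches at the start of each line, so /g finds one match
# per line and returns all of the captures flattened into one list.
my $lines = "uk1sxve01205.gfjgjf5.fdhd5\nusasxve513.gfdhf4.hgfd4\n";
my @all = $lines =~ /^(...)....(\d+)/mg;
print scalar(@all), " captures: @all\n";
```

The first list holds uk1 and 01205; the second holds uk1, 01205, usa and 513. In both cases (\d+) contributes one element per match, never one element per digit.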
Re: using regex to capture a string and an array by AnomalousMonk (Deacon) on Nov 01, 2009 at 18:38 UTC |
The following does the job with a single regex (well, almost), but is perhaps a bit too cute to live in production code:
>perl -wMstrict -le
"for my $String (@ARGV) {
     my ($Site_Code, @RS) =
       grep defined,
       $String =~ m{ \A (...) .{4} | \G (\d) }xmsg
       ;
     local $\" = q{' '};
     print qq{site code: '$Site_Code' digits: '@RS'};
     }
"
uk1sxve01205.gfjgjf5.fdhd5 usasxve513.gfdhf4.hgfd4
site code: 'uk1' digits: '0' '1' '2' '0' '5'
site code: 'usa' digits: '5' '1' '3'
As you will see if you eliminate the grep statement, the regex produces a rain of undefined elements.
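For the curious, a minimal sketch of that experiment as a script instead of a one-liner, run on the first sample string only. Every match returns both capture groups, and the group belonging to the alternative that did not match comes back as undef.

```perl
use strict;
use warnings;

my $string = 'uk1sxve01205.gfjgjf5.fdhd5';

# Same regex as above, but without the grep: each of the six matches
# (one site code, five digits) contributes two list elements.
my @raw = $string =~ m{ \A (...) .{4} | \G (\d) }xmsg;

my $undefined = grep { !defined } @raw;
print scalar(@raw), " elements, $undefined undefined\n";
print join( ', ', map { defined $_ ? "'$_'" : 'undef' } @raw ), "\n";
```

This reports 12 elements, 6 of them undefined: the site code is followed by an undef, and every digit is preceded by one.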
I think I would prefer an approach more in line with those given in other replies:
- Extract the decimal digits as a single string. This allows you to be very specific about what you want.
- If you are interested in the individual digit characters, split the string of digits to an array to get them.
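Those two steps could be wrapped in a small sub, roughly as sketched below. The name site_and_digits is hypothetical, and the pattern assumes the format seen in the sample strings: three word characters, some non-digits, then digits followed by a dot.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the site code followed by the individual
# digits, or an empty list if the string does not fit the assumed format.
sub site_and_digits {
    my ($string) = @_;

    # Step 1: extract the site code and the digits as a single string.
    my ( $site_code, $digits ) = $string =~ /\A(\w{3})\D*(\d+)\./
        or return;

    # Step 2: split the digit string into one element per character.
    return ( $site_code, split //, $digits );
}

for my $string (qw(uk1sxve01205.gfjgjf5.fdhd5 usasxve513.gfdhf4.hgfd4)) {
    my ( $site_code, @rs ) = site_and_digits($string);
    print "site code: '$site_code' digits: @rs\n";
}
```

Returning an empty list on failure lets the caller test the result in a condition, for example if ( my ( $code, @rs ) = site_and_digits($s) ) { ... }.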