justrajdeep has asked for the wisdom of the Perl Monks concerning the following question:
Hi Wise Monks,
I need some help using Regexp::Common. Can one of you guide me?
In the example given, with the data 10,101,110.11010110,
if I use something like $_ =~ RE_num_real(-keep, -group=>3, -sep=>',', -base=>2) and print q{a number};
I get only one match: 10.
If there are multiple matches, I am unable to get them. Can you please show me how to get the other matches as well?
Re: Using Regexp::Common
by Corion (Patriarch) on Sep 18, 2015 at 15:22 UTC
From Regexp::Common, that should work. Have you looked at the string that your call to RE_num_real returns? What does that string look like?
use Regexp::Common qw(RE_num_real);
my $is_binary = RE_num_real(-keep, -group=>3, -sep=>',', -base=>2);
# Show the generated pattern before matching against it
print "Checking '$_' against /$is_binary/\n";
if( $_ =~ /$is_binary/ ) {
    print q{matched a number};
    print "Got [$1]\n";
};
'(?^:((?i)([+-]?)((?=[.]?[0123456789])([0123456789]*)(?:([.])([0123456789]{0,}))?)(?:([E])(([+-]?)([0123456789]+))|)))
c:\@Work\Perl\monks>perl -wMstrict -le
"use Regexp::Common qw(RE_num_real);
use YAPE::Regex::Explain;
;;
print YAPE::Regex::Explain->new(RE_num_real(-keep, -group=>3, -sep=>',', -base=>2))->explain;
"
The regular expression:
(?-imsx:((?i)([+-]?)((?=[.]?[0123456789])([0123456789]*)(?:([.])([0123456789]{0,}))?)(?:([E])(([+-]?)([0123456789]+))|)))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?i) set flags for this block (case-
insensitive) (with ^ and $ matching
normally) (with . not matching \n)
(matching whitespace and # normally)
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[+-]? any character of: '+', '-' (optional
(matching the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
[.]? any character of: '.' (optional
(matching the most amount possible))
----------------------------------------------------------------------
[0123456789] any character of: '0', '1', '2',
'3', '4', '5', '6', '7', '8', '9'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
[0123456789]* any character of: '0', '1', '2',
'3', '4', '5', '6', '7', '8', '9' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
[.] any character of: '.'
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
( group and capture to \6:
----------------------------------------------------------------------
[0123456789] any character of: '0', '1', '2',
{0,} '3', '4', '5', '6', '7', '8', '9'
(at least 0 times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \6
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
( group and capture to \7:
----------------------------------------------------------------------
[E] any character of: 'E'
----------------------------------------------------------------------
) end of \7
----------------------------------------------------------------------
( group and capture to \8:
----------------------------------------------------------------------
( group and capture to \9:
----------------------------------------------------------------------
[+-]? any character of: '+', '-'
(optional (matching the most
amount possible))
----------------------------------------------------------------------
) end of \9
----------------------------------------------------------------------
( group and capture to \10:
----------------------------------------------------------------------
[0123456789] any character of: '0', '1', '2',
+ '3', '4', '5', '6', '7', '8', '9'
(1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \10
----------------------------------------------------------------------
) end of \8
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
Please also see perlre, perlretut, and perlrequick. There are also a number of on-line regex explainers, but I'm not familiar enough with them to recommend any particular one. (Update: Actually, davido has a nice regex tester which ends up giving a fair amount of explanation, or at least enlightenment. See his personal node for a link)
Update: Caution: YAPE::Regex::Explain only supports regex features added through Perl version 5.6.
Give a man a fish: <%-{-{-{-<
Re: Using Regexp::Common
by Anonymous Monk on Sep 18, 2015 at 21:53 UTC
#!/usr/bin/perl --
use strict;
use warnings;
use Data::Dump qw/ dd /;
use Regexp::Common qw/ RE_num_real /;
my $shine = '10,101,110.110101101';
# Case 1: bare -keep given first
{
    my $rereal = RE_num_real(-keep, -group=>3, -sep=>',', -base=>2);
    if( $shine =~ m{($rereal)} ){
        dd( $1, $2, $3, $4, $5, $6, );
    }
}
# Case 2: no -keep at all
{
    my $rereal = RE_num_real(-group=>3, -sep=>',', -base=>2);
    if( $shine =~ m{($rereal)} ){
        dd( $1, $2, $3, $4, $5, $6, );
    }
}
# Case 3: bare -keep given last
{
    my $rereal = RE_num_real(-group=>3, -sep=>',', -base=>2, -keep);
    if( $shine =~ m{($rereal)} ){
        dd( $1, $2, $3, $4, $5, $6, $7, $8, );
    }
}
__END__
(10, 10, "", 10, 10, undef)
("10,101,110.110101101", undef, undef, undef, undef, undef)
(
  "10,101,110.110101101",
  "10,101,110.110101101",
  "",
  "10,101,110.110101101",
  "10,101,110",
  ".",
  110101101,
  undef,
)
It isn't so much that the -keep option is a problem, more that no value was given for the -keep option. From Regexp::Common::number:
Under -keep (see Regexp::Common):
- $1 captures the entire number
- $2 captures the optional sign of the number
- $3 captures the complete set of digits
It is pretty sad (IMO) that RE_num_real() appears to not complain about getting options that it does not support ("3", ",", and "2"), nor about getting a value for -keep that it does not support ("-group"), nor about it getting an odd number of arguments when it is expecting pairs of arguments.
It should simply die when any of those things happen.
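To make the pairing problem concrete, here is a minimal sketch in plain Perl. It assumes, as this reply infers, that the arguments are read as a flat list of key/value pairs. The pair_up helper is hypothetical and is not Regexp::Common's actual code; it only shows how a bare -keep in first position shifts every later pair by one.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a function that reads its arguments as
# key/value pairs. This is NOT Regexp::Common's actual code.
sub pair_up {
    my @args = @_;
    push @args, undef if @args % 2;    # pad an odd-length list explicitly
    my %flags = @args;
    return \%flags;
}

# Argument order from the original question: bare -keep first.
my $keep_first = pair_up(-keep, -group => 3, -sep => ',', -base => 2);

# Argument order from the third test block above: bare -keep last.
my $keep_last = pair_up(-group => 3, -sep => ',', -base => 2, -keep);

for my $case ([ '-keep first', $keep_first ], [ '-keep last', $keep_last ]) {
    my ($label, $flags) = @$case;
    print "$label:\n";
    for my $key (sort keys %$flags) {
        my $value = defined $flags->{$key} ? "'$flags->{$key}'" : 'undef';
        print "  '$key' => $value\n";
    }
}
```

With -keep first, the keys come out as '-keep', '3', ',' and '2', so no '-base' key exists at all. With -keep last, every option keeps its intended value.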
"I wanted to extract all the numbers from a string that may be separated by any delimiter."
I don't understand what "separation by any delimiter" means.
"I am going to ditch [Regexp::Common] altogether."
I think that would be rash. Regexp::Common and Regexp::Common::number, the extension I think you need, are designed to do many things and are correspondingly complicated, but will, I think, repay the effort invested in understanding them. (Update: And I think the -keep option just needs more study.) I'm still not sure exactly what you require, but here's a sample of code that may be near the ballpark.
File:
use 5.010; # need perl 5.10+ regex enhancements -- (?|alts)
use warnings;
use strict;
use Regexp::Common qw(number);
my $str = '10,101,110.11010110,123,101.010E-01';
# offsets: 0123456789012345678901234567890123456789
#                    1         2         3
my $bin_int = qr{ $RE{num}{int} {-keep}{-base=>2} }xms;
my $bin_real = qr{ $RE{num}{real}{-keep}{-base=>2} }xms;
my $binary = qr{ (?| $bin_int | $bin_real) }xms;
MATCH:
while ($str =~ m{ \b $binary \b }xmsg) {
    my $entire = $1;
    my $fraction = $6;
    my $exponential = $8;
    my $expon = defined $exponential && length $exponential;
    my $real = ! $expon && defined $fraction && length $fraction;
    next MATCH unless length $entire;
    printf "matched '%s' at offset %d; is %s \n",
        $entire, $-[1],
        $expon ? 'exponential' :
        $real  ? 'real' :
                 'integer' # default
        ;
}
Output:
c:\@Work\Perl\monks\justrajdeep>perl extract_binary_nums_1.pl
matched '10' at offset 0; is integer
matched '101' at offset 3; is integer
matched '110.11010110' at offset 7; is real
matched '101.010E-01' at offset 24; is exponential
c:\@Work\Perl\monks\justrajdeep>perl -wMstrict -MRegexp::Common=number -le
"my $str = '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111';
;;
my $bin_int = qr{ $RE{num}{int} {-keep}{-sep=>','}{-group=>3}{-base=>2} }xms;
my $bin_real = qr{ $RE{num}{real}{-keep}{-sep=>','}{-group=>3}{-base=>2} }xms;
;;
my $binary = qr{ (?| $bin_int | $bin_real) }xms;
;;
while ($str =~ m{ \b $binary \b }xmsg) {
  ;;
  my $entire = $1;
  my $fraction = $6;
  my $exponential = $8;
  my ($start, $end) = ($-[1], $+[1]);
  ;;
  my $type = (defined $exponential && length $exponential) ? 'exponential' :
             (defined $fraction && length $fraction) ? 'real' :
             'integer'
             ;
  ;;
  print qq{matched $type};
  my $ruler = (' ' x $start) . '^' . ('-' x ($end - $start - 2)) . '^';
  print qq{'$str'};
  print qq{ $ruler \n};
  ;;
}
"
matched integer
'100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
 ^-^
matched integer
'100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
     ^--------^
matched real
'100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                ^-----------------^
matched exponential
'100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                                    ^--------------^
matched integer
'100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                                                         ^-^
(The 'exponential' 'real' 'integer' type classification may be a bit wobbly. Perhaps an exercise for the reader?)
Re: Using Regexp::Common
by BillKSmith (Monsignor) on Sep 18, 2015 at 21:37 UTC
Each reference to Regexp::Common matches one field. You can use it as part of a more complicated regex. If you tell us exactly what matches you expect from your example, I believe we can help you code the regexp using Regexp::Common.
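As a rough sketch of that idea, the snippet below builds one field pattern with Regexp::Common and embeds it in a larger regex that supplies the comma handling. The input string and the surrounding (?:\A|,) and (?=,|\z) anchors are my own assumptions about what "field" means here, not something the original poster confirmed.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

# One field: a binary real number. No -sep or -group here, so a comma
# is never treated as part of a number.
my $bin = $RE{num}{real}{-base => 2};

# Illustrative input (an assumption): the original data plus a decimal
# field, 123, that should be skipped.
my $str = '10,101,123,110.11010110';

# The larger regex adds what the field pattern alone does not know:
# each field must span everything between two commas (or the string ends).
# In list context, m//g returns the captured text of every match.
my @fields = $str =~ m{(?:\A|,)($bin)(?=,|\z)}g;

print "binary field: $_\n" for @fields;
```

The field pattern stays simple, and the decision about delimiters lives in the outer regex, where it is easy to change.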
I do not know what you mean by 'number' in your example. Is the period a decimal point or a separator? Do the commas separate numbers, or do they separate fields within a number to make them easier to read? Are your numbers binary numbers, or are they decimal numbers that just happen to consist of only ones and zeros? Do you want to parse the string more than once, using different criteria? Let's get your example working. We can generalize later. Please tell us exactly what results you expect from your single example.
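To show why the answer matters, here is a small sketch that applies two of those readings to the original data. It assumes binary numbers with the period as a decimal point; the variable names are mine. The only difference between the two patterns is whether -sep and -group are given.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

my $str = '10,101,110.11010110';

# Reading 1: commas separate numbers.
my $plain = $RE{num}{real}{-base => 2};

# Reading 2: commas group digits in threes inside a single number.
my $grouped = $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3};

my @as_list = $str =~ /($plain)/g;
my @as_one  = $str =~ /($grouped)/g;

print scalar(@as_list), " number(s) without -sep: @as_list\n";
print scalar(@as_one),  " number(s) with -sep: @as_one\n";
```

The first reading yields three separate numbers, the second yields the whole string as one number, so the expected result has to be pinned down before the regex can be.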
Re: Using Regexp::Common
by Monkless (Acolyte) on Sep 18, 2015 at 21:24 UTC
I'm not familiar with that module, but I would recommend leveraging Perl's ability to do regex without a module.
Here is a handy online regex tester, where you can supply your data set, work out your regex, and then import it into the Perl script: https://regex101.com/
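For comparison, a module-free version might look something like the sketch below. The pattern is hand-written and only an assumption about what counts as a binary number here (digits, an optional fraction, an optional exponent); it is not what Regexp::Common generates.

```perl
use strict;
use warnings;

# Hand-written stand-in for a binary number; not generated by any module.
my $bin_number = qr{
    [01]+                       # integer part: one or more binary digits
    (?: [.] [01]+ )?            # optional fraction
    (?: [Ee] [+-]? [01]+ )?     # optional exponent
}x;

my $str = '10,101,110.11010110,123,101.010E-01';

# The lookarounds reject digits that are only part of a longer token,
# such as the leading 1 of the decimal number 123.
my @found = $str =~ m{ (?<! [\w.] ) ($bin_number) (?! [\w.] ) }xg;

print "found: $_\n" for @found;
```

A pattern like this is easy to paste into an online tester and adjust, at the cost of maintaining the number syntax yourself.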
Thanks I will try it out.