Using Regexp::Common

justrajdeep has asked for the wisdom of the Perl Monks concerning the following question:

Re: Using Regexp::Common by Corion (Patriarch) on Sep 18, 2015 at 15:22 UTC
From Regexp::Common, that should work. Have you looked at the string that your call to `RE_num_real` returns? What does that string look like?

    my $is_binary = RE_num_real(-keep, -group=>3, -sep=>',', -base=>2);
    print "Checking '$_' against /$is_binary/\n";
    if( $_ =~ /$is_binary/ ) {
        print q{matched a number};
        print "Got [$1]\n";
    };
Re^2: Using Regexp::Common by justrajdeep (Novice) on Sep 18, 2015 at 15:59 UTC
Hi, I just checked it out and I have no clue what that regular expression means :( This is what I see:

    (?^:((?i)([+-]?)((?=[.]?[0123456789])([0123456789]*)(?:([.])([0123456789]{0,}))?)(?:([E])(([+-]?)([0123456789]+))|)))
Re^3: Using Regexp::Common by AnomalousMonk (Archbishop) on Sep 18, 2015 at 17:20 UTC
An explanation (beware possible line wrap of the initial print of the long regex):

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Regexp::Common qw(RE_num_real);
     use YAPE::Regex::Explain;
     ;;
     print YAPE::Regex::Explain->new(RE_num_real(-keep, -group=>3, -sep=>',', -base=>2))->explain;
    "
    The regular expression:

    (?-imsx:((?i)([+-]?)((?=[.]?[0123456789])([0123456789]*)(?:([.])([0123456789]{0,}))?)(?:([E])(([+-]?)([0123456789]+))|)))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        (?i)                     set flags for this block (case-
                                 insensitive) (with ^ and $ matching
                                 normally) (with . not matching \n)
                                 (matching whitespace and # normally)
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          [+-]?                    any character of: '+', '-' (optional
                                   (matching the most amount possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
        (                        group and capture to \3:
    ----------------------------------------------------------------------
          (?=                      look ahead to see if there is:
    ----------------------------------------------------------------------
            [.]?                     any character of: '.' (optional
                                     (matching the most amount possible))
    ----------------------------------------------------------------------
            [0123456789]             any character of: '0', '1', '2', '3',
                                     '4', '5', '6', '7', '8', '9'
    ----------------------------------------------------------------------
          )                        end of look-ahead
    ----------------------------------------------------------------------
          (                        group and capture to \4:
    ----------------------------------------------------------------------
            [0123456789]*            any character of: '0', '1', '2', '3',
                                     '4', '5', '6', '7', '8', '9' (0 or
                                     more times (matching the most amount
                                     possible))
    ----------------------------------------------------------------------
          )                        end of \4
    ----------------------------------------------------------------------
          (?:                      group, but do not capture (optional
                                   (matching the most amount possible)):
    ----------------------------------------------------------------------
            (                        group and capture to \5:
    ----------------------------------------------------------------------
              [.]                      any character of: '.'
    ----------------------------------------------------------------------
            )                        end of \5
    ----------------------------------------------------------------------
            (                        group and capture to \6:
    ----------------------------------------------------------------------
              [0123456789]{0,}         any character of: '0', '1', '2',
                                       '3', '4', '5', '6', '7', '8', '9'
                                       (at least 0 times (matching the
                                       most amount possible))
    ----------------------------------------------------------------------
            )                        end of \6
    ----------------------------------------------------------------------
          )?                       end of grouping
    ----------------------------------------------------------------------
        )                        end of \3
    ----------------------------------------------------------------------
        (?:                      group, but do not capture:
    ----------------------------------------------------------------------
          (                        group and capture to \7:
    ----------------------------------------------------------------------
            [E]                      any character of: 'E'
    ----------------------------------------------------------------------
          )                        end of \7
    ----------------------------------------------------------------------
          (                        group and capture to \8:
    ----------------------------------------------------------------------
            (                        group and capture to \9:
    ----------------------------------------------------------------------
              [+-]?                    any character of: '+', '-'
                                       (optional (matching the most
                                       amount possible))
    ----------------------------------------------------------------------
            )                        end of \9
    ----------------------------------------------------------------------
            (                        group and capture to \10:
    ----------------------------------------------------------------------
              [0123456789]+            any character of: '0', '1', '2',
                                       '3', '4', '5', '6', '7', '8', '9'
                                       (1 or more times (matching the
                                       most amount possible))
    ----------------------------------------------------------------------
            )                        end of \10
    ----------------------------------------------------------------------
          )                        end of \8
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
        )                        end of grouping
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

Please also see perlre, perlretut, and perlrequick. There are also a number of on-line regex explainers, but I'm not familiar enough with them to recommend any particular one. (Update: Actually, davido has a nice regex tester which ends up giving a fair amount of explanation, or at least enlightenment. See his personal node for a link.)

Update: Caution: YAPE::Regex::Explain only supports regex features added through Perl version 5.6.

Give a man a fish: `<%-{-{-{-<`
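For readers who find the generated pattern hard to scan, the same structure can be written by hand with the `/x` modifier and comments. This is only a sketch in core Perl, not output from Regexp::Common: the name `$real` and the sample string are illustrative, and the digit class is abbreviated to `[0-9]`. The comments say "capture N" rather than using a dollar sign, because variables are interpolated even inside `/x` comments.

```perl
use strict;
use warnings;

# Hand-written equivalent of the explained pattern; $real is an
# illustrative name, not something Regexp::Common exports.
my $real = qr{
    (                           # capture 1:  the entire number
        (?i)                    #             exponent marker is case-insensitive
        ([+-]?)                 # capture 2:  optional sign
        (                       # capture 3:  mantissa
            (?=[.]?[0-9])       #             must begin with a digit or ".digit"
            ([0-9]*)            # capture 4:  digits before the decimal point
            (?:
                ([.])           # capture 5:  decimal point
                ([0-9]{0,})     # capture 6:  digits after the decimal point
            )?
        )
        (?:
            ([E])               # capture 7:  exponent marker
            (                   # capture 8:  exponent, including its sign
                ([+-]?)         # capture 9:  sign of the exponent
                ([0-9]+)        # capture 10: exponent digits
            )
          |                     #             or no exponent at all
        )
    )
}x;

# In list context the match returns all ten captures in order.
my @parts = '-12.5e+3' =~ $real;
print join('|', map { defined $_ ? $_ : 'undef' } @parts), "\n";
```

Laid out this way, the capture numbers line up with the `\1` to `\10` entries in the explanation.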
Re^4: Using Regexp::Common by justrajdeep (Novice) on Sep 19, 2015 at 13:32 UTC
Re: Using Regexp::Common by Anonymous Monk on Sep 18, 2015 at 21:53 UTC
So what are the numbers? Whatever this `-keep` option is, it's what's messing you up.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;
    use Regexp::Common qw/ RE_num_real /;

    my $shine = '10,101,110.110101101';
    {
        my $rereal = RE_num_real(-keep, -group=>3, -sep=>',', -base=>2);
        if( $shine =~ m{($rereal)} ){
            dd( $1, $2, $3, $4, $5, $6, );
        }
    }
    {
        my $rereal = RE_num_real(-group=>3, -sep=>',', -base=>2);
        if( $shine =~ m{($rereal)} ){
            dd( $1, $2, $3, $4, $5, $6, );
        }
    }
    {
        my $rereal = RE_num_real(-group=>3, -sep=>',', -base=>2, -keep);
        if( $shine =~ m{($rereal)} ){
            dd( $1, $2, $3, $4, $5, $6, $7, $8, );
        }
    }
    __END__
    (10, 10, "", 10, 10, undef)
    ("10,101,110.110101101", undef, undef, undef, undef, undef)
    (
      "10,101,110.110101101",
      "10,101,110.110101101",
      "",
      "10,101,110.110101101",
      "10,101,110",
      ".",
      110101101,
      undef,
    )
Re^2: Using Regexp::Common (-keep) by tye (Sage) on Sep 19, 2015 at 16:41 UTC
It isn't so much that the `-keep` option is a problem, more that no value was given for the `-keep` option. From Regexp::Common::number:

    Under -keep (see Regexp::Common):

    $1 captures the entire number
    $2 captures the optional sign of the number
    $3 captures the complete set of digits

It is pretty sad (IMO) that `RE_num_real()` appears to not complain about getting options that it does not support ("3", ",", and "2"), nor about getting a value for `-keep` that it does not support ("-group"), nor about it getting an odd number of arguments when it is expecting pairs of arguments. It should simply die when any of those things happen.

- tye
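The mispairing described here can be reproduced in core Perl without the module. The sketch below uses a hypothetical helper, `pair_up`, that reads a flat argument list two items at a time, the way a hash assignment would. It is not Regexp::Common's actual argument handling, only an illustration of why a valueless flag at the front shifts every later pair.

```perl
use strict;
use warnings;

# pair_up() is a hypothetical helper: it reads a flat list as key/value
# pairs. It is not part of Regexp::Common.
sub pair_up {
    my @args = @_;
    my @pairs;
    while (@args) {
        my $key   = shift @args;
        my $value = @args ? shift @args : undef;   # odd list: last key gets no value
        push @pairs, [ $key, $value ];
    }
    return @pairs;
}

# Same options, with the valueless flag first and then last.
my @flag_first = pair_up(-keep, -group => 3, -sep => ',', -base => 2);
my @flag_last  = pair_up(-group => 3, -sep => ',', -base => 2, -keep);

for my $pairs (\@flag_first, \@flag_last) {
    print join('   ',
        map { "[$_->[0]] => [" . (defined $_->[1] ? $_->[1] : 'undef') . "]" } @$pairs
    ), "\n";
}
```

With the flag first, the keys come out as `-keep`, `3`, `,` and `2`, which are the unsupported "options" listed in this reply. With the flag last, every option keeps its intended value and only `-keep` is left without one.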
Re^2: Using Regexp::Common by justrajdeep (Novice) on Sep 19, 2015 at 13:34 UTC
Ya from your example it is clear that something is wrong with the `-keep` option. I am going to ditch this module altogether.
Re^3: Using Regexp::Common by AnomalousMonk (Archbishop) on Sep 19, 2015 at 16:13 UTC
> I wanted to extract all the numbers from a string that may be separated by any delimiter.

I don't understand what "separation by any delimiter" means.

> I am going to ditch [Regexp::Common] altogether.

I think that would be rash. Regexp::Common and number, the extension I think you need, are designed to do many things and are correspondingly complicated, but will, I think, repay effort invested to understand them. (Update: And I think the `-keep` option just needs more study.)

I'm still not sure exactly what you require, but here's a sample of code that may be near the ballpark.

File:

    use 5.010;  # need perl 5.10+ regex enhancements -- (?|alts)

    use warnings;
    use strict;

    use Regexp::Common qw(number);

    my $str = '10,101,110.11010110,123,101.010E-01';
    # offsets: 0123456789012345678901234567890123456789
    #                    1         2         3

    my $bin_int  = qr{ $RE{num}{int} {-keep}{-base=>2} }xms;
    my $bin_real = qr{ $RE{num}{real}{-keep}{-base=>2} }xms;

    my $binary = qr{ (?| $bin_int | $bin_real) }xms;

    MATCH:
    while ($str =~ m{ \b $binary \b }xmsg) {
        my $entire      = $1;
        my $fraction    = $6;
        my $exponential = $8;

        my $expon = defined $exponential && length $exponential;
        my $real  = ! $expon && defined $fraction && length $fraction;

        next MATCH unless length $entire;

        printf "matched '%s' at offset %d; is %s \n",
            $entire, $-[1],
            $expon ? 'exponential'
          : $real  ? 'real'
          :          'integer'  # default
          ;
        }

Output:

    c:\@Work\Perl\monks\justrajdeep>perl extract_binary_nums_1.pl
    matched '10' at offset 0; is integer
    matched '101' at offset 3; is integer
    matched '110.11010110' at offset 7; is real
    matched '101.010E-01' at offset 24; is exponential

Give a man a fish: `<%-{-{-{-<`
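The `(?| ... )` construct used above is the branch-reset group introduced in Perl 5.10. A minimal core-Perl sketch of what it changes, using made-up patterns (`px`/`em` suffixes) that have nothing to do with the thread's data:

```perl
use 5.010;
use strict;
use warnings;

# Ordinary alternation: each alternative's group gets its own number.
my $plain = qr{ (?: (\d+) px | (\d+) em ) }x;

# Branch reset: group numbering restarts at 1 in every alternative.
my $reset = qr{ (?| (\d+) px | (\d+) em ) }x;

my @plain_caps = '12em' =~ $plain;   # two slots; only the second is filled
my @reset_caps = '12em' =~ $reset;   # one slot, whichever alternative matched

say 'plain: ', join(', ', map { $_ // 'undef' } @plain_caps);
say 'reset: ', join(', ', map { $_ // 'undef' } @reset_caps);
```

This is why the loop above can read `$1` for the whole number whether the integer or the real alternative matched, while `$6` and `$8` are only set by the real alternative.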
Re^3: Using Regexp::Common by AnomalousMonk (Archbishop) on Sep 19, 2015 at 22:36 UTC
Ok, the penny finally dropped for the `-sep=>',' -group=>3` stuff. Is this more like what you're after? (This still needs Perl version 5.10+.)

    c:\@Work\Perl\monks\justrajdeep>perl -wMstrict -MRegexp::Common=number -le
    "my $str = '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111';
     ;;
     my $bin_int  = qr{ $RE{num}{int} {-keep}{-sep=>','}{-group=>3}{-base=>2} }xms;
     my $bin_real = qr{ $RE{num}{real}{-keep}{-sep=>','}{-group=>3}{-base=>2} }xms;
     ;;
     my $binary = qr{ (?| $bin_int | $bin_real) }xms;
     ;;
     while ($str =~ m{ \b $binary \b }xmsg) {
     ;;
        my $entire      = $1;
        my $fraction    = $6;
        my $exponential = $8;
        my ($start, $end) = ($-[1], $+[1]);
     ;;
        my $type = (defined $exponential && length $exponential) ? 'exponential'
                 : (defined $fraction    && length $fraction)    ? 'real'
                 :                                                 'integer'
                 ;
     ;;
        print qq{matched $type};
        my $ruler = (' ' x $start) . '^' . ('-' x ($end - $start - 2)) . '^';
        print qq{'$str'};
        print qq{ $ruler \n};
     ;;
        }
    "
    matched integer
    '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
     ^-^

    matched integer
    '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
         ^--------^

    matched real
    '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                    ^-----------------^

    matched exponential
    '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                                        ^--------------^

    matched integer
    '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                                                             ^-^

(The `'exponential' 'real' 'integer'` type classification may be a bit wobbly. Perhaps an exercise for the reader?)

Give a man a fish: `<%-{-{-{-<`
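The ruler in the one-liner relies on the `@-` and `@+` arrays, which hold the start and end offsets of each capture group after a successful match. A small core-Perl sketch of the same idea follows. The helper name `underline` and the sample text are illustrative, and the single-character case is handled separately because `'-' x ($end - $start - 2)` would otherwise leave two carets under a one-character match.

```perl
use strict;
use warnings;

# underline() is an illustrative helper, not taken from the thread.
# $start and $end are offsets as found in $-[N] and $+[N].
sub underline {
    my ($start, $end) = @_;
    my $width = $end - $start;
    return (' ' x $start) . '^' if $width < 2;           # one character: one caret
    return (' ' x $start) . '^' . ('-' x ($width - 2)) . '^';
}

my $text = 'abc 1011 xyz';
if ($text =~ /([01]+)/) {
    # $-[1] is where capture 1 starts, $+[1] is one past where it ends.
    my ($start, $end) = ($-[1], $+[1]);
    print "$text\n", underline($start, $end), "\n";
}
```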
Re^4: Using Regexp::Common by justrajdeep (Novice) on Sep 20, 2015 at 11:36 UTC
Re: Using Regexp::Common by BillKSmith (Monsignor) on Sep 18, 2015 at 21:37 UTC
Each reference to Regexp::Common matches one field. You can use it as part of a more complicated regex. If you tell us exactly what matches you expect from your example, I believe we can help you code the regexp using Regexp::Common.

Bill
Re^2: Using Regexp::Common by justrajdeep (Novice) on Sep 19, 2015 at 13:30 UTC
Hi Bill, I wanted to extract all the numbers from a string that may be separated by any delimiter, then divide them into floats, integers, etc. I thought `Regexp::Common` would be simple to use, but it looks like it is not so :(
Re^3: Using Regexp::Common by BillKSmith (Monsignor) on Sep 19, 2015 at 16:50 UTC
I do not know what you mean by 'number' in your example. Is the period a decimal point or a separator? Do the commas separate numbers or do they separate fields within a number to make them easier to read? Are your numbers binary numbers or are they decimal numbers that just happen to consist of only ones and zeros? Do you want to parse the string more than once, using different criteria? Let's get your example working. We can generalize later. Please tell us exactly what results you expect from your single example.

Bill
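To make these questions concrete, here is a small core-Perl sketch of how differently the sample string from earlier in the thread reads under each interpretation. The variable names are illustrative, and neither reading is claimed to be what the original poster intended.

```perl
use strict;
use warnings;

my $input = '10,101,110.110101101';

# Reading 1: commas separate independent numbers.
my @as_list = split /,/, $input;

# Reading 2: commas only group digits inside a single number.
(my $as_one = $input) =~ tr/,//d;

# Under reading 2, the integer part means different things in base 2 and base 10.
my ($int_part)   = $as_one =~ /^([01]+)/;
my $from_binary  = oct("0b$int_part");   # interpret the digits as binary
my $from_decimal = $int_part + 0;        # interpret the same digits as decimal

print "as a list:       @as_list\n";
print "as one number:   $as_one\n";
print "integer part:    $int_part (binary: $from_binary, decimal: $from_decimal)\n";
```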
Re^4: Using Regexp::Common by justrajdeep (Novice) on Sep 20, 2015 at 11:31 UTC
Re^5: Using Regexp::Common by AnomalousMonk (Archbishop) on Sep 20, 2015 at 16:52 UTC
Re^5: Using Regexp::Common by BillKSmith (Monsignor) on Sep 20, 2015 at 20:00 UTC
Some notes below your chosen depth have not been shown here
Re: Using Regexp::Common by Monkless (Acolyte) on Sep 18, 2015 at 21:24 UTC
I'm not familiar with that module, but I would recommend leveraging Perl's ability to do regex without a module. Here is a handy online regex generator, where you can supply your data set, work out your regex, and then import it into the Perl script: https://regex101.com/
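As a rough idea of what the module-free route could look like for the stated goal (pull every number out of a delimited string, then sort them into integers and floats), here is a sketch in core Perl. It assumes plain decimal numbers with an optional sign, fraction and exponent. The pattern `$number` and the helper `classify_numbers` are illustrative names, and a `-` used as a delimiter would be read as a sign.

```perl
use strict;
use warnings;

# Assumed number syntax: optional sign, digits with optional fraction
# (or a bare fraction such as .5), optional exponent.
my $number = qr{
    [+-]?
    (?: \d+ (?: \. \d* )? | \. \d+ )
    (?: [eE] [+-]? \d+ )?
}x;

# classify_numbers() is a hypothetical helper: it returns one
# [text, type] pair per number found, in order of appearance.
sub classify_numbers {
    my ($text) = @_;
    my @found;
    while ($text =~ /($number)/g) {
        my $n    = $1;
        my $type = $n =~ /[.eE]/ ? 'float' : 'integer';   # any point or exponent means float
        push @found, [ $n, $type ];
    }
    return @found;
}

my @results = classify_numbers('a 12; b -3.5 | c 1e3');
printf "%-6s %s\n", @$_ for @results;
```

Thousands separators and other bases are not handled here. Those are the cases the `-sep`, `-group` and `-base` flags of Regexp::Common are meant to cover.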
Re^2: Using Regexp::Common by justrajdeep (Novice) on Sep 19, 2015 at 13:27 UTC
Thanks I will try it out.