PerlMonks  

Using Regexp::Common

by justrajdeep (Novice)
on Sep 18, 2015 at 15:18 UTC ( [id://1142438]=perlquestion )

justrajdeep has asked for the wisdom of the Perl Monks concerning the following question:

Hi Wise Monks,

I need some help using Regexp::Common. Can one of you guide me?

In the example given, with the data 10,101,110.11010110,

if I use something like $_ =~ RE_num_real(-keep, -group=>3, -sep=>',', -base=>2) and print q{a number};

I get only one match: 10.

If there are multiple matches, I am unable to get them. Can you please guide me on how to get the other matches as well?

Replies are listed 'Best First'.
Re: Using Regexp::Common
by Corion (Patriarch) on Sep 18, 2015 at 15:22 UTC

    From Regexp::Common, that should work. Have you looked at the string that your call to RE_num_real returns? What does that string look like?

    my $is_binary = RE_num_real(-keep, -group=>3, -sep=>',', -base=>2);
    print "Checking '$_' against /$is_binary/\n";
    if( $_ =~ /$is_binary/ ) {
        print q{matched a number};
        print "Got [$1]\n";
    };

      Hi

      I just checked it out, and I have no clue what that regular expression means :(

      This is what I see:

      (?^:((?i)([+-]?)((?=[.]?[0123456789])([0123456789]*)(?:([.])([0123456789]{0,}))?)(?:([E])(([+-]?)([0123456789]+))|)))

        An explanation (beware possible line wrap of initial print of long regex expression):

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use Regexp::Common qw(RE_num_real);
         use YAPE::Regex::Explain;
         ;;
         print YAPE::Regex::Explain->new(RE_num_real(-keep, -group=>3, -sep=>',', -base=>2))->explain;
        "
        The regular expression:

        (?-imsx:((?i)([+-]?)((?=[.]?[0123456789])([0123456789]*)(?:([.])([0123456789]{0,}))?)(?:([E])(([+-]?)([0123456789]+))|)))

        matches as follows:

        NODE              EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:          group, but do not capture (case-sensitive)
                          (with ^ and $ matching normally) (with . not
                          matching \n) (matching whitespace and # normally):
        (                 group and capture to \1:
        (?i)              set flags for this block (case-insensitive)
                          (with ^ and $ matching normally) (with . not
                          matching \n) (matching whitespace and # normally)
        (                 group and capture to \2:
        [+-]?             any character of: '+', '-' (optional
                          (matching the most amount possible))
        )                 end of \2
        (                 group and capture to \3:
        (?=               look ahead to see if there is:
        [.]?              any character of: '.' (optional
                          (matching the most amount possible))
        [0123456789]      any character of: '0', '1', '2', '3', '4',
                          '5', '6', '7', '8', '9'
        )                 end of look-ahead
        (                 group and capture to \4:
        [0123456789]*     any character of: '0', '1', '2', '3', '4',
                          '5', '6', '7', '8', '9' (0 or more times
                          (matching the most amount possible))
        )                 end of \4
        (?:               group, but do not capture (optional
                          (matching the most amount possible)):
        (                 group and capture to \5:
        [.]               any character of: '.'
        )                 end of \5
        (                 group and capture to \6:
        [0123456789]{0,}  any character of: '0', '1', '2', '3', '4',
                          '5', '6', '7', '8', '9' (at least 0 times
                          (matching the most amount possible))
        )                 end of \6
        )?                end of grouping
        )                 end of \3
        (?:               group, but do not capture:
        (                 group and capture to \7:
        [E]               any character of: 'E'
        )                 end of \7
        (                 group and capture to \8:
        (                 group and capture to \9:
        [+-]?             any character of: '+', '-' (optional
                          (matching the most amount possible))
        )                 end of \9
        (                 group and capture to \10:
        [0123456789]+     any character of: '0', '1', '2', '3', '4',
                          '5', '6', '7', '8', '9' (1 or more times
                          (matching the most amount possible))
        )                 end of \10
        )                 end of \8
        |                 OR
        )                 end of grouping
        )                 end of \1
        )                 end of grouping
        ----------------------------------------------------------------------

        Please also see perlre, perlretut, and perlrequick. There are also a number of on-line regex explainers, but I'm not familiar enough with them to recommend any particular one. (Update: Actually, davido has a nice regex tester which ends up giving a fair amount of explanation, or at least enlightenment. See his personal node for a link)

        Update: Caution: YAPE::Regex::Explain only supports regex features added through Perl version 5.6.


        Give a man a fish:  <%-{-{-{-<

Re: Using Regexp::Common
by Anonymous Monk on Sep 18, 2015 at 21:53 UTC

    So what are the numbers?

    Whatever this -keep option is, it's what's messing you up.

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;
    use Regexp::Common qw/ RE_num_real /;

    my $shine = '10,101,110.110101101';
    {
        my $rereal = RE_num_real(-keep, -group=>3, -sep=>',', -base=>2);
        if( $shine =~ m{($rereal)} ){
            dd( $1, $2, $3, $4, $5, $6, );
        }
    }
    {
        my $rereal = RE_num_real(-group=>3, -sep=>',', -base=>2);
        if( $shine =~ m{($rereal)} ){
            dd( $1, $2, $3, $4, $5, $6, );
        }
    }
    {
        my $rereal = RE_num_real(-group=>3, -sep=>',', -base=>2, -keep);
        if( $shine =~ m{($rereal)} ){
            dd( $1, $2, $3, $4, $5, $6, $7, $8, );
        }
    }
    __END__
    (10, 10, "", 10, 10, undef)
    ("10,101,110.110101101", undef, undef, undef, undef, undef)
    (
      "10,101,110.110101101",
      "10,101,110.110101101",
      "",
      "10,101,110.110101101",
      "10,101,110",
      ".",
      110101101,
      undef,
    )

      It isn't so much that the -keep option is a problem, more that no value was given for the -keep option. From Regexp::Common::number:

      Under -keep (see Regexp::Common):
      $1  captures the entire number
      $2  captures the optional sign of the number
      $3  captures the complete set of digits

      It is pretty sad (IMO) that RE_num_real() appears to not complain about getting options that it does not support ("3", ",", and "2"), nor about getting a value for -keep that it does not support ("-group"), nor about it getting an odd number of arguments when it is expecting pairs of arguments.

      It should simply die when any of those things happen.

      - tye        
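To make tye's point concrete, here is a minimal sketch that uses the hash interface seen elsewhere in this thread, where each flag is its own subscript and -keep therefore cannot absorb the flag that follows it. The helper name parse_binary_real is hypothetical, and the capture numbering ($1 entire number, $4 integer part, $6 fraction) is taken from the Regexp::Common::number documentation quoted above and the dump output in the earlier reply; treat it as an illustration, not as the module's recommended usage.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

# Hypothetical helper: match the first comma-grouped binary real in
# $text and return its parts, or an empty list if nothing matches.
sub parse_binary_real {
    my ($text) = @_;

    # Hash interface: every flag is a separate subscript, so there is
    # no name/value pairing for a bare -keep to disturb.
    my $re = qr/$RE{num}{real}{-base=>2}{-sep=>','}{-group=>3}{-keep}/;

    return unless $text =~ $re;

    # Under -keep: $1 is the entire number, $4 the integer part,
    # $6 the fractional digits (undef when there is no fraction).
    return (entire => $1, integer => $4, fraction => $6);
}

my %parts = parse_binary_real('10,101,110.11010110');
print "$_ => $parts{$_}\n" for sort keys %parts;
```

If you prefer the RE_num_real() form, the third block in the script above suggests that putting -keep last also works, since the remaining flags then pair up as intended.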

      Yeah, from your example it is clear that something is wrong with the -keep option. I am going to ditch this module altogether.

        "I wanted to extract all the numbers from a string that may be separated by any delimiter."

        I don't understand what "separation by any delimiter" means.

        "I am going to ditch [Regexp::Common] altogether."

        I think that would be rash. Regexp::Common and Regexp::Common::number, the extension I think you need, are designed to do many things and are correspondingly complicated, but will, I think, repay effort invested to understand them. (Update: And I think the -keep option just needs more study.) I'm still not sure exactly what you require, but here's a sample of code that may be near the ballpark.

        File:

        use 5.010;  # need perl 5.10+ regex enhancements -- (?|alts)
        use warnings;
        use strict;
        use Regexp::Common qw(number);

        my $str = '10,101,110.11010110,123,101.010E-01';
        # offsets: 0123456789012345678901234567890123456789
        #                    1         2         3

        my $bin_int  = qr{ $RE{num}{int} {-keep}{-base=>2} }xms;
        my $bin_real = qr{ $RE{num}{real}{-keep}{-base=>2} }xms;
        my $binary   = qr{ (?| $bin_int | $bin_real) }xms;

        MATCH:
        while ($str =~ m{ \b $binary \b }xmsg) {
            my $entire      = $1;
            my $fraction    = $6;
            my $exponential = $8;
            my $expon = defined $exponential && length $exponential;
            my $real  = ! $expon && defined $fraction && length $fraction;
            next MATCH unless length $entire;
            printf "matched '%s' at offset %d; is %s \n",
                $entire, $-[1],
                $expon ? 'exponential' :
                $real  ? 'real'        :
                         'integer'  # default
                ;
        }

        Output:

        c:\@Work\Perl\monks\justrajdeep>perl extract_binary_nums_1.pl
        matched '10' at offset 0; is integer
        matched '101' at offset 3; is integer
        matched '110.11010110' at offset 7; is real
        matched '101.010E-01' at offset 24; is exponential


        Give a man a fish:  <%-{-{-{-<

        Ok, the penny finally dropped for the  -sep=>','  -group=>3 stuff. Is this more like what you're after? (This still needs Perl version 5.10+.)

        c:\@Work\Perl\monks\justrajdeep>perl -wMstrict -MRegexp::Common=number -le
        "my $str = '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111';
         ;;
         my $bin_int  = qr{ $RE{num}{int} {-keep}{-sep=>','}{-group=>3}{-base=>2} }xms;
         my $bin_real = qr{ $RE{num}{real}{-keep}{-sep=>','}{-group=>3}{-base=>2} }xms;
         ;;
         my $binary = qr{ (?| $bin_int | $bin_real) }xms;
         ;;
         while ($str =~ m{ \b $binary \b }xmsg) {
         ;;
           my $entire      = $1;
           my $fraction    = $6;
           my $exponential = $8;
           my ($start, $end) = ($-[1], $+[1]);
         ;;
           my $type = (defined $exponential && length $exponential) ? 'exponential'
                    : (defined $fraction    && length $fraction)    ? 'real'
                    :                                                 'integer'
                    ;
         ;;
           print qq{matched $type};
           my $ruler = (' ' x $start) . '^' . ('-' x ($end - $start - 2)) . '^';
           print qq{'$str'};
           print qq{ $ruler \n};
         ;;
           }
        "
        matched integer
        '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
         ^-^

        matched integer
        '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
             ^--------^

        matched real
        '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                        ^-----------------^

        matched exponential
        '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                                            ^--------------^

        matched integer
        '100,11,111,111,10,101,110.11010110,10,101,010.0E-01,123,111'
                                                                 ^-^

        (The  'exponential' 'real' 'integer' type classification may be a bit wobbly. Perhaps an exercise for the reader?)


        Give a man a fish:  <%-{-{-{-<

Re: Using Regexp::Common
by BillKSmith (Monsignor) on Sep 18, 2015 at 21:37 UTC
    Each reference to Regexp::Common matches one field. You can use it as part of a more complicated regex. If you tell us exactly what matches you expect from your example, I believe we can help you code the regexp using Regexp::Common.
    Bill
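As a rough illustration of "one field per pattern, embedded in a larger regex", the sketch below wraps the binary real pattern in \b anchors and runs it with /g in list context to collect every field. It assumes the commas separate numbers (no -sep or -group), which is only one reading of the question; the helper name binary_fields is hypothetical.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

# Hypothetical helper: return every binary number found in $text,
# treating commas as separators between numbers.
sub binary_fields {
    my ($text) = @_;

    # No -keep here, so the module's pattern contributes no capture
    # groups and the single explicit group yields each whole match.
    my $num = qr/$RE{num}{real}{-base=>2}/;

    # The pattern can match an empty string between word boundaries
    # (for example in front of a non-binary number like 123), so empty
    # results are filtered out.
    return grep { length } $text =~ /\b($num)\b/g;
}

print join(' | ', binary_fields('10,101,110.11010110')), "\n";
```

The grep on length plays the same role as a "skip empty matches" guard inside a while loop over m//g.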

      Hi Bill

      I wanted to extract all the numbers from a string that may be separated by any delimiter. Then divide them into floats/integer etc. I thought Regexp::Common would be simple to use. But looks like it is not so :(

        I do not know what you mean by 'number' in your example. Is the period a decimal point or a separator? Do the commas separate numbers or do they separate fields within a number to make them easier to read? Are your numbers binary numbers or are they decimal numbers that just happen to consist of only ones and zeros? Do you want to parse the string more than once, using different criteria? Let's get your example working. We can generalize later. Please tell us exactly what results you expect from your single example.
        Bill
Re: Using Regexp::Common
by Monkless (Acolyte) on Sep 18, 2015 at 21:24 UTC

    I'm not familiar with that module, but I would recommend leveraging Perl's ability to do regexes without a module.


    Here is a handy online Regex generator, where you can supply your data set and then start to work out your regex and import that into the perl script - https://regex101.com/
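For comparison, a module-free version might look something like the sketch below. It is an assumption-laden illustration: it treats commas as separators between numbers, accepts only the digits 0 and 1 with an optional fractional part, and ignores signs and exponents. The helper name extract_binary is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: find binary numbers (digits 0 and 1, optional
# fractional part) using only core Perl regexes.
sub extract_binary {
    my ($text) = @_;
    my @found;

    # The lookarounds reject candidates that touch another digit or a
    # dot, so the leading "1" of a decimal such as 123 is not reported.
    while ($text =~ /(?<![0-9.])([01]+(?:\.[01]+)?)(?![0-9.])/g) {
        my $num = $1;
        push @found, {
            text => $num,
            type => ($num =~ /\./ ? 'real' : 'integer'),
        };
    }
    return @found;
}

printf "%-14s %s\n", $_->{text}, $_->{type}
    for extract_binary('10,101,110.11010110,123');
```

A hand-rolled pattern like this is easy to read but covers far fewer cases than Regexp::Common::number, so which one to use depends on how varied the input really is.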

      Thanks I will try it out.
