PerlMonks  

regex needed: Capture a single entity from two discontiguous parts.

by Anonymous Monk
on Dec 23, 2006 at 15:48 UTC (#591447=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

What's the best way to extract the number inside the parens without the comma?
date = "Number... (1,197)"
I was using something like:
/Number... \((.+?)\)/
but that returns the number with the comma. I suppose I could just replace the comma with nothing, but I was hoping to do it with the RE. Thanks

Re: regex needed: Capture a single entity from two discontiguous parts.
by jettero (Monsignor) on Dec 23, 2006 at 15:53 UTC
    my ($number) = $string =~ m/\(([\d,]+)\)/;   # list-context match returns the capture
    $number =~ s/,//g if defined $number;
    # how I would approach the problem...

    It doesn't seem like a terribly complicated problem as is, but I do wonder if there's a good way to capture the digits in $string if there are an unknown number of commas — without needing to have a second regex.

    -Paul

      That's what I was currently doing. I just hoped I could do it in the regex, no big deal I guess. Thanks
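On jettero's question about avoiding a second regex: one option, offered as a minimal sketch rather than anything proposed in the thread, is to keep the single match and strip the commas with `tr///` and its `/r` flag, which returns a modified copy. This assumes Perl 5.14 or later; the helper name `number_in_parens` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the digits found inside
# the first parenthesised group of digits and commas, or undef if none.
sub number_in_parens {
    my ($text) = @_;

    # A match in list context returns its captures, so $raw gets "1,197".
    my ($raw) = $text =~ /\(([\d,]+)\)/
        or return;

    # tr with /d deletes the commas; /r (Perl 5.14+) returns the edited copy.
    return $raw =~ tr/,//dr;
}

print number_in_parens('Number... (1,197)'), "\n";
```

Strictly speaking this is still two operations, but `tr///` is a character transliteration rather than a second regular expression.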
Re: regex needed: Capture a single entity from two discontiguous parts.
by BrowserUk (Pope) on Dec 23, 2006 at 16:18 UTC

    The basic answer is you cannot extract a single entity from two discontiguous parts of a string using a regex alone. If you extracted the bits using regex captures, you'd have to concatenate them together at some point. And as there is no way to write a repeated capture, it's gonna get messy when there are a variable number of parts to capture.
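The point about repeated captures can be seen directly. In this short sketch (the sample string is made up for illustration), the capture group sits inside a quantified group and matches three times, yet only the last repetition survives in the capture.

```perl
use strict;
use warnings;

my $text = 'Number... (1,234,567)';    # illustrative input

# (\d+) is repeated by the outer (?:...)+, but each repetition overwrites
# the previous capture, so only the final group of digits is returned.
my ($last) = $text =~ /\((?:(\d+),?)+\)/;

print "captured: $last\n";    # prints "captured: 567"
```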

    Your simplest method, as you mentioned, would be to extract the contents of the parens and then remove the commas.

    print "$_ : ", do{ ($_) = m[ \( ( [^)]+ ) \) ]x; tr[,][]d; $_ }
        for map "($_)", qw[ 1,234 1,234,567 1,234,567,890 ];;

    (1,234) : 1234
    (1,234,567) : 1234567
    (1,234,567,890) : 1234567890
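For comparison, here is a sketch of the other route mentioned above: capture the pieces separately and concatenate them afterwards. It is not from the thread; the helper name `digits_in_parens` is hypothetical, and it simply joins every run of digits found inside the parentheses, so it does not check that the commas are in sensible places.

```perl
use strict;
use warnings;

# Hypothetical helper: join every run of digits inside the first (...) pair.
sub digits_in_parens {
    my ($text) = @_;

    # First extract the contents of the parentheses.
    my ($inner) = $text =~ /\(([^)]+)\)/
        or return;

    # A /g match in list context returns all the digit runs; join glues them.
    return join '', $inner =~ /\d+/g;
}

print digits_in_parens('(1,234,567,890)'), "\n";
```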

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: regex needed: Capture a single entity from two discontiguous parts.
by explorer (Chaplain) on Dec 23, 2006 at 16:24 UTC

    Simpler...

    my $number;
    if ( ($number) = $date =~ m/([\d,]+)/ ) {
        $number =~ s/,//g;
    }
Re: regex needed: Capture a single entity from two discontiguous parts.
by johngg (Abbot) on Dec 24, 2006 at 00:34 UTC
    The best solution is probably to do a match followed by a substitution, as suggested by jettero, BrowserUk and explorer, but it can be done with a regular expression, or two in fact. The first uses regular expression recursion to successively match groups of digits followed by a comma or closing parenthesis, appending the captured digits to a variable using a code block. The second initialises the variable, then incorporates the first.

    use strict;
    use warnings;
    use re q{eval};

    my @numbers = (
        q{Number... ()},
        q{Number... (1)},
        q{Number... (12)},
        q{Number... (123)},
        q{Number... (1,234)},
        q{Number... (12,345)},
        q{Number... (123,456)},
        q{Number... (1,234,567)},
        q{Number... (12,345,678)},
        q{Number... (123,456,789)},
        q{Number... (1,234,567,890)},
        q{Number... (1,234a)},
    );

    my $deCommafied;
    my $rxNumberGrps;
    $rxNumberGrps = qr
        {(?x)
            \D*
            (\d+)
            (?=,|\))
            (?{$deCommafied .= $^N})
            (?:
                (??{$rxNumberGrps})
                |
                \)\z
            )
        };
    my $rxDeComma = qr
        {(?x)
            (?{$deCommafied = q{}})
            $rxNumberGrps
        };

    foreach my $number (@numbers)
    {
        print
            qq{$number - },
            $number =~ m{$rxDeComma}
                ? qq{$deCommafied\n}
                : qq{no match\n};
    }

    Here's the output.

    Number... () - no match
    Number... (1) - 1
    Number... (12) - 12
    Number... (123) - 123
    Number... (1,234) - 1234
    Number... (12,345) - 12345
    Number... (123,456) - 123456
    Number... (1,234,567) - 1234567
    Number... (12,345,678) - 12345678
    Number... (123,456,789) - 123456789
    Number... (1,234,567,890) - 1234567890
    Number... (1,234a) - no match

    As I said, the match and substitute approach is much simpler and easier to understand. This solution is far too complicated to maintain and was done more for the challenge as I'm not very good at recursion.

    Cheers,

    JohnGG
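
If the aim is to reject malformed input such as `(1,234a)` the way the recursive version does, a stricter pattern in front of the match-then-strip approach may be enough. The sketch below is an alternative under stated assumptions, not JohnGG's code: the helper name `decomma` is hypothetical, it needs Perl 5.14+ for `tr///r`, and it is only claimed to agree with the recursive regex on the sample inputs listed in the post.

```perl
use strict;
use warnings;

# Hypothetical helper: accept digit groups separated by single commas,
# enclosed in parentheses at the very end of the string.
sub decomma {
    my ($text) = @_;
    my ($raw) = $text =~ /\((\d+(?:,\d+)*)\)\z/
        or return;
    return $raw =~ tr/,//dr;    # comma-free copy (Perl 5.14+)
}

for my $text ( 'Number... ()', 'Number... (1,234,567)', 'Number... (1,234a)' ) {
    my $digits = decomma($text);
    print "$text - ", defined $digits ? $digits : 'no match', "\n";
}
```

Validation lives in the pattern and comma removal in `tr///`, so neither step needs code blocks or `use re 'eval'`.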
