
Regex Help

by gautamparimoo (Beadle)
on Mar 25, 2013 at 07:51 UTC (#1025247=perlquestion)
gautamparimoo has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I have a specification to build the following numeric string: 981890 or 981891, followed by 10 digits, or by at most 4 separators (., -, :) plus 10 digits. Examples that should match: 1. 9818902365894598 2. 9818 9021 2454 2170 3. 9818-9145-6896-2146. The regex I am trying is as follows:

9818[\D]?9[0|1][\D]?\d{2}[\D]?\d{4}[\D]?\d{4}

But this looks inefficient. Please suggest improvements.

Re: Regex Help
by Anonymous Monk on Mar 25, 2013 at 07:55 UTC

    But this looks inefficient

    Does it work for your purposes?

Re: Regex Help
by hdb (Prior) on Mar 25, 2013 at 08:10 UTC

    If you had no separators, the regex would look like

    /98189(0|1)\d{10}/

    Do you have the option to remove the separators first?

    s/[\s:,-]//g;

    Probably not worth it if your initial regex works satisfactorily.
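
A minimal sketch of the strip-then-match idea, under a few assumptions that are not in the posts above: the helper name `matches_stripped` is hypothetical, the separator set (whitespace, `.`, `:`, `,`, `-`) is taken from the question, and the candidate is assumed to be the whole string, hence the `\A` and `\z` anchors. Note that stripping first no longer enforces the "at most 4 separators" limit.

```perl
use strict;
use warnings;

# Hypothetical helper: remove separators, then test the bare digits.
sub matches_stripped {
    my ($candidate) = @_;

    # Copy the input and strip separators from the copy.
    # The hyphen goes last in the class so it is a literal, not a range.
    (my $digits = $candidate) =~ s/[\s.:,-]//g;

    # 98189, then 0 or 1, then exactly ten more digits.
    return $digits =~ /\A98189[01]\d{10}\z/ ? 1 : 0;
}

for my $sample ('9818902365894598', '9818 9021 2454 2170', '9818-9145-6896-2146') {
    print "$sample => ", matches_stripped($sample), "\n";
}
```

Copying into `$digits` leaves the original string untouched, which matters if the separators are needed later.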

Re: Regex Help
by davido (Archbishop) on Mar 25, 2013 at 08:28 UTC

    More than looking inefficient, it looks unnecessarily cluttered, which can contribute to bugs:

    • [\D]? is the same as \D?. This is repeated four times.
    • [0|1] is probably a mistake; character classes don't use alternation, and I don't see any | characters in your sample input. You probably mean [01]
    • \D?\d{4} is repeated twice in a row. How about (?:\D?\d{4}){2} ?

    Making those changes would yield:

    9818\D?9[01]\D?\d{2}(?:\D?\d{4}){2}
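
As a quick check, the simplified pattern can be run against the three samples from the question. This is only a sketch; the helper name `compact_match` is hypothetical.

```perl
use strict;
use warnings;

# The simplified pattern from above, precompiled.
my $re = qr/9818\D?9[01]\D?\d{2}(?:\D?\d{4}){2}/;

# Hypothetical helper: 1 if the pattern is found anywhere in the string.
sub compact_match {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $sample ('9818902365894598', '9818 9021 2454 2170', '9818-9145-6896-2146') {
    printf "%-22s %s\n", $sample, compact_match($sample) ? 'match' : 'no match';
}
```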

    Now with the /x modifier, you can further clarify things like this:

    m/
        9818        # A literal.
        \D?         # Optional non-digit.
        9           # Another literal.
        [01]        # Require a zero or a one.
        \D?         # Another optional non-digit.
        \d{2}       # Require two digits.
        (?:         # Group but don't capture.
            \D?     # Another optional non-digit.
            \d{4}   # Followed by four digits.
        ){2}        # Repeated twice.
    /x

    As for efficiency, what problems are you encountering? If you're dealing with huge input you're probably IO bound anyway.


    Dave

      Thanks davido, your regex cleaned it up. But what modifiers or assertions should I use so that the pattern does not match across different lines of a text file, i.e. it matches only when the whole pattern appears on a single line? Please advise.

        Replace \D? with a more explicit character class. \D will match anything that is not a numeric digit. Newlines (\n) are included in "anything that is not a numeric digit".
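
One way to apply this advice, as a sketch only: the separator set below (space, `.`, `,`, `:`, `-`) is an assumption based on the question and its samples, and the helper name `on_one_line` is hypothetical.

```perl
use strict;
use warnings;

# Assumed separator set: space, dot, comma, colon, hyphen.
# The hyphen is last so it is a literal; newline is deliberately excluded.
my $sep = qr/[ .,:-]?/;

my $re = qr/9818${sep}9[01]${sep}\d{2}(?:${sep}\d{4}){2}/;

# Hypothetical helper: 1 if the pattern matches without crossing a newline.
sub on_one_line {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

print on_one_line("9818-9145-6896-2146"), "\n";      # separators on one line
print on_one_line("9818\n9145\n6896\n2146"), "\n";   # digits split across lines
```

Because a newline is not in the class, the second string does not match, whereas the `\D?` version would accept it.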


        Dave

Re: Regex Help
by AnomalousMonk (Abbot) on Mar 25, 2013 at 19:19 UTC
