 
PerlMonks  

Re^4: Regex Modification

by Anonymous Monk
on Apr 10, 2013 at 08:42 UTC (node #1027937)


in reply to Re^3: Regex Modification
in thread Regex Modification

All right, here is the specification for a match:

1. A sequence of digits with a minimum of 9 and a maximum of 15 digits, like 145-1256-2365-789.
2. The first 3 and last 3 characters must be digits, not differentiators, like 123-1545645-123.
3. A match can also have no differentiator at all, like 123456789147.
4. The differentiator (the same character throughout the sequence) can appear any number of times between the first 3 and last 3 digits, as long as the minimum and maximum length criteria are satisfied, like 1213-456-789-1265.

Please help...
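Rules 1 to 4 above can be checked against a single, already-isolated candidate string without writing one large regex. The following is a minimal sketch only, not a way of finding sequences inside longer text; `is_valid_seq` is a hypothetical helper name, and the differentiator set (`-` and `.`) is an assumption, since the rules do not say which characters are allowed.

```perl
use strict;
use warnings;

# Hypothetical helper: does the whole string $s satisfy rules 1-4?
# ASSUMPTION: '-' and '.' are the only allowed differentiator characters.
sub is_valid_seq {
    my ($s) = @_;
    return 0 if $s =~ /[^0-9.\-]/;       # only digits and differentiators
    return 0 unless $s =~ /\A\d{3}/;     # rule 2: first 3 chars are digits
    return 0 unless $s =~ /\d{3}\z/;     # rule 2: last 3 chars are digits
    return 0 if $s =~ /\D\D/;            # no two differentiators in a row
    my %diffs = map { $_ => 1 } $s =~ /(\D)/g;
    return 0 if keys(%diffs) > 1;        # rule 4: same differentiator throughout
    my $n = ($s =~ tr/0-9//);            # rule 1: count the digits
    return ($n >= 9 && $n <= 15) ? 1 : 0;
}

for my $s (qw(145-1256-2365-789 123-1545645-123 123456789147 1213-456-789-1265
              12-3456789 123-456.789 12345678 1234567890123456)) {
    printf "%-20s %s\n", $s, is_valid_seq($s) ? 'ok' : 'no';
}
```

The four example strings from the rules pass; the last four fail (leading differentiator, mixed differentiators, too few digits, too many digits). Rule 3 needs no special case: a string with no differentiator simply yields an empty `%diffs`.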


Re^5: Regex Modification
by AnomalousMonk (Monsignor) on Apr 12, 2013 at 10:09 UTC
    All right, here is the specification for a match:

    Full knowledge of a problem is often the first step toward a solution!

    Here's an incomplete solution. It is incomplete because I feel it should be able to match strings like
        qw(x123-456789123-456x  x123456-789123456x  x123456789123456x)
    and I can't. In addition, the regex I came up with is quite complicated, probably excessively so.

    Be that as it may, everything else seems to work as intended. The critical portions are the $diff, $d_min and $ndn regexes in the m1() function. I haven't had as much time to work on this as I would like, but I may do so shortly; it's an interesting problem. Sorry for the delay in getting back to you on this. HTH.

    Output:

      Thanks for the reply. Can you please explain how the pattern matching is done?

        Below is an updated version of the regex. It is simplified a little, and an error is corrected. (Update: And it now matches something like 'x123-456789123-456x'.) I am still not happy with it: it is over-complicated (Update: and it uses package variables), and it is not standalone, because it relies on absolute capture group numbers that change if it is combined with other regexes that have capture groups of their own.
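To illustrate that sensitivity, here is a small sketch using a toy sub-regex, not the posted one; `$core` is a hypothetical name and `[-.]` is an assumed differentiator set. A numbered backreference such as `\1` is resolved against the final, combined pattern, so embedding the sub-regex after another capture group silently changes what `\1` refers to.

```perl
use strict;
use warnings;

# Hypothetical sub-regex: \1 is meant to be the ([-.]) differentiator group.
my $core = qr{ \d{3} ([-.]) (?: \d+ \1 (?= \d) )* \d{3} }x;

# On its own, \1 is the differentiator group, so the whole string matches.
my $alone = ('123-456-789' =~ /\A$core\z/) ? 1 : 0;

# Embedded after another capture group, \1 now means (\w+), i.e. 'id', so
# the repeated digits-plus-differentiator part can never match and the
# anchored match fails.
my $embedded = ('id: 123-456-789' =~ /\A (\w+) : \s* $core \z/x) ? 1 : 0;

print "alone=$alone embedded=$embedded\n";   # alone=1 embedded=0
```

A relative backreference (`\g{-1}`) or a named one (`\k{DIFF}`) does not shift in this way when groups are added in front of it.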

        In any event, it works. Please see the embedded comments for a brief explanation of how the regex works, and see perlre and perlretut for more detailed info. The m1() test function returns the number of matches in a string if called in scalar context, and a list of all the matching sub-strings if called in list context. If you have more questions, please let me know. As before, HTH.
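The scalar/list behaviour described for m1() can be had with `wantarray`. The sketch below is a hypothetical stand-in (`find_seqs` is not the posted m1(), whose code is not shown here) and uses a toy pattern in place of $ndn.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a test function like m1(): returns the list of
# matching substrings in list context and their count in scalar context.
sub find_seqs {
    my ($string, $rx) = @_;
    my @matches;
    while ($string =~ /$rx/g) {
        # @- and @+ hold the start and end offsets of the whole match, so no
        # extra capture group has to be wrapped around $rx (which would
        # renumber any groups inside it).
        push @matches, substr($string, $-[0], $+[0] - $-[0]);
    }
    return wantarray ? @matches : scalar @matches;
}

my $rx    = qr{ \d{3} - \d{3} }x;                    # toy pattern only
my @found = find_seqs('a 123-456 b 789-012', $rx);   # ('123-456', '789-012')
my $count = find_seqs('a 123-456 b 789-012', $rx);   # 2
print "@found ($count)\n";
```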

        Code:

        Output:

        Here is a further simplified (and tested) version of the regex. The $digits and $diffs package variables are no longer needed, so I'm a little happier with this version, but it still uses absolute capture group numbering and embedded code. I could perhaps use named captures to get around the numbering problem, but I don't see what I can do about the code.

        There are a few more comments that may be helpful, and davido's nice Perl Regular Expression Tester may be enlightening. I may get around to posting a more detailed commentary on the regex in the next couple of days.

        my $ndn = qr{
            # cannot begin after digit or any differentiator char
            (?<! \d) (?<! $diff)
            # begin potential main pattern capture to group 1
            ($d_min          # begin group 1 with minimum digits
              ($diff)?       # group 2: possible differentiator char
              # match to max number of digit(s)/single-diff groups
              (?: \d+ \g{-1} (?= \d)){0,9}
              # end group 1 (main pattern) capture with minimum digits
              $d_min)        # end group 1
            # main pattern cannot be followed by a digit...
            (?! \d)
            # ...or by the diff char, or by any diff char if none present
            (?(2) (?! \g{-1}) | (?! $diff))
            # qualify potential main pattern for min/max digits
            (?(?{ $1 =~ tr/0-9// > 15 || $1 =~ tr/0-9// < 9 }) (*FAIL))
            }xms;
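Two constructs in the regex above may be unfamiliar: the relative backreference `\g{-1}` and the conditional `(?(N) yes | no)`. The sketch below exercises each on a toy pattern; it is not the posted $ndn, `$same_diff` and `$no_trailing` are hypothetical names, and `[-.]` stands in for the unshown $diff.

```perl
use strict;
use warnings;

# \g{-1} repeats whatever the nearest preceding capture group matched,
# so every separator must be the same character as the first one.
my $same_diff = qr{ \A \d{3} ([-.]) (?: \d+ \g{-1} (?= \d) )* \d{3} \z }x;

# (?(1) yes | no): if group 1 took part in the match, the digits must not be
# followed by that same character; otherwise by no differentiator at all.
# The (?! \d) stops backtracking from simply ending the match mid-number.
my $no_trailing = qr{ \A \d+ ([-.])? \d+ (?! \d) (?(1) (?! \g{-1}) | (?! [-.]) ) }x;

for my $s (qw(123-456-789 123-456.789)) {
    printf "%-12s same_diff: %s\n", $s, ($s =~ $same_diff ? 'yes' : 'no');
}
for my $s (qw(123-456 123-456- 123456 123456-)) {
    printf "%-12s no_trailing: %s\n", $s, ($s =~ $no_trailing ? 'yes' : 'no');
}
```

Note that in Perl a backreference to a group that did not participate fails, so `\g{-1}` inside the repeated group cannot match when no differentiator was captured.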

        Update: I finally realized that $1 in the
            (?(?{ $1 =~ tr/0-9// > 15 || $1 =~ tr/0-9// < 9 }) (*FAIL))
        sub-pattern above can be replaced by $^N to eliminate one absolute back-reference. Using a named capture group takes care of the remaining absolute capture, giving the regex below. (However, there may be a speed penalty associated with named captures; I haven't benchmarked this.)

        my $ndn = qr{
            # cannot begin after digit or any differentiator char.
            (?<! \d) (?<! $diff)
            # begin potential main pattern capture.
            ($d_min                # begin main pattern group with minimum digits
              (?<DIFF> $diff)?     # group DIFF: possible differentiator char
              # match to max number of digit(s)/single-diff groups.
              (?: \d+ \k{DIFF} (?= \d)){0,9}
              # end main pattern group capture with minimum digits.
              $d_min)              # end main group
            # main pattern cannot be followed by a digit or...
            (?! \d)
            # ... by the diff char if any, else by any diff char.
            (?(<DIFF>) (?! \k{DIFF}) | (?! $diff))
            # qualify potential main pattern for min/max digits.
            (?(?{ $^N =~ tr/0-9// < 9 || $^N =~ tr/0-9// > 15 }) (*FAIL))
            }xms;
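The two changes in this update can be tried in isolation. The sketch below uses toy patterns, not the posted $ndn; `$core` and `$gate` are hypothetical names, and `[-.]` / `-` stand in for the unshown $diff. Part 1 shows a named group surviving embedding after another capture group; part 2 shows a digit-count gate written with `$^N` and `(*FAIL)`.

```perl
use strict;
use warnings;

# 1. A named group is looked up by name, so embedding the sub-regex after
#    another capture group does not change what \k{DIFF} refers to.
my $core = qr{ \d{3} (?<DIFF> [-.]) (?: \d+ \k{DIFF} (?= \d) )* \d{3} }x;

my ($label, $seq, $sep) = ('', '', '');
if ('id: 123-456-789' =~ /\A (\w+) : \s* ($core) \z/x) {
    ($label, $seq, $sep) = ($1, $2, $+{DIFF});
}
print "label=$label seq=$seq sep=$sep\n";   # label=id seq=123-456-789 sep=-

# 2. $^N is the text of the most recently closed group (group 1 here), so the
#    digit-count check needs no absolute group number. (*FAIL) forces the
#    current attempt to fail when the count is outside 9..15; the lookaround
#    boundaries stop backtracking from accepting a shorter piece of a run.
my $gate = qr{
    (?<! [\d-] )
    ( \d [\d-]* \d )
    (?! [\d-] )
    (?(?{ my $n = ($^N =~ tr/0-9//); $n < 9 || $n > 15 }) (*FAIL) )
}x;

my $text = 'ab 12-345 cd 123-456-789 ef 1234567890123456';
my @hits = $text =~ /$gate/g;
print "@hits\n";   # 123-456-789
```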
