
Problem with regular expression

by MARVion (Novice)
on Feb 12, 2013 at 14:39 UTC
MARVion has asked for the wisdom of the Perl Monks concerning the following question:

Hi @ all,
I have a problem using a regular expression against some values. In most cases it works fine, but I have to skip some values, which isn't working. See, I have this array with different possible combinations:

$in[0]="TFS100";
$in[1]="TFS 100";
$in[2]="TFS-CR100";
$in[3]="TFS-CR 100";
$in[4]="TFS_100";
$in[5]="TFS ID 100";
$in[6]="TFS CR 100";
$in[7]="TFS ID100";
$in[8]="TFS-ID 100";
$in[9]="TFS ID:100";
$in[10]="TFS-ID: 100";
$in[11]="- TFS CR634: STRESS: H 17326,21600,";
$in[12]="CR0080588";
$in[13]="TFS0080588";
and I check them with this regular expression:
$pattern = '[Tt]?[Ff]?[Ss]?[-_\s]?[Cc]?[Rr]?[\']?[Ii]?[Dd]?:?\s?(\d+)';
I also capture the numbers, as I need them later in my code. This works fine for nearly all cases, but I have to "skip" (or return false for) the last two cases ([12] and [13]): if the value following characters like "cr" or "tfs" starts with a zero or a double zero, it should not match. Interestingly, the number in [12] isn't recognized, but the one in [13] is. I tried a lot of things but couldn't get it to work.
Maybe you can help me fix my problem? That would be great!

Greetz from Germany
Yours sincerely


Replies are listed 'Best First'.
Re: Problem with regular expression
by Utilitarian (Vicar) on Feb 12, 2013 at 16:01 UTC
    Your criteria are a little under specified, however the following works, for what you've asked.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @in = (
        "TFS100", "TFS 100", "TFS-CR100",
        "TFS-CR 100", "TFS_100", "TFS ID 100",
        "TFS CR 100", "TFS ID100", "TFS-ID 100",
        "TFS ID:100", "TFS-ID: 100",
        "- TFS CR634: STRESS: H 17326,21600,",
        "CR0080588", "TFS0080588",
    );

    # Print the captured number only when it follows a TFS prefix
    # and does not start with a zero.
    for (@in) {
        print "$1\n" if /TFS[-_\s]?(?:CR|ID)?\s?([1-9]\d+)/;
    }
    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
      Your criteria are a little under specified...

      Amen to that, brother!

Re: Problem with regular expression
by AnomalousMonk (Chancellor) on Feb 12, 2013 at 17:03 UTC
    In most cases it works fine...

    Further to choroba's post: MARVion: Your regex doesn't really work at all because almost everything in it is optional (i.e., has the ? quantifier) except the (\d+) bit at the end, so it is happy to match with and capture the first group of decimal digits in anything. What are you really after here?

    >perl -wMstrict -le
    "my @in = (
       'TFS100', 'TFS 100', 'TFS-CR100', 'TFS-CR 100', 'TFS_100',
       'TFS ID 100', 'TFS CR 100', 'TFS ID100', 'TFS-ID 100',
       'TFS ID:100', 'TFS-ID: 100',
       '- TFS CR634: STRESS: H 17326,21600,',
       'CR0080588', 'TFS0080588',
       qw(1234 xxx1234 123xxx xxx123xxx 123xxx456 xxx123xxx456),
       );
     ;;
     for my $s (@in) {
       $s =~ m{ [Tt]? [Ff]? [Ss]? [-_\s]? [Cc]? [Rr]? [Ii]? [Dd]? :? \s? (\d+) }xms;
       print qq{'$1' <- '$s'};
       }
    "
    '100' <- 'TFS100'
    '100' <- 'TFS 100'
    '100' <- 'TFS-CR100'
    '100' <- 'TFS-CR 100'
    '100' <- 'TFS_100'
    '100' <- 'TFS ID 100'
    '100' <- 'TFS CR 100'
    '100' <- 'TFS ID100'
    '100' <- 'TFS-ID 100'
    '100' <- 'TFS ID:100'
    '100' <- 'TFS-ID: 100'
    '634' <- '- TFS CR634: STRESS: H 17326,21600,'
    '0080588' <- 'CR0080588'
    '0080588' <- 'TFS0080588'
    '1234' <- '1234'
    '1234' <- 'xxx1234'
    '123' <- '123xxx'
    '123' <- 'xxx123xxx'
    '123' <- '123xxx456'
    '123' <- 'xxx123xxx456'
Re: Problem with regular expression
by choroba (Bishop) on Feb 12, 2013 at 15:12 UTC
    Why can't you just use a plain /([0-9]+)/?
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Problem with regular expression
by MARVion (Novice) on Feb 13, 2013 at 09:38 UTC

    Hi again,

    well, with a little change Utilitarian's code works perfectly for me.


    However, I would appreciate it if you could explain this section to me, as I'm not that familiar with regular expressions, but I certainly want to learn to use them!




      oops, I missed the one with a colon :(
      (?:      # begin non-capturing group
        CR|ID  # alternation accepts either pattern group 'CR' or 'ID'
      )?       # group is optional

      print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
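To tie the explanation to the missed colon case, here is a sketch of the whole pattern written out with /x comments. The added `:?` and the layout are my assumptions about what the "little change" might look like; the thread does not show the actual fix. Note that `[1-9]\d+` still needs at least two digits, exactly as in the original answer.

```perl
use strict;
use warnings;

# Sketch only: Utilitarian's pattern plus an optional colon (assumed fix).
my $re = qr{
    TFS          # literal prefix, required
    [-_\s]?      # optional separator: dash, underscore or whitespace
    (?:CR|ID)?   # optional non-capturing group: either 'CR' or 'ID'
    :?           # optional colon, the case that was missed
    \s?          # optional single whitespace character
    ([1-9]\d+)   # capture a number that does not start with zero
}x;

my @in  = ('TFS ID:100', 'TFS-ID: 100', 'TFS-CR100', 'CR0080588', 'TFS0080588');
my @ids = map { /$re/ ? $1 : 'no match' } @in;

print "$in[$_] => $ids[$_]\n" for 0 .. $#in;
```

With these five inputs, the first three yield 100 and the last two do not match, because 'CR0080588' lacks the TFS prefix and 'TFS0080588' has a zero right after it.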
Re: Problem with regular expression
by sundialsvc4 (Abbot) on Feb 12, 2013 at 19:25 UTC

    I would apply one or more regular expressions to identify whether or not this is a string that you are interested in. Then, write another regex to match only what you need from each. Use a subroutine or subroutines; “regex golf” only goes so far. Treat the two problems separately, and break each one down into as many cases as you need.

    Obviously, entry #11 is the odd man out from all the rest. You need to build a suite of examples of everything that the string could possibly be, and determine exactly what should be used to extract the data of interest from each one. Then, I strongly recommend putting the whole thing under test with Test::Most. Write a test suite that proves the correct operation of the program. Also, write the program itself so that it is suspicious of all its inputs and will, say, die() if it encounters anything that it cannot handle. (Otherwise, you have no practical way to realize that a problem exists, either in the data or in the program or both. The fact that the program ran cleanly should be a strong indicator that both the data and the software are correct.)
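A rough sketch of that approach follows, under some loud assumptions: `tfs_id` is a hypothetical helper, its two regexes are only guesses at the real criteria, and it uses Test::More rather than Test::Most so that it runs on a core Perl install (the `is` and `ok` calls used here are available from both).

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper, not from the thread: returns the ID, returns undef for
# inputs that are recognised but must be skipped, and dies on anything else.
sub tfs_id {
    my ($s) = @_;

    # Step 1: is this a string we know how to handle at all?
    die "unrecognised input: '$s'\n"
        unless $s =~ /\b(?:TFS|CR)/i;

    # Step 2: extract only what is needed; a leading zero means "skip".
    return $s =~ /TFS[-_\s]?(?:CR|ID)?:?\s?([1-9]\d*)/ ? $1 : undef;
}

# Example table: every known input shape and the result expected for it.
my %expected = (
    'TFS100'                              => 100,
    'TFS-CR 100'                          => 100,
    'TFS-ID: 100'                         => 100,
    '- TFS CR634: STRESS: H 17326,21600,' => 634,
    'CR0080588'                           => undef,
    'TFS0080588'                          => undef,
);

for my $in (sort keys %expected) {
    is( tfs_id($in), $expected{$in}, "input '$in'" );
}

# Anything outside the known shapes must fail loudly rather than pass silently.
ok( !eval { tfs_id('xxx123'); 1 }, 'unknown input dies' );

done_testing();
```

Each new input shape that turns up in the data becomes one more row in `%expected`, so the suite grows together with the criteria.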

Re: Problem with regular expression (sub)
by Anonymous Monk on Feb 12, 2013 at 14:48 UTC
    Write a function (sub) to perform checks, then you're not limited by a fat regular expression you don't know how to write :)
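As a minimal sketch of what such a sub might look like: the name `check_id` and the exact rules are assumptions, since the thread never pins the criteria down. The regex only locates the number after the TFS prefix, and the leading-zero rule is an ordinary Perl check.

```perl
use strict;
use warnings;

# Hypothetical helper: small explicit checks instead of one big regex.
sub check_id {
    my ($s) = @_;

    # Check 1: there must be a TFS prefix followed by a number.
    my ($digits) = $s =~ /TFS[-_\s]?(?:CR|ID)?:?\s?(\d+)/
        or return;

    # Check 2: numbers starting with zero are to be skipped.
    return if $digits =~ /^0/;

    return $digits;
}

for my $s ('TFS ID 100', 'TFS0080588', 'CR0080588') {
    my $id = check_id($s);
    print defined $id ? "$s => $id\n" : "$s => skipped\n";
}
```

Keeping each rule on its own line makes it easy to add or relax a check later without touching the pattern.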
Re: Problem with regular expression
by MARVion (Novice) on Feb 13, 2013 at 09:20 UTC


    Thanks for all your help! I'm sorry that I wasn't precise enough in describing my problem!
    Well, I will try your suggestions and solutions, and if I can fix my problem I will tell you, so that this thread has a solution to its problem!
    Thank you for all your replies!


