http://www.perlmonks.org?node_id=1096228

viffer has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.
I'm hoping someone can point out the error of my ways... I'm sure there is a simple regex for this but I'm buggered if I can figure it out.

I am trying to match a field that may have a number of leading spaces, followed by a number of digits, but can't have any spaces after a digit has been found. It must contain at least one digit at the end.
For example, valid data may be
'     999' or '   9999' or '9999999' or '      9'.

but can't be

'  99  9'.

I have a regex which works

\S{7}|\s{6}\d{1}|\s{5}\d{2}|\s{4}\d{3}|\s{3}\d{4}|\s{2}\d{5}|\s{1}\d{6}|\d{7}
but when you're checking a 16-byte field

ZZZZZZZZZZZZ9V99

a regex of

(\s{12}\d{1}|\s{11}\d{2}|\s{10}\d{3}|\s{9}\d{4}|\s{8}\d{5}|\s{7}\d{6}|\s{6}\d{7}|\s{5}\d{8}|\s{4}\d{9}|\s{3}\d{10}|\s{2}\d{11}|\s{1}\d{12}|\d{13})\.\d{2}
whilst doing the job, is getting ludicrously large and unreadable.

There must be a shorter regex that covers this?

Someone at work suggested using an sprintf in its stead within the regex, but I must be honest and say that suggestion has left me clueless on how to do it.

Thanks for your time

Re: Should be a simple spaces/digits regex....but I'm turning grey!
by Bethany (Scribe) on Aug 05, 2014 at 02:11 UTC

    If only digits and space characters are allowed in the string, forget the (numeric) quantifiers and just check for the conditions you described: zero or more spaces followed by one or more digits, anchored fore and aft (at the beginning and end of the string) so nothing can come before the optional spaces and nothing after the digits:

    ^\s*\d+$

    If characters other than digits and spaces may appear, I'd go for a coded solution rather than trying to concoct a single regex to do it all.
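    A minimal sketch of the anchored check run against the example strings from the question (the sub name is_spaces_then_digits is hypothetical). It uses \z instead of $, on the assumption that a trailing newline should not be accepted: in Perl, $ also matches just before a final "\n", so "   99\n" would otherwise pass.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is zero or more whitespace
# characters followed by one or more digits, and nothing else.
sub is_spaces_then_digits {
    my ($s) = @_;
    return $s =~ /^\s*\d+\z/ ? 1 : 0;    # \z: no trailing newline allowed
}

for my $s ('     999', '   9999', '9999999', '      9', '  99  9', "   99\n") {
    # Show each candidate between bars so the leading spaces are visible.
    (my $shown = $s) =~ s/\n/\\n/;
    printf "|%s| %s\n", $shown, is_spaces_then_digits($s) ? 'valid' : 'invalid';
}
```

    With /^\s*\d+$/ instead, the last string would be reported as valid.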

    (Edited to add "numeric")

Re: Should be a simple spaces/digits regex....but I'm turning grey! (?=)
by tye (Sage) on Aug 05, 2014 at 02:14 UTC
    # n-1 spaces or digits:           vv vvvvv: last digit
    (?:(?=\s[\s0-9]|[0-9]{2})[\s0-9]){15}[0-9]
    #  (^^^^^^^^^^^^^^^^^^^^)^^^^^^^: leading chars

    The lookahead, (?=...), only allows \s\s, \s\d, or \d\d at each point along the way, so a digit can never be followed by a space.
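    A sketch of that pattern applied to a whole 16-character field. The ^ and \z anchors and the sample strings are additions for illustration; the pattern body is the one above.

```perl
use strict;
use warnings;

# The lookahead pattern, anchored so it must cover the entire field:
# 15 leading chars (space or digit, never digit-then-space), then a digit.
my $field16 = qr/^(?:(?=\s[\s0-9]|[0-9]{2})[\s0-9]){15}[0-9]\z/;

for my $s (
    ' ' x 15 . '9',         # 15 spaces, then one digit
    ' ' x 10 . '123456',    # spaces, then six digits
    '9' x 16,               # all digits
    ' ' x 10 . '12 456',    # space after a digit: rejected
    ' ' x 14 . '9',         # only 15 characters: rejected
) {
    printf "|%s| %s\n", $s, ($s =~ $field16 ? 'matches' : 'does not match');
}
```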

    Note that I always use [0-9] and never \d as, these days, \d includes tons of characters besides '0'..'9'.
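    A small demonstration of the point (a sketch; U+0663 ARABIC-INDIC DIGIT THREE is just one example of a non-ASCII digit). Because the string contains a code point above 255, Perl matches it with Unicode rules, so \d accepts it while [0-9] does not:

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";    # ARABIC-INDIC DIGIT THREE, not ASCII '3'

print '\d    : ', ($arabic_three =~ /^\d\z/    ? "match\n" : "no match\n");
print '[0-9] : ', ($arabic_three =~ /^[0-9]\z/ ? "match\n" : "no match\n");
```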

    - tye        

      Note that I always use [0-9] and never \d as, these days, \d includes tons of characters besides '0'..'9'.
      That's news to me. Something to read up on, and a chance to update some rusty skills.

        Yeah, if you've got a version of Perl that supports such. Way too many versions of Perl after \d began including Klingon* digits yet before /a was implemented.

        Plus, /a messes with more than just \d. I have yet to run into a single project I was involved in where a string of Klingon* digits would be correctly parsed as a numeric value. But I've touched plenty of projects where \w including more letters than a-z was quite useful. Perl itself is that way, after all. Sure, you could write (?a:\d) but that's just longer and less clear (and less portable).
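    A sketch of that trade-off, assuming Perl 5.14 or later (where /a and the (?a:...) form were introduced). The strings use code points above 255 so that Unicode rules apply; U+0101 (a with macron) stands in for any non-ASCII letter. The sub name show is just a local helper.

```perl
use strict;
use warnings;

my $letter_then_3 = "\x{0101}3";    # LATIN SMALL LETTER A WITH MACRON, then ASCII '3'
my $arabic_three  = "\x{0663}";     # ARABIC-INDIC DIGIT THREE

sub show { printf "%-32s %s\n", $_[0], $_[1] ? 'match' : 'no match' }

show('/^\w\d\z/ on a-macron 3',        $letter_then_3 =~ /^\w\d\z/       ? 1 : 0);
# /a makes \w ASCII-only as well, so the accented letter no longer counts.
show('/^\w\d\z/a on a-macron 3',       $letter_then_3 =~ /^\w\d\z/a      ? 1 : 0);
# Scoping the modifier to \d leaves \w Unicode-aware.
show('/^\w(?a:\d)\z/ on a-macron 3',   $letter_then_3 =~ /^\w(?a:\d)\z/  ? 1 : 0);
show('/^(?a:\d)\z/ on Arabic-Indic 3', $arabic_three  =~ /^(?a:\d)\z/    ? 1 : 0);
```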

        So I suspect I'll be sticking with [0-9] for quite a while still.

        * No, Unicode doesn't actually include Klingon (yet, anyway).

        - tye        

Re: Should be a simple spaces/digits regex....but I'm turning grey!
by Athanasius (Archbishop) on Aug 05, 2014 at 02:28 UTC

    Hello viffer,

    As others have noted, your question is somewhat under-specified. Here is my take on what you may be looking for:

    #! perl
    use strict;
    use warnings;

    my $len = 7;

    for my $s ('    999', '   9999', '9999999', '      9', '  99  9', '       9')
    {
        if (length $s == $len && $s =~ /^\s*\d+$/)
        {
            printf "%-*s matches\n", $len + 2, "|$s|";
        }
        else
        {
            printf "%-*s does not match\n", $len + 2, "|$s|";
        }
    }

    Output:

    12:33 >perl 960_SoPW.pl
    |    999| matches
    |   9999| matches
    |9999999| matches
    |      9| matches
    |  99  9| does not match
    |       9| does not match

    12:33 >

    When the size of the field changes, there's no need for a new regex: just change the value of $len.
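    If you'd rather keep everything in one pattern, the length test can also be folded into the regex as a lookahead with the width interpolated. This is a sketch, not the code above; the sub name field_re is hypothetical, and it allows only literal spaces (not tabs) before the digits:

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern for a right-justified numeric
# field exactly $len characters wide.
sub field_re {
    my ($len) = @_;
    # (?=.{$len}\z) checks the total width; ' *[0-9]+' checks the content.
    # /s lets '.' count a newline too, so the width check is exact.
    return qr/^(?=.{$len}\z) *[0-9]+\z/s;
}

my $re7 = field_re(7);
for my $s ('    999', '  99  9', '       9') {
    printf "%-11s %s\n", "|$s|", ($s =~ $re7 ? 'matches' : 'does not match');
}
```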

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Should be a simple spaces/digits regex....but I'm turning grey!
by Anonymous Monk on Aug 05, 2014 at 02:15 UTC

    '     999' or '   9999' or '9999999' '      9'.

    Is "field" length 7 or 8 chars? What is a "field" (part of a larger string)?

    but when you're checking a 16 byte field ZZZZZZZZZZZZ9V99

    Um, there is no description of what you want for that one :) One thing at a time?

    whilst doing the job, is getting ludicrously large and unreadable

    Instead of one regex, write twelve (er, thirteen)??

    Stop writing unreadable regex :) Write beautiful regex, not ugly, so you can read it :) See the perlfaq6 entry "How can I hope to use regular expressions without creating illegible and unmaintainable code?"
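    As an example of the readable style, here is the spaces-then-digits check written with the /x modifier so each piece can carry its own comment (a sketch; the variable name is illustrative):

```perl
use strict;
use warnings;

my $spaces_then_digits = qr/
    ^          # start of the field
    [ ]*       # any number of leading spaces (a literal space; bare whitespace is ignored under x)
    [0-9]+     # then one or more digits, with nothing in between
    \z         # and nothing after the last digit
/x;

for my $s ('   9999', '  99  9') {
    print "|$s| ", ($s =~ $spaces_then_digits ? "matches\n" : "does not match\n");
}
```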

    Someone at work suggested using an sprintf in its stead within the regex, but I must be honest and say that suggestion has left me clueless on how to do it.

    Are you trying to format a string? If you are, go ahead and use sprintf; otherwise, write a function?
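    One possible reading of the sprintf suggestion (a guess at what was meant, not something stated in the thread): pull out the trailing digits, right-justify them with sprintf to the field width, and check that the result reproduces the original field exactly. The function name is_right_justified is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: a field is valid if re-padding its trailing digits
# to $width with sprintf gives back exactly the same string.
sub is_right_justified {
    my ($field, $width) = @_;
    return 0 unless length($field) == $width;             # wrong width
    my ($digits) = $field =~ /([0-9]+)\z/ or return 0;    # must end in digits
    # Any stray space or other character before the digits makes the
    # re-padded string differ from the original.
    return sprintf('%*s', $width, $digits) eq $field ? 1 : 0;
}

for my $s ('    999', '9999999', '  99  9', '   12 3') {
    printf "|%s| %s\n", $s, is_right_justified($s, 7) ? 'valid' : 'invalid';
}
```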

    Hope this helps

      ZZZZZZZZZZZZ9V99 is just a COBOL field definition: each Z means that if the digit in that position is a leading zero, it isn't printed (a space is shown instead).
      Hence a value of 9.99 with a definition of ZZ9V99 would print as 9.99,
      whereas a definition of 999V99 would result in 009.99 being shown.
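    For the 16-byte ZZZZZZZZZZZZ9V99 case, and assuming the field looks the way the regex in the question implies (a 13-character, space-padded integer part, a literal '.', then two digits), the thirteen alternatives can be collapsed into one width lookahead plus the same spaces-then-digits idea. A sketch; $cobol_amount is an illustrative name:

```perl
use strict;
use warnings;

# Width check via lookahead, then: leading spaces, at least one digit,
# a literal '.', and exactly two decimal digits.
my $cobol_amount = qr/^(?=.{16}\z) *[0-9]+\.[0-9]{2}\z/s;

for my $s (' ' x 12 . '9.99', '1234567890123.45', ' ' x 8 . '12 45.67', ' ' x 11 . '9.99') {
    printf "|%s| %s\n", $s, ($s =~ $cobol_amount ? 'matches' : 'does not match');
}
```

    The third string fails because of the space after a digit, the fourth because it is only 15 characters long.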
Re: Should be a simple spaces/digits regex....but I'm turning grey!
by Laurent_R (Canon) on Aug 05, 2014 at 06:40 UTC
    From your examples and code fragment as much as from your description, it seems that you are looking for a 7-character string made of spaces followed by digits. Why not simply this:
    $num = $1 if length($string) == 7 and $string =~ /^\s*(\d+)$/;
      Thanks all

      /^\s*\d+$/

      I'm officially an idiot :)

        I'm officially an idiot :)

        Heavens, no! An idiot is a person who doesn't ask for help. Glad it works for you.

Re: Should be a simple spaces/digits regex....but I'm turning grey!
by BillKSmith (Monsignor) on Aug 05, 2014 at 13:26 UTC
    This passes all your test cases:
    use strict;
    use warnings;
    use Test::Simple qw(no_plan);

    my %test_cases = (
        ' 999'   => 'valid',
        ' 9999'  => 'valid',
        '999999' => 'valid',
        ' 9'     => 'valid',
        ' 99 9'  => 'invalid',
    );

    foreach my $case (keys %test_cases) {
        my $does_match = $case =~ /
            ^\s*          # any number of leading spaces
            \d+           # followed by a number of digits
            (?:
                [^\s]*    # but can't have any spaces after a digit has been found
                \d+       # It must contain at least one digit at the end
            )?$
        /x;
        ok( !($does_match xor $test_cases{$case} eq 'valid'),
            "'$case': $test_cases{$case}" );
    }
    OUTPUT:
    ok 1 - '999999': valid
    ok 2 - ' 999': valid
    ok 3 - ' 9': valid
    ok 4 - ' 9999': valid
    ok 5 - ' 99 9': invalid
    1..5
    Bill

      But that matches " 999foo9", which isn't valid. Also viffer already gave the solution.

        Yes, viffer's solution and mine disagree on the validity of your string. I believe that mine is the one that meets the original specification. In hindsight, it seems that his solution probably does implement what he intended.
        Bill
Re: Should be a simple spaces/digits regex....but I'm turning grey!
by Anonymous Monk on Aug 05, 2014 at 12:57 UTC
    Tasks like this can also be flummoxed by greed: the greed of a regular expression, that is, not the Deadly Sin. Left to its own devices, a quantifier such as * or + will consume as many characters as it can while still letting the overall match succeed. It must be told not to be so greedy (by adding a ?, as in *? or +?) so that it takes as few as possible instead. Probably not applicable to this case, but worth keeping in mind.
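    A quick illustration of greedy versus non-greedy quantifiers (the sample string is made up):

```perl
use strict;
use warnings;

my $s = '   12  34';

my ($greedy) = $s =~ /^(.*)[0-9]/;     # .* grabs as much as it can, then backs off one digit
my ($lazy)   = $s =~ /^(.*?)[0-9]/;    # .*? stops at the first point where a digit follows

print "greedy: |$greedy|\n";    # |   12  3|
print "lazy:   |$lazy|\n";      # |   |
```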