List-to-Range generation

by japhy (Canon)
on Jun 11, 2001 at 19:35 UTC ( [id://87538]=CUFP )

Converts a list of numbers to a string of ranges: (1,2,3,5,8,9,10,11) to "1-3,5,8-11".
use 5.6.0;  # for (??{ ... })

sub num2range {
  local $_ = join ',' => sort { $a <=> $b } @_;
  s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1-$+/g;
  return $_;
}

Replies are listed 'Best First'.
Re: List-to-Range generation
by chipmunk (Parson) on Jun 12, 2001 at 02:55 UTC
    Woops, it also converts (11,2,3) to "11-3" and (1,2,30) to "1-30". :)

    Here's one way to fix it:

    sub num2range {
      local $_ = join ',' => @_;
      s/(?<!\d)(\d+)(?:,((??{$++1}))(?!\d))+/$1-$+/g;
      return $_;
    }
    The negative look-behind at the beginning and the negative look-ahead near the end prevent the regex from matching only part of a number, like the second 1 in 11 or the 3 in 30.
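The effect of the look-arounds can be seen by running both forms side by side. This is only a sketch: `num2range_naive` is a made-up name, and its body is an assumption about what the snippet looked like before the fix (that version is not shown in the thread). Neither sub sorts its input, so the order passed in is the order used.

```perl
use strict;
use warnings;

# Hypothetical "before" version (an assumption, not quoted from the thread):
# no look-arounds, so a match may start or stop in the middle of a number.
sub num2range_naive {
    local $_ = join ',' => @_;
    s/(\d+)(?:,((??{$++1})))+/$1-$+/g;
    return $_;
}

# The fixed version from this reply.
sub num2range {
    local $_ = join ',' => @_;
    s/(?<!\d)(\d+)(?:,((??{$++1}))(?!\d))+/$1-$+/g;
    return $_;
}

print num2range_naive(11, 2, 3), "\n";   # 11-3   (starts at the second 1 of 11)
print num2range_naive(1, 2, 30), "\n";   # 1-30   (stops at the 3 of 30)
print num2range(11, 2, 3), "\n";         # 11,2-3
print num2range(1, 2, 30), "\n";         # 1-2,30
```

The first two results are the ones reported above; the last two are what the look-arounds are meant to produce.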
      D'oh. Thanks. Believe it or not, I use that look-behind and look-ahead method in another regex, to find a number in a string less than a given number.

      japhy -- Perl and Regex Hacker
      Thanks for the code. Is there a modification to the code that handles an array of padded numbers? I would like to convert (0001,0002,0003,011,012,013,015) to "0001-0003,011-013,015". Thanks a lot.
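One possible way to handle zero-padded input, sketched here rather than taken from the thread, is to skip the regex and walk the list, joining neighbours only when they have the same width and differ by one. `padded2range` is a hypothetical name. The sketch assumes the list is already in order and that each element is passed as a string (a bare 0011 in Perl source would be read as an octal literal).

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: collapse runs of consecutive
# digit strings while keeping their zero padding. Assumes sorted input
# made of strings such as '0001'.
sub padded2range {
    my @runs;    # each run is [first, last]
    for my $n (@_) {
        my $run = $runs[-1];
        if (   $run
            && length($n) == length($run->[1])    # same padding width
            && $n == $run->[1] + 1 )              # numerically consecutive
        {
            $run->[1] = $n;                       # extend the current run
        }
        else {
            push @runs, [ $n, $n ];               # start a new run
        }
    }
    return join ',',
        map { $_->[0] eq $_->[1] ? $_->[0] : "$_->[0]-$_->[1]" } @runs;
}

print padded2range(qw(0001 0002 0003 011 012 013 015)), "\n";
# prints 0001-0003,011-013,015
```

Because the width test treats '9' and '10' as belonging to different runs, this variant suits padded input only; dropping the length comparison would make it behave like an unpadded range builder.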
Re: List-to-Range generation
by ZZamboni (Curate) on Jun 11, 2001 at 21:04 UTC
    OK, it took me a few minutes to completely understand how and why this works. Here is my dissected version of the regex:
    s/(\d+)              # first number (group #1)
      (?:                # non-capturing group
        ,                # followed by a comma
        (                # group #2
          (??{$++1})     # match previous number + 1
        )                # end group #2
      )+                 # end non-capturing group, repeat
     /$1-$+/gx;          # replace with the first number, a dash, and the
                         # last matched one
    Group #1 matches the first number in a sequence of numbers. Then ??{$+ + 1} is used to match "the last number plus one" ($+ stands for whatever was matched by the last set of grouping parentheses). For the second number in a sequence, the "last number" is the one matched by group #1. But for subsequent numbers (because of the +), the last number matched (that is, whatever the ??{$++1} matched last time) becomes the "last number". So the thing repeats until the "last number plus one" part doesn't match anymore (that is, until a non-consecutive number is found), and then replaces the whole thing with the first number (group #1), a dash, and the last number matched.

    At first look, I thought the double parentheses around ??{$++1} were unnecessary, but without them it does not work, and here is why: $+ contains what was matched by the last set of parentheses, not the current set. So doubling the parentheses makes $+ contain the last thing matched by the current expression. Very clever!


      m{
        (\d+)            # \1 start -- digits -- \1 end
        (?:
          ,              # ,
          (              # \2 start
            (??{$++1})   # evaluate '$+ + 1' as a regex
          )              # \2 end
        )+               # (and try again)
      }x
      The $+ refers to the last successful captured pattern, and that capture must have been closed. So the first time the (??{...}) is reached, $+ is $1's value. The next time, it's $2's (first) value, and then $2's new value, and so on.
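The behaviour of $+ described above can be checked outside the substitution with two ordinary matches. This is a small illustration only; the variable names are made up for the example.

```perl
use strict;
use warnings;

# When group 2 takes part in the match, $+ holds its final value.
my $with_group2 = '1,2,3' =~ /(\d+)(?:,(\d+))+/ ? $+ : undef;

# When group 2 never matches, $+ falls back to group 1.
my $only_group1 = '7' =~ /(\d+)(?:,(\d+))*/ ? $+ : undef;

print '$+ after 1,2,3: ', $with_group2, "\n";   # 3
print '$+ after 7: ',     $only_group1, "\n";   # 7
```

This mirrors the two situations inside num2range: the first time (??{...}) runs only $1 has been set, and on later repetitions $2 carries the most recent number.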

      japhy -- Perl and Regex Hacker
      Any chance of restructuring it so that it handles padded numbers? I would like to convert (0001,0002,0003,011,012,013,015) to "0001-0003,011-013,015". Thanks a lot.

        Our venerable learned brother ZZamboni graces our humble monastery with his esteemed presence only infrequently these days. He last visited some 18 months ago, so you might be waiting a while for a direct reply.

        Other interested parties might wish to know that rmocster subsequently posted his own SoPW question (Convert an array of numbers into a range). You might wish therefore to follow that thread to see not only the context but the ensuing discussion.

Re: List-to-Range generation
by Vynce (Friar) on Sep 21, 2001 at 23:59 UTC

    by request in the chatterbox, a simple undo:

    sub stringRangeToList {
      my $foo = shift;
      $foo =~ s/-/../g;
      return eval $foo;
    }

    or, if you want string-to-string, this maybe seems more perlish:

    sub stringRangeToStringList {
      my $foo = shift;
      $foo =~ s/(\d+)-(\d+)/join ',',$1..$2/eg;
      return $foo;
    }

    the usual caveat about 0-led numbers applies to the first and, in fact, it can take hex values (though it returns them as ints, which means they are likely to print as decimal if you don't pay attention). for reasons i'm not clear on, the 0-led thing doesn't happen to the second, though that's nicely consistent with its willful mistreatment of hex numbers. in fact, with no strict ('subs'), the first can take letter-ranges, as well.
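The difference with 0-led numbers can be reproduced with a short test. A likely explanation, offered as a reading of the range operator's documented behaviour rather than anything stated in this post: eval compiles the text as Perl source, where 010 is an octal literal, while $1 and $2 are strings, and a string operand with a leading zero makes the range operator count by string increment.

```perl
use strict;
use warnings;

# The two subs from this post, unchanged apart from layout.
sub stringRangeToList {
    my $foo = shift;
    $foo =~ s/-/../g;
    return eval $foo;
}

sub stringRangeToStringList {
    my $foo = shift;
    $foo =~ s/(\d+)-(\d+)/join ',',$1..$2/eg;
    return $foo;
}

# Compiled as source code: 010 and 012 are octal literals (8 and 10).
print join(',', stringRangeToList('010-012')), "\n";   # 8,9,10

# Operands are the strings "010" and "012": the padding survives.
print stringRangeToStringList('010-012'), "\n";        # 010,011,012
```

The second form therefore already expands padded ranges such as "0001-0003" without losing the zeros.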

    neither responds particularly surprisingly, nor particularly well, to "3-5-7". error checking is left as an exercise to the reader. use at your own risk. these snippets were not tested on live animals; no test data was harmed in the design, creation, or testing of these snippets.

Re: List-to-Range generation
by ZZamboni (Curate) on Jun 11, 2001 at 20:31 UTC
    I wish I could double-++ this. It is fantastic!

    I needed exactly this a couple of days ago, and I ended up using Set::IntSpan. Oh well... :-)

    /me bows reverently to master regexer japhy


      Thanks muchly. I'm working on one to change (4,5,6,11,12,13,14,19,20,21) to "4-6,11-4,19-21"

      japhy -- Perl and Regex Hacker
        I assume you meant "4-6,11-14,19-21"? If so, that's what this one does, isn't it? What am I missing?


Re: List-to-Range generation
by $code or die (Deacon) on Jul 05, 2001 at 18:40 UTC
    This is nice.++

    I like to add a sort {$a <=> $b} @_; so it doesn't matter what order you pass the list in.

    Error: Keyboard not attached. Press F1 to continue.
Re: List-to-Range generation
by abjr (Initiate) on Aug 23, 2011 at 20:07 UTC

    10 years later this is still really cool. I've been using it for a while now and there is one small issue I've seen. If you have a list of numbers, say 1,2,3 ... 39999,40000 and run it through num2range, you'll get the following back: "1-32768,32769-40000" instead of "1-40000".

    I'm guessing this is because you can't have more than 32k captures in a regex?

      Hmm, just trying to figure out a way to test this is bending my mind. So is this true? This method only works for ranges up to 2^15?

      What if you split up the ranges? It would work for any list of up to 32,768 consecutive numbers, right?
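If the split at 32768 does come from a repeat limit inside the regex engine (Perl's documentation describes a built-in cap on quantifier counts, but whether that is the cause here is an assumption, not something verified in this thread), a plain loop avoids the question entirely. `num2range_loop` is a hypothetical name, and this is a swapped-in technique rather than japhy's regex.

```perl
use strict;
use warnings;

# Hypothetical loop-based alternative, not from the thread: sort the
# numbers, then collapse consecutive runs without using a regex.
sub num2range_loop {
    my @nums = sort { $a <=> $b } @_;
    my @out;
    while (@nums) {
        my $first = my $last = shift @nums;
        # Absorb every following number that continues the run.
        $last = shift @nums while @nums && $nums[0] == $last + 1;
        push @out, $first == $last ? $first : "$first-$last";
    }
    return join ',', @out;
}

print num2range_loop(1 .. 40000), "\n";                 # 1-40000
print num2range_loop(1, 2, 3, 5, 8, 9, 10, 11), "\n";   # 1-3,5,8-11
```

The loop has no dependence on run length, so a list of 40,000 consecutive numbers comes back as a single range.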

Re: List-to-Range generation
by Anonymous Monk on Apr 16, 2013 at 20:10 UTC
    Thanks for the code japhy. Very helpful. -VM
