http://www.perlmonks.org?node_id=904856

jpl has asked for the wisdom of the Perl Monks concerning the following question:

I'm rebuilding a house, thereby insuring that I will honor my vows of poverty. We have a "plantcam" set up, taking a picture every 30 seconds, so I can make animated movies of my money disappearing. Individual pictures have names of the form

WSPC2420.JPG

with the final four digits ranging from 0001 through 9999 (if the battery and SD card last that long). There are long sequences of pictures where nothing of interest happens, punctuated by sequences worth animating. So I make notes like

427-2301 chimney goes up

2622-2630 chimney falls down

I want to turn these numerical ranges into glob patterns so that (just) the right pictures are passed to the animation command. The second range, for example, would turn into

262[2-9] 2630
I started out constructing the globs by hand, but that loses its charm quickly, and is prone to error. So I figured it should be a simple perl script. I have a working script (below), but I am disappointed at how baroque it is. It seems that with so much structure to both ranges and globs, a beautiful solution should exist.

I am aware, by the way, that what the animation command sees as arguments is the expanded list of individual file names, so I'm not saving it any work by having the shell expand a glob versus simply itemizing the names (which would be trivial, but intensely ugly for long sequences). And, besides, it should be an interesting problem upon which to meditate.

#!/usr/bin/perl -w use strict; sub r2g { my $lower = shift; my $upper = shift; my $maxlen = length($upper); $maxlen = length($lower) if ( length($lower) > $maxlen ); my @globs; my $format = "%0${maxlen}d"; my $maxn = $lower; # Take care of two fringe cases return [ sprintf( $format, $maxn ) ] if ( $maxn == $upper ); return [ '?' x $maxlen ] if ( ( $maxn == 0 ) && ( $upper == '9' x $maxlen ) ); PATTERN: # $maxn is the least number in the range $lower through $upper # that isn't already matched by some glob pattern. # We can only put a (non-trivial) glob pattern in a position # if all the positions to the right are already globbed with '?'. # The largest number matched by such a glob ends with 1 or more 9' +s. while ( $maxn <= $upper ) { for ( my $i = $maxlen - 1 ; $i >= 0 ; --$i ) { my $pattern = $maxn = sprintf( $format, $maxn ); for ( my $j = $i + 1 ; $j < $maxlen ; ++$j ) { substr( $pattern, $j, 1 ) = '?'; } my $c = substr( $maxn, $i, 1 ); if ( $c ne '0' ) { # We cannot glob further left # See if we can glob starting here foreach my $d ( reverse( $c .. '9' ) ) { substr( $maxn, $i, 1 ) = $d; if ( $maxn <= $upper ) { substr( $pattern, $i, 1 ) = ( $d eq $c ) ? $d : "[$c-$d]"; substr( $maxn, $i, 1 ) = $d; push( @globs, $pattern ); ++$maxn; next PATTERN; } } } else { # We can glob with a ? here if replacing the digit # with a 9 still remains less than or equal to $upper foreach my $d ( reverse( '0' .. '9' ) ) { substr( $maxn, $i, 1 ) = $d; if ( $maxn <= $upper ) { if ( $d ne '9' ) { # Couldn't do a ? match. # Glob what we can, increment $maxn # and look for a new pattern substr( $pattern, $i, 1 ) = ( $d eq '0' ) ? '0' : "[0-$d]"; push( @globs, $pattern ); ++$maxn; next PATTERN; } last PATTERN if ( $i == 0 ); last; } } } } } return \@globs; } # my $g = r2g("0123", "1243"); # print("@$g\n"); # $g = r2g("0000", "1243"); # print("@$g\n"); # $g = r2g("0001", "9999"); # print("@$g\n"); # $g = r2g("0000", "9999"); # print("@$g\n"); # $g = r2g("0000", "0999"); # print("@$g\n"); # $g = r2g("9000", "9999"); # print("@$g\n"); while ( my $line = <> ) { if ( $line =~ /([0-9]+)[^0-9]+([0-9]+)/ ) { my $g = r2g( $1, $2 ); print("$1-$2: @$g\n"); } }

Replies are listed 'Best First'.
Re: ranges to globs - OT
by ww (Archbishop) on May 14, 2011 at 21:19 UTC
    mea culpa; At 1 frame per 30 sec, 9999 comes to a little short of 300K seconds...
    or 5,000 hours...
    roughly, 7 months.

    That's not outlandish (in many climates) but one might hope that your plans include moving the camera inside when the exterior activity wanes.

    And did you check the contractor's reputation for timely completion (see also "chimney falls down")?

    But welcome to 'the American dream.'

    Update:

    Stricken statements represent some sort of gargantuan brain-spasm. For better logic and accurate calculation, see OP's response.
      A frame every 30 seconds is 2 frames per minute, 120 frames per hour, more or less 1200 frames in a ten-hour day. So we go through a card per week (and during the winter, the battery usually died, after which the camera never came back on the next day).

      I'm joking about the chimney coming down. We had a rough winter, and a very wet spring, here in the Northeast USA, so we are behind schedule by a couple months. But we're lucky to have another house to live in, so we're in no particular rush. We're looking forward to moving into a very "green" house by fall.

      We'll need the contractor's help to install the camera indoors when the sheetrocking goes on, although that would make great footage. I expect they'll knock off a wall in a couple hours, so if they don't move the camera to follow the action, we won't see much indoors.

Re: ranges to globs
by Anonymous Monk on May 14, 2011 at 19:04 UTC
    FWIW, don't you want "[0-9]" instead of "?"? :)
      In the context I'm dealing with, there will be no non-digits that could match, and ? chews up less space than [0-9]. But given a solution that assumes ?, it's easy enough to convert it to one using [0-9], where non-digits might appear. I doubt that the elegance (or lack thereof) of the code will be influenced much by the choice.
Re: ranges to globs
by repellent (Priest) on May 20, 2011 at 20:11 UTC