http://www.perlmonks.org?node_id=1028786

taint has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, I've been maintaining some 9k HTML records for a few years now. The names of the files begin with the same 3 alpha characters, followed by 4 or fewer digits (0-9). I've been maintaining them via quick one-off sed scripts. But recently I decided I needed to renumber them, as it would ultimately be more efficient. I also decided (given I /really/ love Perl) that it would make more sense to keep a set of Perl scripts to maintain these documents. Then it hit me -- I have no idea how to take on the task of renumbering the references within these documents.
PROBLEM:
changing all references to:
abc[0-9][0-9][0-9].html
into
abc0[0-9][0-9][0-9].html
The same goes for abc[0-9][0-9].html and abc[0-9].html.
In short: I need the number in every file name to be 4 digits long -- not 1, 2, or 3 digits, as they are now.
I'm fair with search & replace, add, and delete. But I /suck/ at hold patterns -- which I think I need here. I tried to match against:
 /abc1[0-9][0-9].html
with the intent of matching within the 100-199.html documents. But all my efforts were an epic fail.
Pointers, solutions would be greatly appreciated.

Thank you for all your time, and consideration.

#!/usr/bin/perl -Tw
use perl::always;
my $perl_version = "5.12.4";
print $perl_version;

Re: RE question; how to insert a digit between const[a-z]*3 && inc[0-9]*3? (sprintf)
by tye (Sage) on Apr 16, 2013 at 01:10 UTC
    s{(?<=[a-z])([0-9]{1,3})(?=[.])}{ sprintf "%04d", $1 }ge;

    ...with whatever additions are needed near the start and end of that regex in order to only match the parts that you want to.

    - tye        
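    For readers coming from sed(1), here is a minimal sketch of how the pieces of that substitution fit together. The helper name pad_numbers and the sample string are assumptions made up for illustration; they are not from the thread.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: zero-pad runs of 1-3 digits that follow a
    # lowercase letter and precede a dot, as in the substitution above.
    sub pad_numbers {
        my ($text) = @_;
        $text =~ s{
            (?<=[a-z])      # lookbehind: previous character is a-z, not consumed
            ([0-9]{1,3})    # capture one to three digits
            (?=[.])         # lookahead: next character is a literal dot, not consumed
        }{ sprintf "%04d", $1 }gex;   # /e runs the replacement as code, /x allows layout
        return $text;
    }

    print pad_numbers('<a href="abc7.html">7</a> <a href="abc123.html">123</a>'), "\n";
    ```

    Because the lookbehind and lookahead match without consuming anything, only the digits themselves are replaced. That plays the role of the hold pattern in sed: the letters before and the dot after never need to be captured and put back.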

      Can it not be as simple as

      s{abc(\d+)\.html}{sprintf("abc%04d.html",$1)}ge;

      if all the filenames are of the form abc\d+.html or is there a catch?

        I doubt "abc" was as literal as you interpreted it to be.

        - tye        
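        Since "abc" may only be a placeholder, one possible sketch takes the real prefix as a parameter and rewrites the references inside a chunk of HTML. The function name renumber_refs and the sample line are hypothetical, and it assumes every reference has the form prefix, digits, then ".html".

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: pad every "<prefix><digits>.html" reference in a
        # string to four digits. The prefix is passed in rather than hard-coded.
        sub renumber_refs {
            my ($html, $prefix) = @_;
            # \Q...\E quotes any regex metacharacters in the prefix; \d{1,3}
            # leaves names that already have four digits alone.
            $html =~ s{\b\Q$prefix\E(\d{1,3})\.html\b}
                      { sprintf "%s%04d.html", $prefix, $1 }ge;
            return $html;
        }

        my $line = '<a href="abc5.html">next</a> | <a href="abc1234.html">last</a>';
        print renumber_refs($line, 'abc'), "\n";
        ```

        The same substitution could be applied to each line of each file as it is read, or run in place with perl -pi.bak -e so that a backup copy of every document is kept.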

      Greetings tye.
      Thanks for taking the time to respond.
      s{(?<=[a-z])([0-9]{1,3})(?=[.])}{ sprintf "%04d", $1 }ge;
      Looks brilliant in its simplicity. But my head is still in sed(1) scripting mode. So I fear I am not yet following it. :(

      I'm attempting to sort it out now. :)

      Thanks again!

Re: RE question; how to insert a digit between const[a-z]*3 && inc[0-9]*3?
by kcott (Archbishop) on Apr 16, 2013 at 11:35 UTC

    G'day taint,

    Here's another way to do it:

    $ perl -Mstrict -Mwarnings -E '
        my @files = qw{abc1.html abc12.html abc123.html abc1234.html};
        say for map {
            s/^(abc)(\d{1,3})(\.html)$/$1 . "0" x (4 - length $2) . $2 . $3/e;
            $_
        } @files;
    '
    abc0001.html
    abc0012.html
    abc0123.html
    abc1234.html

    Update: "... the files begin with the same 3 alpha characters ..." so \w{3} is far too generic: s/\w{3}/abc/

    -- Ken

      Greetings Ken, and thanks for the reply.
      This looks like a complete solution!
      Given that all of the files end in .html, and begin with abc
      I should be able to simply read in the entire directory.
      opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
      while (defined($file = readdir(DIR))) {
          # use your suggestion on "$dirname/$file"
      }
      closedir(DIR);
      I'm still sorting it all out. But wanted to take the time to thank you
      for your help.

      Thanks again!

      --chris


        Given that context, you might want to try something like this technique (untested):

        while (defined($file = readdir(DIR))) {
            # skip anything that is not abc + 1-3 digits + .html
            next unless $file =~ /^(abc)(\d{1,3})(\.html)$/;
            rename "$dirname/$file",
                   "$dirname/" . $1 . "0" x (4 - length $2) . $2 . $3;
        }

        I'd also recommend that you try this with a print first, to ensure you're in the right directory and targeting the correct files before making thousands of changes; then change the print to rename. Some error checking/handling might also be useful.

        -- Ken
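        A minimal sketch of that print-first approach, with some basic error handling added. The $dry_run flag, the padded_name helper, and taking the directory from the first command-line argument are all assumptions made for illustration, not part of the suggestion above.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: return the padded name for a matching file,
        # or undef when the name should be left alone.
        sub padded_name {
            my ($file) = @_;
            return unless $file =~ /^(abc)(\d{1,3})(\.html)$/;
            return sprintf "%s%04d%s", $1, $2, $3;
        }

        my $dirname = $ARGV[0] // '.';  # directory to scan (assumed: first argument)
        my $dry_run = 1;                # set to 0 once the printed plan looks right

        opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
        for my $file (sort readdir $dh) {
            my $new = padded_name($file) // next;   # skip non-matching names
            if ($dry_run) {
                print "would rename $dirname/$file -> $dirname/$new\n";
            }
            elsif (-e "$dirname/$new") {
                # never overwrite an existing document
                warn "skipping $file: $new already exists\n";
            }
            else {
                rename "$dirname/$file", "$dirname/$new"
                    or warn "can't rename $file: $!\n";
            }
        }
        closedir $dh;
        ```

        With $dry_run left at 1, nothing on disk changes; the script only lists what it would do, so the plan can be checked before any of the 9k files are touched.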

Re: RE question; how to insert a digit between const[a-z]*3 && inc[0-9]*3?
by MidLifeXis (Monsignor) on Apr 16, 2013 at 13:36 UTC

    Depending on your server software, you could also configure your server to generate a 301 response for the old document names. I have done this in the past on Apache with mod_rewrite. It could also be done with a mod_perl handler, or even a dispatcher if you drive everything through a dispatch script.

    --MidLifeXis