PerlMonks  

RE question; how to insert a digit between const[a-z]*3 && inc[0-9]*3?

by taint (Chaplain)
on Apr 16, 2013 at 00:47 UTC (#1028786)
taint has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, I've been maintaining some 9k HTML records for a few years now. The names of the files begin with the same 3 alpha characters, followed by 4 or fewer digits (0-9). I've been maintaining them via quick one-off sed scripts. But recently I decided I needed to renumber them, as it would ultimately be more efficient. I also decided (given I /really/ love Perl) that it would make more sense to keep a set of Perl scripts to maintain these documents. Then it hit me: I have no idea how to take on the task of renumbering the references within these documents.
PROBLEM:
changing all references to:
abc[0-9][0-9][0-9].html
into
abc0[0-9][0-9][0-9].html
The same goes for abc[0-9][0-9].html and abc[0-9].html.
In short, I need the numeric part of every file name to be 4 digits long, not 1, 2, or 3 digits as it is now.
I'm fair with search & replace, add, and delete. But I /suck/ at hold patterns -- which I think I need here. I tried to match against:
 /abc1[0-9][0-9].html
with the intent of matching within the 100-199.html documents. But all my efforts were an epic fail.
Pointers, solutions would be greatly appreciated.

Thank you for all your time, and consideration.

#!/usr/bin/perl -Tw
use perl::always;
my $perl_version = "5.12.4";
print $perl_version;

Re: RE question; how to insert a digit between const[a-z]*3 && inc[0-9]*3? (sprintf)
by tye (Cardinal) on Apr 16, 2013 at 01:10 UTC
    s{(?<=[a-z])([0-9]{1,3})(?=[.])}{ sprintf "%04d", $1 }ge;

    ...with whatever additions are needed near the start and end of that regex in order to only match the parts that you want to.

    - tye        
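For readers coming from sed, here is a minimal sketch of that substitution applied to a made-up line of HTML. The sample text and the variable names `$text` and `$fixed` are assumptions for illustration, not part of the original post; the regex itself is the one given above.

```perl
use strict;
use warnings;

# Hypothetical sample text; the real input would be the HTML records.
my $text = '<a href="abc7.html">7</a> <a href="abc42.html">42</a> '
         . '<a href="abc123.html">123</a> <a href="abc1234.html">1234</a>';

# (?<=[a-z])    lookbehind: a lowercase letter must sit just before the digits
# ([0-9]{1,3})  capture one to three digits into $1
# (?=[.])       lookahead: a literal dot must follow the digits
# /e runs the replacement as Perl code; /g repeats it for every match.
(my $fixed = $text) =~ s{(?<=[a-z])([0-9]{1,3})(?=[.])}{ sprintf "%04d", $1 }ge;

print "$fixed\n";
```

The lookbehind and lookahead only check their surroundings without consuming them, so nothing like sed's hold space is needed: only the digits are replaced. Names that already have four digits are left alone, because no run of one to three digits in them is both preceded by a letter and followed by a dot.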

      Can it not be as simple as

      s{abc(\d+)\.html}{sprintf("abc%04d.html",$1)}ge;

      if all the filenames are of the form abc\d+.html or is there a catch?

        I doubt "abc" was as literal as you interpreted it to be.

        - tye        

      Greetings tye.
      Thanks for taking the time to respond.
      s{(?<=[a-z])([0-9]{1,3})(?=[.])}{ sprintf "%04d", $1 }ge;
      Looks brilliant in its simplicity. But my head is still in sed(1) scripting mode. So I fear I am not yet following it. :(

      I'm attempting to sort it out now. :)

      Thanks again!

Re: RE question; how to insert a digit between const[a-z]*3 && inc[0-9]*3?
by kcott (Abbot) on Apr 16, 2013 at 11:35 UTC

    G'day taint,

    Here's another way to do it:

    $ perl -Mstrict -Mwarnings -E '
        my @files = qw{abc1.html abc12.html abc123.html abc1234.html};
        say for map { s/^(abc)(\d{1,3})(\.html)$/$1 . "0" x (4 - length $2) . $2 . $3/e; $_ } @files;
    '
    abc0001.html
    abc0012.html
    abc0123.html
    abc1234.html

    Update: "... the files begin with the same 3 alpha characters ..." so \w{3} is far too generic: s/\w{3}/abc/

    -- Ken

      Greetings Ken, and thanks for the reply.
      This looks like a complete solution!
      Given that all of the files end in .html, and begin with abc
      I should be able to simply read in the entire directory.
      opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
      while (defined($file = readdir(DIR))) {
          # use your suggestion on "$dirname/$file"
      }
      closedir(DIR);
      I'm still sorting it all out. But wanted to take the time to thank you
      for your help.

      Thanks again!

      --chris


        Given that context, you might want to try something like this technique (untested):

        while (defined($file = readdir(DIR))) {
            next unless $file =~ /^(abc)(\d{1,3})(\.html)$/;
            rename "$dirname/$file",
                "$dirname/" . $1 . "0" x (4 - length $2) . $2 . $3;
        }

        I'd also recommend that you try this with a print first, to ensure you're in the right directory and targeting the correct files before making thousands of changes; then change the print to rename. Some error checking/handling might also be useful.

        -- Ken
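The print-first advice could be sketched roughly as follows. This is one possible arrangement, not tested against the real directory: the helper `padded_name`, the `$dry_run` flag, and taking the directory from the command line are all assumptions made for illustration, and "abc" stands in for the real prefix as elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the zero-padded name for files such as
# abc7.html, or undef when the name does not match or needs no change.
sub padded_name {
    my ($file) = @_;
    return undef unless $file =~ /^(abc)(\d{1,3})(\.html)$/;
    return sprintf '%s%04d%s', $1, $2, $3;
}

# Assumed settings: directory from the command line, dry run by default.
my $dirname = shift(@ARGV) // '.';
my $dry_run = 1;    # set to 0 only after checking the printed plan

opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
for my $file (sort readdir $dh) {
    my $new = padded_name($file);
    next unless defined $new;

    my ($from, $to) = ("$dirname/$file", "$dirname/$new");
    if (-e $to) {
        # Never overwrite an existing file.
        warn "skipping $from: $to already exists\n";
        next;
    }
    if ($dry_run) {
        print "would rename $from -> $to\n";
    }
    else {
        rename $from, $to or warn "can't rename $from to $to: $!\n";
    }
}
closedir($dh);
```

With `$dry_run` left at 1 the script only prints the planned renames, so the list can be checked before anything on disk changes. Renaming the files does not update the links inside the documents; those still need a substitution over the file contents.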

Re: RE question; how to insert a digit between const[a-z]*3 && inc[0-9]*3?
by MidLifeXis (Monsignor) on Apr 16, 2013 at 13:36 UTC

    Depending on your server software, you could also configure your server to generate a 301 response for the old document names. I have done this in the past on Apache with mod_rewrite. It could also be done with a mod_perl handler or even a dispatcher if you drive everything through a dispatch script.

    --MidLifeXis
