PerlMonks  

Renaming an image file

by Anonymous Monk
on Nov 28, 2010 at 05:28 UTC ( #874068=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there Monks!
I need to rename my image files, which have names in this format:
joe_IMG_27445.JPG or Mary_34555.jpg
I need to replace everything after the first "_" with a random unique number, so that in the end all the files look like:
joe_34455667.jpg or Mary_8883377.jpg etc...
This way the files will be unique and uniform for my storage needs. Is this possible? Any help on how I could accomplish this?
$image_name=~s/\w+_\.*?\.\w{3}/\w+_"random number here"\.\w{3}/g; # I am stuck
Thanks for the help!

Re: Renaming an image file
by ikegami (Pope) on Nov 28, 2010 at 06:03 UTC
    You want executable code, so you need to use /e.
    s/^([^_]+_).*(\.\w{3})\z/$1 . sprintf("%08d", rand(100000000)) . $2/eg;
    or
    s/^([^_]+_).*(?=\.\w{3}\z)/$1 . sprintf("%08d", rand(100000000))/eg;
    or (5.10 required)
    s/^[^_]+_\K.*(?=\.\w{3}\z)/sprintf("%08d", rand(100000000))/eg;

    This way the files will be unique and uniform for my storage needs.

    Random numbers aren't unique. They're random.
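    If the random number has to be unique as well, one option is to redraw whenever a name is already taken. The following is only a minimal sketch under assumptions: the helper name unique_random_name and the %taken bookkeeping hash are hypothetical, and only the substitution itself comes from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new name for $old that is neither a key of
# %$taken nor an existing file. Returns nothing if $old has no "prefix_" part.
sub unique_random_name {
    my ($old, $taken) = @_;
    for my $attempt (1 .. 1000) {
        my $new = $old;
        # First substitution from the post above (without /g, one match is enough)
        $new =~ s/^([^_]+_).*(\.\w{3})\z/$1 . sprintf("%08d", rand(100000000)) . $2/e
            or return;                              # name does not match the pattern
        next if $taken->{$new} || -e $new;          # draw repeated: try again
        $taken->{$new} = 1;
        return $new;
    }
    die "no free name found for $old after 1000 attempts\n";
}

my %taken;
for my $old (qw(joe_IMG_27445.JPG Mary_34555.jpg)) {
    print "$old -> ", unique_random_name($old, \%taken), "\n";
}
```

    The sketch keeps the extension exactly as it was (.JPG stays .JPG), as the substitution above does. Wrapping $2 in lc() would make the extensions uniform too.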

Re: Renaming an image file
by BrowserUk (Pope) on Nov 28, 2010 at 06:26 UTC

    Rather than a random number, I'd suggest using a digest of the file's contents. Digest::MD5 for example. Whilst not guaranteed to be unique, the chance of collision is remote.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I would use a digest not of the file content, but of the file name (computationally cheaper and no collisions possible).

      If you are on a unix-filesystem I would use the inode-number of the file (you get that with stat).

        I would use a digest not of the file content, but of the file name (computationally cheaper and no collisions possible).

        What makes you think that a filename has the special property of returning a unique value for each and every hash function? Hash functions always have collisions, by definition. You can't losslessly compress arbitrary amounts of data into 128, 256 or 512 bits. Sure, it is unlikely that two short names share the same hash value, but it is not impossible. And with filenames near PATH_MAX, which is 4 KBytes on Linux and perhaps even larger on other systems, collisions become more likely.

        If you are on a unix-filesystem I would use the inode-number of the file (you get that with stat).

        The inode is not unique, it is just unique per filesystem. Together with the device number, it should be unique. Problems start when the filesystem layer of the OS kernel has to invent inode numbers for filesystems that do not have inodes. Linux does that for at least the FAT-based filesystems, in linux/fs/fat/inode.c.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
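
        A rough sketch of the device-plus-inode idea, assuming a filesystem with real inode numbers and bare file names in the current directory. The helper name inode_name and the "dev-ino" layout are made up for illustration. Keep in mind that an inode number can be reused once the file is deleted.

```perl
use strict;
use warnings;

# Hypothetical helper: builds "<prefix>_<dev>-<ino><ext>" for an existing file.
sub inode_name {
    my ($path) = @_;
    # stat returns an empty list on failure; fields 0 and 1 are device and inode
    my ($dev, $ino) = stat($path)
        or die "cannot stat $path: $!\n";
    my ($prefix, $ext) = $path =~ /^([^_]+)_.*(\.\w{3})\z/
        or die "unexpected file name: $path\n";
    return "${prefix}_${dev}-${ino}$ext";
}

# Usage: pass file names on the command line
print "$_ -> ", inode_name($_), "\n" for grep { -f $_ } @ARGV;
```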

        Notwithstanding afoken's very good points above, the OP is combining his random number with part of the original filename.

        I was suggesting he do the same with the MD5, which covers the possibility of there being two copies of the same content under different names in the same directory.

        I would also combine the file length into the derived number, as probabilistically, each MD5 will repeat once in each set of files of length modulus 16.

      How would I use Digest::MD5 in this file-renaming code, given names like joe_34455667.jpg or Mary_8883377.jpg etc...?
      ... my $file=~s/^([^_]+_)(.*)(\.\w{3})\z/$1.md5_hex($2).$3/eg; ...

        Your code is replacing the number with the md5 of that number, which is a poor idea.

        My suggestion was that you replace the number with the md5 of the file's contents. Something like:

        use Digest::MD5 qw(md5_hex);

        my $file = 'joe_34455667.jpg';
        # Slurp the whole file through <> and hash its contents
        my $md5 = md5_hex( do{ local( @ARGV, $/) = $file; <> } );
        $file =~ s[_(\d+)\.jpg$][_$md5.jpg];
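
        A variant that may be worth considering, offered as a sketch only: it reads the image in binary mode through Digest::MD5's object interface and folds in the file length, as suggested earlier in the thread. The helper name digest_name and the "md5-size" layout are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: "<prefix>_<md5>-<size><ext>" built from the file's contents.
sub digest_name {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    binmode $fh;                                   # images are binary data
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    my $size = -s $path;                           # file length in bytes
    my ($prefix, $ext) = $path =~ /^([^_]+)_.*(\.\w{3})\z/
        or die "unexpected file name: $path\n";
    return "${prefix}_${md5}-${size}$ext";
}

# Usage: pass file names on the command line
print "$_ -> ", digest_name($_), "\n" for grep { -f $_ } @ARGV;
```

        Keeping the original prefix means two copies of the same picture under different prefixes still get different names.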

Re: Renaming an image file
by 7stud (Deacon) on Nov 28, 2010 at 07:11 UTC

    Here's a non-obfuscated version:

    use strict;
    use warnings;
    use 5.010;
    use List::Util qw{ shuffle };

    my @numbers = (1 .. 100_000);
    my @rand_numbers = shuffle @numbers;
    my @fnames = qw{ joe_IMG_27445.JPG Mary_34555.jpg };

    for my $fname (@fnames) {
        my $rand_number = shift @rand_numbers;
        my @pieces = split /_/, $fname, 2;
        my $new_fname = "$pieces[0]_$rand_number.jpg";
        say $new_fname;
    }

    --output:--
    joe_18119.jpg
    Mary_46301.jpg
Re: Renaming an image file
by Anonymous Monk on Nov 28, 2010 at 10:04 UTC
    Hi,

    I know a 'random-unique' number was requested and lots of good suggestions have been made, but just in case the unique bit is all that is necessary...

    Store a number in a db, a flat file would do, and increment it for each file renamed.

    This should guarantee uniqueness.

    J.C.
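
    A minimal sketch of the flat-file counter, assuming a single counter file shared by every run. The helper name next_id is made up for illustration. The exclusive lock is there so that two scripts running at once cannot hand out the same number.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek O_RDWR O_CREAT);

# Hypothetical helper: reads the stored number, increments it, writes it back.
sub next_id {
    my ($counter_file) = @_;
    sysopen(my $fh, $counter_file, O_RDWR | O_CREAT)
        or die "cannot open $counter_file: $!\n";
    flock($fh, LOCK_EX) or die "cannot lock $counter_file: $!\n";
    my $n = <$fh>;
    $n = 0 unless defined $n && $n =~ /^\d+$/;     # new or empty file starts at 0
    $n++;
    seek($fh, 0, SEEK_SET);                        # rewind and overwrite
    truncate($fh, 0);
    print {$fh} "$n\n";
    close $fh or die "cannot write $counter_file: $!\n";   # also releases the lock
    return $n;
}

# Usage: perl script.pl counter.txt
printf "joe_%08d.jpg\n", next_id($ARGV[0]) if @ARGV;
```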

Re: Renaming an image file
by mjscott2702 (Pilgrim) on Nov 28, 2010 at 10:17 UTC
    Why not avoid all the potential (though unlikely) issues of non-unique pseudo-random numbers and hash collisions - just use the start time of the script ($^T) as a base index, and increment from there. And if you restart the script, it won't collide with any names from before.
      ... use the start time of the script ($^T) as a base index, and increment from there. And if you restart the script, it won't collide with any names from before

      How much would you like to bet on that idea?

      $^T has second resolution. Start the script at some point T in time. $filename=T at start of script. Process 100 files in 60 seconds, with $filename++ for each file. $filename=T+100 at end of script. Start again for the next set of files at T+65 seconds. First $filename=$^T=T+65. Instant collision.

      Alexander

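
      The arithmetic in that scenario can be checked with a few lines. This is just an illustration: the start time below is an arbitrary, made-up epoch value and names_for_run is a hypothetical helper.

```perl
use strict;
use warnings;

# Names handed out by a run that starts at $start and renames $count files
sub names_for_run {
    my ($start, $count) = @_;
    return map { $start + $_ } 0 .. $count - 1;
}

my $T      = 1_290_000_000;                 # made-up start time in epoch seconds
my @first  = names_for_run($T,      100);   # T .. T+99
my @second = names_for_run($T + 65, 100);   # T+65 .. T+164

my %seen  = map { $_ => 1 } @first;
my @clash = grep { $seen{$_} } @second;
printf "%d colliding names, first is T+%d\n", scalar @clash, $clash[0] - $T;
# prints "35 colliding names, first is T+65"
```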
        easy to fix :)
        $^T . rand($^T) . $$
        $^T . int( rand($^T) ) . $$
        heck you could even append
        crypt $^T.$$.$^T, rand($^T)
        OK, flaw in my logic there - which you could have simply pointed out without the challenge to a bet. Hope you feel superior now.

        The point of my original post is that there may be a sufficiently simple way of doing it, without resorting to databases or with the caveats associated with random numbers and hashing.

        Maybe the Time::HiRes module would be an option - microsecond resolution, if available, should be enough:

        use strict;
        use warnings;
        use Time::HiRes qw(gettimeofday);

        my($seconds, $microseconds);
        my $index;
        for (1..10) {
            ($seconds, $microseconds) = gettimeofday;
            $index = sprintf("%d%06d", $seconds, $microseconds);
            print "$index\n";
        }

        Output:

        1290951829553400
        1290951829553437
        1290951829553448
        1290951829553457
        1290951829553467
        1290951829553477
        1290951829553487
        1290951829553496
        1290951829553505
        1290951829553515
Re: Renaming an image file
by aquarium (Curate) on Nov 28, 2010 at 22:21 UTC
    Does the number absolutely need to be random, or just unique? 00000001 consecutively increasing to 99999999 works for me as "unique" identifiers. You can always pick up where you left off by finding the largest number currently used in the filenames in the directory. You could also append a pseudo-random number after the sequence number, to prevent easy guessing.
    the hardest line to type correctly is: stty erase ^H
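
    Picking up where you left off could look roughly like this. It is a sketch that assumes the renamed files sit in the current directory and carry an eight-digit number right before the extension. The helper name next_sequence is hypothetical.

```perl
use strict;
use warnings;
use 5.010;                      # for the // operator
use List::Util qw(max);

# Hypothetical helper: next sequence number, given names like joe_00000042.jpg
sub next_sequence {
    my @names = @_;
    # Collect the numbers already in use; names in any other format are skipped
    my @used = map { /_(\d{8})\.\w{3}\z/ ? $1 : () } @names;
    return 1 + (max(@used) // 0);                  # start at 1 if nothing is used yet
}

opendir(my $dh, '.') or die "cannot read directory: $!\n";
my $next = next_sequence(readdir $dh);
closedir $dh;
printf "next free name: joe_%08d.jpg\n", $next;
```

    Two scripts scanning the directory at the same moment would compute the same number, so this suits a single renaming job at a time.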
