PerlMonks  

Unique filenames with Time::HiRes

by AcidHawk (Vicar)
on Jul 19, 2004 at 07:43 UTC ( [id://375475]=perlquestion )

AcidHawk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, It has been a while since I was at the monastery. It's good to be back.

In an attempt to get unique names for files that need to be processed in sequence, I turned to Time::HiRes. I construct these files from a command line script. I then read the file names (using another script) from a dir and sort them according to the time. The filenames look something like XXX~1090220890.53125.xml. I am using gettimeofday() to get the unique number. To illustrate the issue I face, I used the following test code.

#! /usr/bin/perl
use strict;
use warnings;
use Time::HiRes ( gettimeofday );

for (my $i = 0; $i <= 10; $i++) {
    print "$i\tgettimeofday = " . gettimeofday() . "\n";
}
The output of which is..
0 gettimeofday = 1090220890.53125
1 gettimeofday = 1090220890.53125
2 gettimeofday = 1090220890.53125
3 gettimeofday = 1090220890.53125
4 gettimeofday = 1090220890.53125
5 gettimeofday = 1090220890.53125
6 gettimeofday = 1090220890.53125
7 gettimeofday = 1090220890.53125
8 gettimeofday = 1090220890.53125
9 gettimeofday = 1090220890.53125
10 gettimeofday = 1090220890.53125
NOT very unique...

The command line interface gets called extremely often, and often several of these processes run at the same time. I would really not like to slow down the entire process if I can help it. The only way I can think of doing this is by keeping a number in a file which I lock until I have updated it with a new number. This means that all the other processes have to wait until the file is unlocked.

Anyone have any clever ideas for getting unique numbers fast (some kind of algorithm that includes the time etc...)?

-----
Of all the things I've lost in my life, its my mind I miss the most.

Replies are listed 'Best First'.
Re: Unique filenames with Time::HiRes
by tachyon (Chancellor) on Jul 19, 2004 at 07:55 UTC
    There are two approaches that come to mind. Use File::CounterFile (the inc() method will return a unique value) or use File::Temp (or just the guts at Avoiding race condition with sysopen) and append the names to an order file (you still really need locking) to maintain your order, then just process the order file top to bottom.

    cheers

    tachyon

Re: Unique filenames with Time::HiRes (looks ok here)
by grinder (Bishop) on Jul 19, 2004 at 08:16 UTC
    ...NOT very unique...

    That's very odd. I just downloaded your code (which contains a couple of syntax errors: you need Time::HiRes qw/gettimeofday/ (missing the qw), and there's one too many opening parens in your for loop). Here it prints:

    0 gettimeofday = 1090223680.83084
    1 gettimeofday = 1090223680.83102
    2 gettimeofday = 1090223680.83107
    3 gettimeofday = 1090223680.8311
    4 gettimeofday = 1090223680.83116
    5 gettimeofday = 1090223680.8312
    6 gettimeofday = 1090223680.83123
    7 gettimeofday = 1090223680.83128
    8 gettimeofday = 1090223680.83131
    9 gettimeofday = 1090223680.83135
    10 gettimeofday = 1090223680.83139

    So there's something broken with the installation of your copy of Time::HiRes. Can you reinstall it and make sure the test suite passes?

    Hmmmm.... Unless your machine is so rapid that it just happens to make those calls that quickly. That's within the realm of the possible. What happens if you extend your loop to open files and write to them?

    #! /usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw( gettimeofday );

    for (my $i = 0; $i <= 10; $i++) {
        # Name each file after the current high-resolution time
        my $file = "tmp-" . gettimeofday();
        open OUT, '>', $file or die "Cannot open $file for output: $!\n";
        print OUT $i, "\n", gettimeofday(), "\n";
        close OUT;
    }

    When I run the above, I see a noticeable slowdown, and thus distance, in the names of the files:

    tmp-1090224255.75149
    tmp-1090224255.75175
    tmp-1090224255.75188
    tmp-1090224255.752
    tmp-1090224255.75212
    tmp-1090224255.75224
    tmp-1090224255.75237
    tmp-1090224255.75249
    tmp-1090224255.75261
    tmp-1090224255.75273
    tmp-1090224255.75285

    If all this fails, you shall have to use a database. Create a table with two columns, a serial number and a cookie. The serial number is managed by the database and gives you a monotonically increasing sequence. The cookie is just a large random string. Collect a number of elements, such as rand(), the time of day, your pid and so on. Hash it with MD5 and insert it into the cookie column.

    Commit the insert, and then go back and read off the serial number; that's your filename. You can periodically purge the table of all rows. This frees up space and will prevent an already negligible possibility of duplicate keys from occurring.

    I still have a nagging suspicion, though, that when you have an arbitrary number of processes generating worksets that have to be processed downstream in sequence, you cannot be absolutely certain that they will always be in sequence. It seems to me that there's a race issue involved.

    - another intruder with the mooring of the heat of the Perl

      Hi grinder,

      Thank you for your comments. I have tested Time::HiRes ( gettimeofday ); on both Win2k and Slackware and both seem to work fine. Thanks for the heads up on the opening parens.. that's what you get when you can't copy and paste from a vmware session..

      I have also tested this on two other machines and get similar results, where several of the gettimeofday() results have the same time.

      I am only creating one tmp file from the cli at a time. However, several of these clis run simultaneously, which caused me to get files with the same name...

      I think I will have to look to a database.

      -----
      Of all the things I've lost in my life, its my mind I miss the most.
Re: Unique filenames with Time::HiRes
by DrHyde (Prior) on Jul 19, 2004 at 08:52 UTC
    As you've seen, using the time to generate unique filenames doesn't work. A common work-around is to concatenate the time and the process ID (you're only generating one per process, right?), the reasoning being that it's not possible to have two processes with the same ID at the same time.

    In theory, it might be possible to spawn processes fast enough and which die fast enough to cycle through all the available IDs in one second, so that's not really a very good solution either.
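A minimal sketch of the time-plus-PID naming described above. The helper name stamped_name and the XXX prefix are illustrative assumptions, not part of the original scripts. It calls gettimeofday() in list context, which returns seconds and microseconds separately, so no digits are lost when the floating-point value is turned into a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( gettimeofday );

# Hypothetical helper: build a name of the form PREFIX~SECONDS.MICROSECONDS~PID.xml
sub stamped_name {
    my ($prefix) = @_;

    # In list context gettimeofday() returns (seconds, microseconds).
    my ( $sec, $usec ) = gettimeofday();

    # Zero-pad the microseconds to six digits so the time field has a
    # fixed width; $$ is the ID of the current process.
    return sprintf '%s~%d.%06d~%d.xml', $prefix, $sec, $usec, $$;
}

print stamped_name('XXX'), "\n";
```

The caveat raised in the paragraph above still applies: if a PID is reused within a single timer tick, two processes could in principle produce the same name.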

    I would recommend that you look at Data::UUID which claims to be an implementation of a standard method which is guaranteed to work, and which claims to support allocating very large numbers of IDs.

      I would recommend that you look at Data::UUID

      I'd second that. Works very well in my experience. Use the creation date of the files to order them.

      (I'm assuming that your comments about multiple processes mean that the exact order isn't important.)

        Use the creation date of the files to order them.
        Since he can create several files in a short interval, so that even Time::HiRes isn't updated quickly enough (update interval is 18 times a second, on Windows), and the resolution of creation date is in general only 1 or 2 seconds, I am certain you haven't solved the OP's problem.
      This is exactly what I need to do.

      I will create files that look like XXX~gettimeofday()~PID.xml and sort on the time of day. If there are two files with the same time of day I will have to build in some kind of exception handling for the sequence thing.

      Thank you!

      -----
      Of all the things I've lost in my life, its my mind I miss the most.
        Cycling through all the PIDs within a second (or whatever is the smallest increment of time you can get) is not the problem. Hitting that cycle in one of those intervals is. And it will happen.

        So what you need to do for your exception handling is to first determine how high the process counter on your machine can count (this may possibly change with OS updates, so I'd try to get that information dynamically from the OS). Then decide that if there is a difference of, for example, at least half that maximum count between PIDs in your filenames, the lower block of them came after the higher block. Process them accordingly.

        That should do ya.

        Update: grinder has /msged me that this will not work on BSD type boxes where random PID allocation (a security feature) has been configured. He's probably right, so you better talk to your admin before relying on consecutive PID allocation for the next couple of years. :-)

Re: Unique filenames with Time::HiRes
by knoebi (Friar) on Jul 19, 2004 at 07:53 UTC
    you could just add an additional integer to your filename, so it will be unique. if you start with 0 and go up, you even have the real order.

    generate the filename, test whether a file with this name already exists, and if not, create one... you don't need locking for that.

    anyway, if you open more files than Time::HiRes can show, you should probably change to a database.

    ciao
    knoebi

    UPDATE: you need locking, see beable's reply.

      you could just add an additional integer to your filename, so it will be unique. if you start with 0 and go up, you even have the real order.

      generate the filename, test whether a file with this name already exists, and if not, create one... you don't need locking for that.

      Actually, you would need locking for that, unless you have an atomic test-and-create-file function. Otherwise, you could have two processes which end up in a race condition:
      Process 1 checks for filename XXX-42, doesn't exist
      Process 2 checks for filename XXX-42, doesn't exist
      Process 1 creates filename XXX-42 and starts writing to it
      Process 2 opens filename XXX-42 and starts writing to it
      Process 1 finishes writing to XXX-42
      Process 2 finishes writing to XXX-42
      Then what's in the file? Who knows? Probably not what you want.
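Perl does offer such an atomic test-and-create step: sysopen with the O_CREAT and O_EXCL flags fails if the file already exists, so no separate existence check is needed. A minimal sketch, assuming a hypothetical helper create_unique and a four-digit counter suffix (both are illustrative choices, not something from the thread):

```perl
use strict;
use warnings;
use Fcntl qw( O_WRONLY O_CREAT O_EXCL );
use Errno qw( EEXIST );

# Hypothetical helper: atomically create "$base-NNNN.xml" for the first
# free number and return the open handle together with the name.
sub create_unique {
    my ($base) = @_;

    for my $n ( 0 .. 9999 ) {
        my $name = sprintf '%s-%04d.xml', $base, $n;

        # O_EXCL makes the open fail if the file already exists, so the
        # existence test and the creation happen as a single step.
        if ( sysopen( my $fh, $name, O_WRONLY | O_CREAT | O_EXCL ) ) {
            return ( $fh, $name );
        }

        # Any failure other than "file exists" is a real error.
        die "Cannot create $name: $!\n" unless $!{EEXIST};
    }

    die "No free name found for $base\n";
}
```

A caller would write `my ( $fh, $name ) = create_unique('XXX');` and print to $fh. In the race described above, only one of the two processes wins XXX-42; the other sees the failure and moves on to the next number. O_EXCL has historically been unreliable on some NFS setups, so treat this as a sketch for local filesystems.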
        You are correct about this. You have to lock that file XX-42, but the other processes don't have to wait until the process which has opened XX-42 finishes. This was part of his question.

      The filenames are created using a command line script that accepts parameters and builds the file. I need to keep the counter number somewhere else and not in the script itself.

      If I don't lock the 'counter number holder file thingy' and two occurrences of the cli access the 'counter number holder file thingy', each occurrence of the script will get the same number.. hence the requirement to lock the 'counter number holder file thingy'.
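A minimal sketch of such a locked counter file, using flock. The helper name next_number and the one-number-per-file layout are assumptions made for illustration; File::CounterFile, mentioned earlier in the thread, packages the same idea.

```perl
use strict;
use warnings;
use Fcntl qw( :flock :seek O_RDWR O_CREAT );

# Hypothetical helper: return the next number from a shared counter file.
sub next_number {
    my ($counter_file) = @_;

    # Open for read/write, creating the file on first use.
    sysopen( my $fh, $counter_file, O_RDWR | O_CREAT )
        or die "Cannot open $counter_file: $!\n";

    # Block until this process holds the exclusive lock.
    flock( $fh, LOCK_EX ) or die "Cannot lock $counter_file: $!\n";

    # Read the stored number; an empty or new file starts the count at 1.
    my $line = <$fh>;
    my $n = ( defined $line && $line =~ /^(\d+)/ ) ? $1 + 1 : 1;

    # Rewrite the file with the new value.
    seek( $fh, 0, SEEK_SET ) or die "Cannot seek in $counter_file: $!\n";
    truncate( $fh, 0 )       or die "Cannot truncate $counter_file: $!\n";
    print {$fh} "$n\n";

    # Closing the handle flushes the write and releases the lock.
    close $fh or die "Cannot close $counter_file: $!\n";
    return $n;
}
```

The lock is held only for the read-increment-write step, not while the data file itself is being written, so other processes wait only briefly. Because the number lives on disk, it also carries on from the last value after a restart.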

      I do agree though that I might need to look at a database.

      -----
      Of all the things I've lost in my life, its my mind I miss the most.
        Maybe just append a random number, lock it successfully, or try another.
Re: Unique filenames with Time::HiRes
by mhi (Friar) on Jul 19, 2004 at 09:46 UTC
    You could set up another process as a filename-server which independently keeps count and spits out a new filename whenever asked. This filename could then consist of the datestamp plus a counter for any names with identical datestamps as previously mentioned.

    Pros: Fast. No locking required.
    Cons: You will probably have to write additional code to restart the server if it doesn't answer within a certain amount of time etc.

Re: Unique filenames with Time::HiRes
by naChoZ (Curate) on Jul 19, 2004 at 12:29 UTC

    You might use Time::HiRes in conjunction with the randomness offered by File::Temp. Since you can pass File::Temp a template to use for the filename, you could pass it something like $template = "yourfile-" . gettimeofday() . "XXXXX.xml"; so you'd still get what you want without interfering with filename sort order and it would also be a unique name.

    If the file you're generating isn't necessarily a temp file, you could still use File::Temp to do nothing more than generate you a file name. (undef, $filename) = tempfile($template, OPEN => 0);
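A minimal sketch of the template idea. File::Temp expects the run of X characters at the end of the template, so this version passes the .xml extension through the SUFFIX option instead of putting it after the X's. The helper name timestamped_tempfile and the choice to keep the file (UNLINK => 0) are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use Time::HiRes qw( gettimeofday );

# Hypothetical helper: create a uniquely named, timestamped file in $dir
# and return the open handle together with the file name.
sub timestamped_tempfile {
    my ($dir) = @_;

    # Seconds and microseconds, so the timestamp sorts as a string.
    my ( $sec, $usec ) = gettimeofday();

    # The trailing X's are replaced by File::Temp with random characters.
    my $template = sprintf 'yourfile-%d.%06d-XXXXX', $sec, $usec;

    return tempfile(
        $template,
        DIR    => $dir,
        SUFFIX => '.xml',
        UNLINK => 0,        # keep the file after the program exits
    );
}
```

Calling `my ( $fh, $filename ) = timestamped_tempfile('.');` lets File::Temp create and open the file itself, which avoids the window between choosing a name and creating the file that exists when only the name is generated with OPEN => 0.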

    --
    People who want to share their religious views with you almost never want you to share yours with them.
    naChoZ

Re: Unique filenames with Time::HiRes
by BrowserUk (Patriarch) on Jul 19, 2004 at 17:47 UTC

    Update: Time::HiRes does use the high performance counters internally, in the latest CPAN version anyway. I'm not sure why the version I have produces such low resolution output.

    The tick counter that Time::HiRes uses on Win32 isn't that hi res as you've seen. 64/second under NT4/5 and lower on earlier systems.

    There are however much higher resolution timers available.

    On my system these have a resolution of 1/3,579,545 of a second, which is much faster than you can query them, even in C. On a single processor machine this should be sufficient to ensure uniqueness. On a multi-processor machine there is a small chance that two threads or processes could get exactly the same number, but combining that with the process/thread ID should suffice.

    Update: If you have Time::HiRes v1.59, this code should be redundant!

    Some code to show how to get at and convert the high resolution timer:

    #! perl -slw
    use strict;
    use Win32::API::Prototype;

    ApiLink( 'kernel32', q[
        BOOL QueryPerformanceCounter( LARGE_INTEGER *lpPerformanceCount )
    ]) or die $^E;

    ApiLink( 'kernel32', q[
        BOOL QueryPerformanceFrequency( LARGE_INTEGER *lpPerformanceCount )
    ]) or die $^E;

    sub int64_to_NV {
        my( $lo, $hi ) = unpack 'VV', $_[ 0 ];
        return $hi * 2**32 + $lo;
    }

    my $frequency = ' ' x 10;
    QueryPerformanceFrequency( $frequency ) or die $^E;
    print 'Counter changes ', int64_to_NV( $frequency ), ' times/second';

    for ( 1 .. 10 ) {
        QueryPerformanceCounter( $frequency )or die $^E;
        print int64_to_NV( $frequency );
    }

    __END__
    P:\test>375475
    Counter changes 3579545 times/second
    699092440218
    699092440527
    699092440794
    699092441059
    699092441312
    699092441566
    699092441820
    699092442073
    699092442322
    699092442576

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
Re: Unique filenames with Time::HiRes
by mutated (Monk) on Jul 19, 2004 at 13:58 UTC
    Another solution, and it is a bit bulkier than the timehighres-PID idea (which is a good one) is just to have a process that sits on a socket and hands out incremental numbers. Then you can launch as many of your file generating scripts in parallel as you want, they will each always be handed a unique number by your script.


    daN.
      Hi,

      I originally started this way. However, when the daemon restarted it reset the counter: if any previous files still existed, say at number 1001, and I restarted the daemon, it would start creating files at number 1 again. This causes the files to be processed in the incorrect order.

      -----
      Of all the things I've lost in my life, its my mind I miss the most.
        Why not do a gettimeofday(), test to see if the file name exists, if it does then loop for a few ms, then rinse and repeat until the gettimeofday is unique?


        -Waswas
        It's relatively easy, though, for the daemon to check at startup that it is starting above the last number that exists. It can take its time, because until it starts responding to requests for numbers, the programs calling it should just block or whatever..


        daN.
Re: Unique filenames with Time::HiRes
by beable (Friar) on Jul 19, 2004 at 07:59 UTC
    You could try the File::Temp module. Or if you really want to generate your own filenames:
    #!/usr/bin/perl
    use strict;
    use warnings;

    for (my $i = 1; $i < 10; $i++) {
        # Epoch seconds followed by one random fraction
        my $filename = time . rand;
        print "$filename\n";
        open FILE, ">$filename" or die "can't open $filename: $!";
        print FILE "BOOGA BOOGA $i\n";
        close FILE;
    }

    # If that's not unique enough, try this:
    for (my $i = 1; $i < 10; $i++) {
        my $filename = time() . rand() . rand();
        print "$filename\n";
        open FILE, ">$filename" or die "can't open $filename: $!";
        print FILE "BOOGA BOOGA 2 x $i\n";
        close FILE;
    }
    __END__
    This works as long as you read the files in directory order, and don't sort them by name. The first file created is the first in directory order, and so on for all the files. So it actually doesn't matter that the names aren't in ASCII order, as long as you don't sort them.

    Update: here's some code to test that the files do come out in the same order as they were created:

    #!/usr/bin/perl
    use strict;
    use warnings;

    opendir DIR, "." or die "can't open directory: $!";
    # files starting with 10 should be okay for a while
    my @files = grep{/^10/} readdir(DIR);
    closedir DIR;

    for my $file(@files) {
        open FILE, "<$file" or die "can't open $file: $!";
        # Slurp the whole file in one read
        local $/ = undef;
        my $data = <FILE>;
        close FILE;
        chomp $data;
        print "$data\n";
    }
    __END__
