http://www.perlmonks.org?node_id=120611

Amoe has asked for the wisdom of the Perl Monks concerning the following question:

I have a complicated data structure (AoH, to be precise) that is dumped and loaded via eval and Data::Dumper. Now the loaded file has to be updated as it is processed, removing hashrefs that have already been processed. I know how to modify the structure, but the only way I can think of to update the file (clobbering and overwriting the old file with the new one) seems rather inelegant. My question, oh berobed ones: In the spirit of Tim Toady, is there another way to do it? I don't have any code yet, as I felt that writing any would be redundant at the moment.

--
my one true love
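
Roughly, what I'm doing looks like this (a minimal sketch, not real code; the file name and sub names are made up for illustration):

    use strict;
    use warnings;
    use Data::Dumper;

    my $file = 'queue.dump';    # hypothetical file name

    # Load the AoH back in by eval'ing the dumped text.
    sub load_queue {
        open my $fh, '<', $file or die "Can't read $file: $!";
        my $code = do { local $/; <$fh> };   # slurp the whole dump
        close $fh;
        my $queue = eval $code;
        die "Bad dump: $@" if $@;
        return $queue;                       # arrayref of hashrefs
    }

    # "Clobber and overwrite": rewrite the whole file minus processed hashrefs.
    sub save_queue {
        my ($queue) = @_;
        open my $fh, '>', $file or die "Can't write $file: $!";
        local $Data::Dumper::Terse = 1;      # dump a bare structure eval() returns directly
        print {$fh} Dumper($queue);
        close $fh;
    }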

Replies are listed 'Best First'.
Re: Updating files
by jeroenes (Priest) on Oct 22, 2001 at 22:56 UTC
    As long as you stick to a text file, there is no alternative to overwriting that file. I suggest flattening your structure by making new hash keys, each consisting of a zero-padded number plus the keys of the nested hash. You then use these newly generated keys to create a flat database (like BerkeleyDB). Such a database gives you random access, so you can update, add, and delete at will.

    For the remainder of the code: The translation from AoH to flat access is rather simple. Just concatenate the numbers and keys.

    HTH,

    Jeroen
    "We are not alone"(FZ)

Re: Updating files
by perrin (Chancellor) on Oct 23, 2001 at 00:21 UTC
    Your approach is fine, as long as there will only be one process working on this data at a time. If that ever changes, the suggestion above about using dbm files is probably the way to go.

    Incidentally, Storable is faster and might be worth switching to if you don't need the file to be human readable.
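
    For example (a minimal sketch; the file name is invented):

        use strict;
        use warnings;
        use Storable qw(store retrieve);

        my $file  = 'queue.sto';
        my $queue = [ { id => 1, status => 'pending' } ];

        store($queue, $file);             # write the structure in binary form
        my $loaded = retrieve($file);     # read it back, no eval needed

        # Storable also offers lock_store()/lock_retrieve() for flock-based locking.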

Re: Updating files
by fokat (Deacon) on Oct 23, 2001 at 00:00 UTC
    I once had to do something similar to what you describe, but with a number of producers and consumers. Also, I did not want to have to deal with databases.

    My solution involved creating a directory for each entry that was accepted by the system by one of the producers. Each one used a scheme that generated distinct names (and detected collisions with other instances, as mkdir will fail when attempting to create an existing directory).

    The consumers would lock an entry by creating a lock directory within the main directory. In my scenario, creating a directory was an atomic operation on the underlying filesystem where that application was running, and it is still atomic today on many filesystems.

    After achieving a successful lock on a directory, the consumer simply processed the data in the various files within, unlink()ing them as it proceeded. When done, the parent directory and then the lock directory were deleted, in that order, to prevent a second consumer from getting into the same request.

    This setup ran a few hundred producers and consumers for more than a year without a single race condition. The overhead of this solution was very small.
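
    A rough, simplified sketch of the idea (the directory names and layout are invented here, and cleanup is collapsed into a single rmtree() call rather than the exact deletion order described above):

        use strict;
        use warnings;
        use File::Path qw(rmtree);

        my $spool = 'spool';    # one subdirectory per accepted entry

        # Producer: mkdir fails if the name already exists, so collisions
        # between producer instances are detected atomically.
        sub produce {
            my ($data) = @_;
            my $entry = sprintf '%s/entry-%d-%d', $spool, time, $$;
            mkdir $entry or die "Collision or error on $entry: $!";
            open my $fh, '>', "$entry/data" or die "Can't write: $!";
            print {$fh} $data;
            close $fh;
        }

        # Consumer: claim an entry by creating a lock directory inside it,
        # process and unlink its files, then remove the directories.
        sub consume {
            opendir my $dh, $spool or die "Can't open $spool: $!";
            for my $name (grep { /^entry-/ } readdir $dh) {
                my $entry = "$spool/$name";
                next unless mkdir "$entry/.lock";    # atomic claim; skip if taken
                if (open my $fh, '<', "$entry/data") {
                    my $data = do { local $/; <$fh> };
                    close $fh;
                    # ... process $data here ...
                    unlink "$entry/data";
                }
                rmtree($entry);                      # removes lock dir and entry dir
            }
            closedir $dh;
        }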