http://www.perlmonks.org?node_id=854921

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, if I want to modify a record in a file and I don't know whether the record I'm looking for already exists, can I do that in one pass without having to write out the whole file? E.g., this is the pseudo code:
    open file
    while (<file>) {
        if value exists then
            modify and print record to output file
        else
            print record to output file
    }
    rename output file back to original filename
If I do this and the value I'm searching for doesn't exist, then I've rewritten the entire contents of the original file needlessly. Hence, is there any way to redo the same loop, so that the first time round I see whether the value exists, and if it doesn't, I don't bother creating an output file? So, a redo for the whole loop rather than for the current iteration of the loop?

Re: redo entire loop
by roboticus (Chancellor) on Aug 13, 2010 at 14:54 UTC

    Sure thing, assuming your data file fits in memory, slurp it into an array first. Then make any updates. If you made no updates, you can quit; otherwise write the array to the output file.

    ...roboticus
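roboticus's slurp-into-an-array approach could look roughly like the sketch below. It assumes the whole file fits in memory. The sub name `update_records` and the `$edit` callback (a code ref that may modify `$_`) are hypothetical. The file is only rewritten, via a temporary file and `rename`, when at least one line actually changed.

```perl
use strict;
use warnings;

# Hypothetical helper: $edit is a code ref that may modify $_ (one line).
# Returns the number of changed lines; the file is rewritten only if > 0.
sub update_records {
    my ($path, $edit) = @_;

    open my $in, '<', $path or die "Can't read $path: $!";
    my @lines = <$in>;            # slurp: assumes the file fits in memory
    close $in;

    my $changed = 0;
    for (@lines) {                # $_ is aliased to each array element
        my $before = $_;
        $edit->();                # may modify $_, i.e. the element itself
        $changed++ if $_ ne $before;
    }
    return 0 unless $changed;     # nothing changed: leave the file alone

    my $tmp = "$path.tmp";
    open my $out, '>', $tmp or die "Can't write $tmp: $!";
    print {$out} @lines;
    close $out or die "Can't close $tmp: $!";
    rename $tmp, $path or die "Can't rename $tmp to $path: $!";
    return $changed;
}
```

For example, `update_records($file, sub { s/^b=2$/b=3/ })` returns 0 and never opens an output file if no line matches.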

Re: redo entire loop
by JavaFan (Canon) on Aug 13, 2010 at 15:05 UTC
    I would consider three cases:
    • The file isn't huge. In that case, I wouldn't bother with the optimization.
    • The file is huge but not gigantic (it still fits in memory). In that case, I'd slurp the file into a string and attempt the modification, exiting if nothing changed.
    • The file is gigantic (it won't fit into memory). In that case, I'd use grep to check whether there's an occurrence at all. If not, there's no need to even start the program; otherwise, run the program.
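The middle case ("slurp into a string, attempt the modification, exit if nothing changed") can be sketched as follows. It relies on the fact that `s///g` returns the number of substitutions made. The sub name and its `$pattern`/`$replacement` parameters are hypothetical. The replacement is used as a literal string, and the whole file is assumed to fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: $pattern is a qr// object, $replacement a literal string.
# Returns the number of substitutions; writes the file only if it's > 0.
sub edit_in_string {
    my ($path, $pattern, $replacement) = @_;

    my $text = do {
        open my $in, '<', $path or die "Can't read $path: $!";
        local $/;                 # slurp mode: read the whole file at once
        <$in>;
    };

    my $count = ($text =~ s/$pattern/$replacement/g);
    return 0 unless $count;       # no match: don't write anything

    my $tmp = "$path.tmp";
    open my $out, '>', $tmp or die "Can't write $tmp: $!";
    print {$out} $text;
    close $out or die "Can't close $tmp: $!";
    rename $tmp, $path or die "Can't rename $tmp to $path: $!";
    return $count;
}
```

Use the `/m` modifier when anchoring per line, e.g. `edit_in_string($file, qr/^b=2$/m, 'b=3')`.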
Re: redo entire loop
by dasgar (Priest) on Aug 13, 2010 at 15:14 UTC

    You could try Tie::File: iterate through each line looking for your desired value/data and modify it directly. If I understood the documentation correctly, Tie::File does not necessarily store the entire file in memory, and it lets the user control its memory usage with an option passed when the file is tied.
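A minimal sketch of the Tie::File idea follows. The sub name and the `$match`/`$replacement` parameters are hypothetical, and the `memory` value is an arbitrary illustration. Tie::File presents the file as an array of lines without their terminators, and assigning to an element writes the change back to disk. As far as I understand it, a file in which nothing is assigned is not rewritten. A replacement of a different length may still force Tie::File to move the rest of the file.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: replace every line matching $match with $replacement.
# Returns the number of lines replaced.
sub tie_file_update {
    my ($path, $match, $replacement) = @_;

    # 'memory' caps the read cache size in bytes (value chosen arbitrarily).
    tie my @lines, 'Tie::File', $path, memory => 20_000_000
        or die "Can't tie $path: $!";

    my $changed = 0;
    for my $i (0 .. $#lines) {
        if ($lines[$i] =~ $match) {
            $lines[$i] = $replacement;   # written back to the file
            $changed++;
        }
    }

    untie @lines;                        # flush pending writes, close the file
    return $changed;
}
```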

      A Tie::File solution would be much more expensive than the following two-pass approach:
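      use Fcntl qw( SEEK_SET );
      my $found = 0;
      while (<$fh>) { if (match()) { $found = 1; last; } }
      if ($found) {
          # Rewind: $. is a line number, not a byte offset, and every line must be copied.
          seek($fh, 0, SEEK_SET) or die;
          open(my $fh_tmp, '>', ...) or die;
          while (<$fh>) { transform(); print $fh_tmp $_; }
          close($fh_tmp) or die;
          replace_original_with_tmp();
      }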
Re: redo entire loop
by jonadab (Parson) on Aug 13, 2010 at 18:06 UTC

    From an algorithm analysis perspective, avoiding a second pass is a relatively minor optimization. Your algorithm is O(n) with the second pass, and it's still O(n) with the optimization. Unless the performance is so bad that users are timing it with a clock, they won't notice the difference.

    If the performance is so bad that users are timing it with a clock, you should probably think about O(log n) possibilities (such as running a binary search against a sorted index) or, depending on the nature of your data, maybe replacing the flat-file implementation with a DBMS.
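Here is what "a binary search against a sorted index" might look like. This is a sketch under a hypothetical assumption: an index of `[key, byte_offset]` pairs, sorted by key using string comparison, has already been built for the flat file. The sub name `find_offset` is also hypothetical. Each lookup costs O(log n) comparisons instead of a full scan.

```perl
use strict;
use warnings;

# Hypothetical index: array ref of [key, byte_offset], sorted by key (cmp).
# Returns the byte offset for $key, or undef if the key isn't present.
sub find_offset {
    my ($index, $key) = @_;
    my ($lo, $hi) = (0, $#$index);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $cmp = $index->[$mid][0] cmp $key;
        return $index->[$mid][1] if $cmp == 0;
        if ($cmp < 0) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return;    # not found: the data file need not be touched at all
}
```

With the offset in hand, `seek($fh, $offset, 0)` jumps straight to the record. Keeping the index sorted and in sync with the data file is the cost of this design, and that bookkeeping is roughly what a DBMS would do for you.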

Re: redo entire loop
by Anonymous Monk on Aug 13, 2010 at 15:02 UTC

    It doesn't sound like you want to redo the same thing as the first pass... It sounds like you want to do a second pass of the file while doing everything differently.

    I suggest extracting the common logic fragment (inspecting the line) into a subroutine.
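Factoring the line inspection into a subroutine, so both passes share it, might look like this sketch. `matches` and `modify` are hypothetical stand-ins for the real record logic. The first pass only reads, and the second pass (with its output file) happens only when a match was found.

```perl
use strict;
use warnings;

# Hypothetical record logic shared by both passes.
sub matches { return $_[0] =~ /^id=42,/ }
sub modify  { my $line = shift; $line =~ s/,.*/,updated/; return $line }

sub two_pass_update {
    my ($path) = @_;

    # Pass 1: read only, stop at the first match.
    open my $in, '<', $path or die "Can't read $path: $!";
    my $found = 0;
    while (my $line = <$in>) {
        if (matches($line)) { $found = 1; last }
    }
    close $in;
    return 0 unless $found;       # value absent: no output file is created

    # Pass 2: copy every line, modifying the matching records.
    open $in, '<', $path or die "Can't read $path: $!";
    my $tmp = "$path.tmp";
    open my $out, '>', $tmp or die "Can't write $tmp: $!";
    while (my $line = <$in>) {
        print {$out} matches($line) ? modify($line) : $line;
    }
    close $in;
    close $out or die "Can't close $tmp: $!";
    rename $tmp, $path or die "Can't rename $tmp to $path: $!";
    return 1;
}
```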

    However, if your files aren't too big, you could simply modify the records in memory, then write the file out only if you made changes.

Re: redo entire loop
by zek152 (Pilgrim) on Aug 13, 2010 at 15:42 UTC

    I believe I understand your question. (disclaimer: this in no way is an admission of liability if I do not, in fact, understand your question)

    What I think you should do (for simplicity's sake) is run 2 passes on the file, like so:

    use strict;
    use warnings;

    my $filename = "insertfilenamehere";

    # First pass: only look for the value.
    open my $in, '<', $filename or die "Can't open $filename: $!";
    my $found = 0;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /are these the droids you're looking for\?/) {
            $found = 1;
            last;
        }
    }
    close $in;

    # Second pass: only if the value was found.
    if ($found) {
        open $in, '<', $filename or die "Can't open $filename: $!";
        open my $out, '>', "output" or die "Can't create output: $!";
        while (my $line = <$in>) {
            # modify the line here, then print it to the output file
            print $out $line;
        }
        close $out or die "Can't close output: $!";
        close $in;
        rename "output", $filename or die "Can't rename output: $!";
    }

    Notes about the code: it's not particularly efficient, and in the worst case you go through the file twice. However, to me the advantage is that it is simple. You look for the value; if you find it, you stop looking and start over, modifying the file. If you don't find it, you are already done. Hope this helps.

    Final note. This code has not been tested and it will take a little work to address your problem. It is intended as a framework for you to use.