http://www.perlmonks.org?node_id=820895

fusiondog has asked for the wisdom of the Perl Monks concerning the following question:

The situation: a 60G hard disk and a 40G file, so there is no room to redirect standard output to a new file or to make a backup. I thought, "perl -i, of course", but a coworker countered that he believes it will still require extra disk space, like sed -i.

I've found an old article by merlyn from '98 with an algorithm for "in place" editing that keeps a read pointer ahead of a write pointer in the same filehandle and then, at the end of the file, either truncates it or (presumably in the perfected algorithm) appends to it as needed.

In my experiments adding a sleep to a perl -p -i -e, I see that the file size is set to 0 and no temp file is created in the working directory. Also, the following entry in perldiag strongly indicates that this is what is happening on DOS systems at least:

    Can't do inplace edit without backup
    (F) You're on a system such as MS-DOS that gets confused if you try reading from a deleted (but still opened) file. You have to say "-i.bak", or some such.

It seems like I'm right, but I can't find any implementation details that explicitly confirm it. Can anybody confirm?

Re: Inplace without backup uses no extra disk space?
by BrowserUk (Pope) on Feb 02, 2010 at 07:00 UTC

    Windows doesn't allow(*) you to delete an open file. So yes, it is required that you create a backup in order to use -i.

    However, editing a file in-place isn't that hard to program, even if you need to delete and/or add to the file--provided you only need to process the file serially, i.e. from the beginning to the end, record by record.

    The trick is to maintain an in-memory buffer sufficient to ensure that you don't try to overwrite a part of the file before you've read it. If the aggregate additions amount to more than can be held in memory, then using a spill file as the buffer is a little more complex.
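    A minimal sketch of that buffering trick, under some assumptions: the file is processed line by line, the pending output fits in memory, and two handles are opened on the same file (one to read, one to write) rather than sharing a single handle. The names edit_in_place and $edit are made up for illustration; this is not merlyn's published code.

```perl
use strict;
use warnings;
use IO::Handle;

# Rewrite $path in place, line by line, without a second copy on disk.
# $edit is a hypothetical callback: it receives one line and returns the
# replacement text (possibly empty, possibly longer than the input).
sub edit_in_place {
    my ($path, $edit) = @_;

    open my $in,  '<:raw',  $path or die "Cannot read $path: $!";
    open my $out, '+<:raw', $path or die "Cannot update $path: $!";

    my $pending = '';   # edited output that cannot be written yet
    my $wpos    = 0;    # number of bytes written so far

    while (defined(my $line = <$in>)) {
        $pending .= $edit->($line);

        # Only bytes the reader has already consumed may be overwritten.
        my $room = tell($in) - $wpos;
        my $n    = length($pending) < $room ? length($pending) : $room;
        if ($n > 0) {
            print {$out} substr($pending, 0, $n, '') or die "Write failed: $!";
            $wpos += $n;
        }
    }

    # Input exhausted: write what is left, then cut off any stale tail.
    print {$out} $pending or die "Write failed: $!";
    $wpos += length $pending;
    $out->flush or die "Flush failed: $!";
    truncate($out, $wpos) or die "Cannot truncate $path: $!";

    close $in;
    close $out or die "Close failed: $!";
    return $wpos;       # final size of the file in bytes
}
```

    Something like edit_in_place('big.log', sub { my $l = shift; $l =~ s/foo/bar/g; $l }) would then edit the (hypothetical) big.log without a second copy. Note that there is no safety net: if the process dies part-way through, the file is left half old and half new.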

    Do you have some estimate of the volume of edits/deletions/insertions required?

    That said, disks are cheap. The first UK ad Google found offered 500GB for £32.

    Another approach would be to write the file to a CD/DVD; delete it from the disk; then read it from the optical drive and write to the disk.

    (*)It may be possible using obscure backup APIs and special privileges, but that should be considered a very last resort.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Isn't 40G a bit large for optical storage? :)

        Actually, I think he said 60GB. So that's just 13 DVDs or 95 CDs. Update: No, you're right, 40GB. So 9 DVDs or 62 CDs!

        Of course, the new hard disk would probably be cheaper than the media.

        It could get a tad expensive if he has to do it frequently--but his data would be safe:)


Re: Inplace without backup uses no extra disk space?
by ikegami (Pope) on Feb 02, 2010 at 16:01 UTC

    I presume that sed -i does the following:

    1. Rename the original file
    2. Create a new file with the same name as the original
    3. Delete the original file

    perl -i does the same thing. If you say that sed uses a temp file, then so does perl. The only difference is that Perl makes the original anonymous instead of adding some suffix.

    Windows doesn't support anonymous files, which is why you need to specify a suffix on Windows.
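    The sequence described above for perl -i without a suffix can be sketched roughly as follows, assuming a Unix-like system where an open file can be unlinked. The names rewrite_via_unlink and $edit are hypothetical, and this is an illustration of the idea rather than perl's actual implementation (it does not, for instance, try to preserve the file's permissions).

```perl
use strict;
use warnings;

# Roughly the unlink-then-recreate sequence described for `perl -i`
# without a backup suffix. $edit is a hypothetical per-line callback.
sub rewrite_via_unlink {
    my ($path, $edit) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";

    # Remove the name. The data stays on disk, nameless, for as long as
    # $in remains open.
    unlink $path or die "Cannot unlink $path: $!";

    # A brand-new, empty file takes over the name.
    open my $out, '>', $path or die "Cannot create $path: $!";

    while (defined(my $line = <$in>)) {
        print {$out} $edit->($line) or die "Write failed: $!";
    }

    close $out or die "Close failed: $!";
    close $in;   # only now can the old file's blocks be released
    return;
}
```

    This would explain seeing a zero-length file and no temp file in the directory listing. The old contents still occupy disk space until the read handle is closed, though, so at the peak both the old and the new data are on disk at once.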