Re: How do I remove blank lines from text files?

by c-era (Curate)
on Jul 10, 2000 at 18:38 UTC (#21842=note)

in reply to How do I remove blank lines from text files?

A regex may not be the best answer. A simple loop will work better.
my @data;
open(FILE, "+</path/file") || die "Unable to open file: $!";   # "+<" opens for read/write
flock(FILE, 2) || die "Unable to lock file";                   # 2 = exclusive lock
foreach (<FILE>) {
    push @data, $_ unless ($_ eq "\n");   # keep every line that is not strictly empty
}
seek(FILE, 0, 0) || die "Unable to seek";                      # rewind before rewriting
print FILE @data;
truncate(FILE, tell(FILE)) || die "Unable to truncate file";   # cut off the leftover tail
close(FILE) || die "Unable to close file";

RE: Re: How do I remove blank lines from text files?
by ZZamboni (Curate) on Jul 11, 2000 at 13:53 UTC
    I don't like this for the following reasons:
    • You are reading the whole file into memory TWICE: once for the arguments to the foreach, and again when pushing everything into @data. In any case, you want to use while instead of foreach.
    • You are overwriting the file in place, without creating a backup. What if the machine crashes in the middle of the "print FILE @data"? You have lost your data.
    I personally like the technique of writing the results to a new temporary file, and then renaming the original to a backup name and the temporary file to the original. Something like this:

    open FILE, "/path/to/file" or die "$!\n";
    open OUT, ">/tmp/tmpfile.$$" or die "$!\n";   # ">" opens the temporary file for writing
    while (<FILE>) {
        next if /^\s*$/;   # skip empty and whitespace-only lines
        print OUT $_;      # no comma between the filehandle and the data
    }
    close FILE;
    close OUT;
    rename("/path/to/file", "/path/to/file.bak") or die "Error in rename: $!\n";
    rename("/tmp/tmpfile.$$", "/path/to/file") or die "Error in rename: $!";
    Side note: using "/tmp/tmpfile.$$" as a temporary file name could have security implications if the program is running set-uid. For better ways of creating a temporary file name, see the FAQ How do I make a temporary file name?
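    As a rough sketch of one such alternative, the same write-then-rename technique can be built on the File::Temp module, assuming a Perl installation where that module is available. The sub name strip_blank_lines and the 'stripXXXXXX' template are made up for illustration, and creating the temporary file in the target's own directory is a choice of this sketch, not part of the code above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: write the non-blank lines of $path to a temporary
# file, keep the original as "$path.bak", then move the new file into place.
sub strip_blank_lines {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot open $path: $!";

    # tempfile() picks an unused name and returns an open handle plus the name.
    # Assumption: placing it next to the target keeps rename() on one filesystem.
    my ($out, $tmp) = tempfile('stripXXXXXX', DIR => dirname($path));

    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;    # skip empty and whitespace-only lines
        print {$out} $line;
    }

    close($in);
    close($out) or die "Cannot close $tmp: $!";

    rename($path, "$path.bak") or die "Error in rename: $!";
    rename($tmp, $path)        or die "Error in rename: $!";
    return;
}

# Usage (path is a placeholder):
# strip_blank_lines("/path/to/file");
```

    Note that File::Temp creates the file readable and writable by its owner only, so the replaced file may end up with tighter permissions than the original had.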

    Also, a regex is not necessary if you are looking for strictly empty lines. But many times a line is considered empty even if it contains white space; in that case, a regular expression is the best way to do it.
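    To make the difference concrete, here is a small sketch comparing the two tests on the same input. The helper names is_strictly_empty and is_blank are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for a line that is exactly a newline.
sub is_strictly_empty {
    my ($line) = @_;
    return $line eq "\n";
}

# Hypothetical helper: true for an empty line or one holding only white space.
sub is_blank {
    my ($line) = @_;
    return $line =~ /^\s*$/;
}

# Sample data: the third line contains two spaces and a tab before the newline.
my @lines = ("first\n", "\n", "  \t\n", "second\n");

my @kept_strict = grep { !is_strictly_empty($_) } @lines;
my @kept_blank  = grep { !is_blank($_) } @lines;

print scalar(@kept_strict), " lines kept by the eq test\n";     # 3
print scalar(@kept_blank),  " lines kept by the regex test\n";  # 2
```

    The eq test lets the whitespace-only line through, while the regex drops it along with the strictly empty one.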


      ? ? ? ?

      How did you come up with those reasons?

      It is true that if you open a file for writing and the system crashes, the file is destroyed. But this file is opened read/write. If the system crashes, the file will be unchanged unless it has been closed properly. There may be other reasons to make a backup, but your reason is not one of them.

      As for foreach vs. while, on my Solaris system there is no difference in memory usage between the two. I ran the two programs with a 40MB file and the memory usage was the same.

      I can understand the file write vs. read/write mistake, but next time could you please check your facts before you post.

        Wrt the file being opened read/write: if you are rewriting the whole file and it is a large file and the system crashes in the middle, I will argue that the file may be partly rewritten, depending on when the system buffers were last flushed. I don't think the file will remain unchanged until it is properly closed. But I'm willing to be convinced otherwise.

        Wrt memory usage: my mistake. The immediate problem I saw with using foreach instead of while is that foreach provides a list context, whereas while provides a scalar context. Therefore, when you use foreach, the <FILE> slurps the entire file at once and creates a list for the foreach to cycle through. Then you are pushing each element into @data, at which point I assumed the data would be duplicated. However, reading the foreach documentation, I see the following:

        ...each element of the list is aliased to the loop variable in turn ... Note that the loop variable becomes a reference to the element itself, rather than a copy of the element.
        Therefore you will not be duplicating data. You are reading the whole file in memory only once, not twice as in my original message.

        However, I still think reading the whole file in memory is a bad idea, because you will eventually find a file that will not fit in memory. Unless there are other reasons for doing it, I will always prefer to process it one line at a time.
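        The context difference between the two loops can be observed directly. The following is a minimal sketch, assuming a Perl that supports in-memory filehandles (opening a reference to a scalar); the sub names and the sample text are made up for the demonstration. It checks whether the handle has already reached end-of-file during the first iteration of each loop.

```perl
use strict;
use warnings;

# Sample data standing in for a file on disk.
my $text = "one\n\ntwo\n\nthree\n";

# Hypothetical helper: returns 1 if the handle is already at EOF during
# the first iteration of a foreach loop, 0 otherwise.
sub at_eof_in_first_foreach {
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $at_eof;
    foreach my $line (<$fh>) {    # list context: every line is read up front
        $at_eof = eof($fh);
        last;
    }
    close($fh);
    return $at_eof ? 1 : 0;
}

# Hypothetical helper: the same check for a while loop.
sub at_eof_in_first_while {
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $at_eof;
    while (my $line = <$fh>) {    # scalar context: one line per iteration
        $at_eof = eof($fh);
        last;
    }
    close($fh);
    return $at_eof ? 1 : 0;
}

print "foreach: ", at_eof_in_first_foreach(), "\n";   # 1: whole input consumed
print "while:   ", at_eof_in_first_while(),   "\n";   # 0: lines still unread
```

        With foreach the whole input has been consumed before the loop body runs for the first time; with while the rest is still waiting to be read.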

