wrkrbeee has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks, I am at best a novice at Perl, and I am seeking to remove large chunks of whitespace (i.e., anything other than a visible character) between lines of a text file. This site contains a thread dating back to 2000 which seems to address this issue. Some Monks suggest substitution pattern matching, while others suggest WHILE loops. I am attempting to use a while loop in the code below (but remain completely open to other ideas). However, I have no need for user input (i.e., I do not need <STDIN>), and I am clueless about how to use "print if (!/^\s*$/)" to remove the whitespace and write/save the result to the file. I apologize for such a simple problem. Grateful for any ideas. Thank you!

#! /usr/bin/perl -w
use strict;
use warnings;
use lib "c:/strawberry/perl/site/lib";

my $files_dir = 'F:\research\SEC filings 10K and 10Q\Data\Filing Docs\2009\Test Data\HTML Clean';
my $write_dir = 'F:\research\SEC filings 10K and 10Q\Data\Filing Docs\2009\Test Data\HTML Clean\Non Word Strip';

opendir (my $dir_handle, $files_dir);
while (my $filename = readdir($dir_handle)) {
    next unless -f $files_dir.'/'.$filename;
    print "Procesing $filename\n";
    open my $fh_in, '<', $files_dir.'/'.$filename
        or die "failed to open '$filename' for read";
    open my $fh_out, '>', $write_dir.'/'.$filename
        or die "failed to open '$filename' for write";
    my $count=0;
    while (my $line = <$fh_in>) {
        my $text = $line;
        chomp ($text);
        #Strip/remove whitespace between lines of text file;
        while (<STDIN>) {
            print if (!/^\s*$/);
        }
        print $fh_out "$text\n"; #Save stripped results;
    }
    ++$count;
    print "$count lines read from $filename\n;"
}

Re: Remove whitespace between lines
by Laurent_R (Canon) on Feb 03, 2015 at 18:05 UTC
    Change your while loop:
    while (my $line = <$fh_in>) {
        my $text = $line;
        chomp ($text);
        #Strip/remove whitespace between lines of text file;
        while (<STDIN>) {
            print if (!/^\s*$/);
        }
        print $fh_out "$text\n"; #Save stripped results;
    }
    as follows:
    while (my $line = <$fh_in>) {
        print $fh_out $line if (!/^\s*$/);
    }
    Update: Sorry, I wrote the message above on my mobile device on the train home; the train was nearing my town, so I copied and pasted the code a bit too hastily. The above code should be:
    while (my $line = <$fh_in>) {
        print $fh_out $line if $line !~ /^\s*$/;
    }
    Update 2: I had not noticed when I wrote the first update above, but poj sent me a CB message proposing a very similar correction. Thanks to poj. Further update: I had also not seen that poj posted the correction as an answer to your question, so these updates end up not being very useful...

    Je suis Charlie.
      Thank you for the simple solution. It gives me the error of "use of uninitialized value $_ in pattern match." Over my head for sure.
        Try
        while (my $line = <$fh_in>) {
            print $fh_out $line if ($line !~ /^\s*$/);
        }
        poj
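For anyone puzzled by that warning: `while (my $line = <$fh_in>)` stores each line in `$line` and never touches `$_`, so a bare `/^\s*$/` is tested against an undefined `$_`. A minimal sketch of the corrected loop follows; the sample text and the in-memory filehandles are assumptions for illustration, standing in for the real files.

```perl
use strict;
use warnings;

# Hypothetical sample data; in-memory filehandles stand in for real files.
my $input  = "first line\n\n   \t\nsecond line\n\nthird line\n";
my $output = '';

open my $fh_in,  '<', \$input  or die "cannot read sample data: $!";
open my $fh_out, '>', \$output or die "cannot open output buffer: $!";

while (my $line = <$fh_in>) {
    # The line lives in $line, not $_, so the match must be bound to $line.
    print {$fh_out} $line if $line !~ /^\s*$/;
}

close $fh_in;
close $fh_out;

print $output;
```

With real files, only the two `open` calls change; the loop body stays the same.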

      Here the pattern can be further simplified: print if /\S/;
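A small sketch of why the two tests should agree: a line is whitespace-only exactly when it contains no non-whitespace character. The sample lines are made up for illustration. In the loop from this thread, which reads into `$line`, the simplified form would presumably be written `print $fh_out $line if $line =~ /\S/;`.

```perl
use strict;
use warnings;

# Hypothetical sample lines, including empty and whitespace-only ones.
my @lines = ("alpha\n", "\n", "  \t \n", " beta \n", "", "gamma");

# Keep a line when it is not entirely whitespace ...
my @kept_negated = grep { $_ !~ /^\s*$/ } @lines;

# ... or, equivalently, when it has at least one non-whitespace character.
my @kept_simple = grep { /\S/ } @lines;

print "same result\n" if "@kept_negated" eq "@kept_simple";
print @kept_simple;
```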

Re: Remove whitespace between lines
by Your Mother (Bishop) on Feb 03, 2015 at 18:13 UTC

    This might do what you want. It's meant to be invoked as a command line tool per file and it sends its results to STDOUT. It's essentially a one-liner unrolled to a script. You can put in an -i flag to edit the file in place but this is risky. Don't do it unless you have backups and are going to check everything. Save it as space-collapser.pl or whatever you like:

    Update: I don't have a lot of experience with WIN on this front... I'm not sure this will work for you as is.

    #!/usr/bin/env perl -0777 -p
    # -0777, idiom for "slurp" mode.

    # Strip all trailing spaces including blank lines with spaces.
    s/[^\n\r\S]+(?=\r?\n)//g;

    # Reduces all triple or greater line spacing to double spaced lines.
    s/((?:\r?\n){2})(?:\r?\n)+/$1/g;

    # -p print at the end of each implicit loop.
    space-collapser.pl file dir-with-files/*
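To see what those two substitutions do, here is a sketch that applies them to a string instead of relying on the `-0777 -p` switches. The sample text is an assumption for illustration. Note that this approach leaves single blank lines in place and only collapses longer runs, so it behaves differently from the line filters elsewhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample text with trailing spaces and runs of blank lines.
my $text = "one  \n\n\n\ntwo\t\n \nthree\n";

# Strip trailing horizontal whitespace, including on whitespace-only lines.
$text =~ s/[^\n\r\S]+(?=\r?\n)//g;

# Reduce three or more consecutive line endings to two.
$text =~ s/((?:\r?\n){2})(?:\r?\n)+/$1/g;

print $text;
```

On this sample the result is "one", a blank line, "two", a blank line, then "three".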
Re: Remove whitespace between lines
by Anonymous Monk on Feb 03, 2015 at 18:05 UTC
    You don't need to use #! /usr/bin/perl -w as well as use warnings; Here is my version. It looks as though you're on Windows so if you have further problems, try turning the slashes round.
    #! /usr/bin/perl
    use strict;
    use warnings;
    #use lib "c:/strawberry/perl/site/lib";

    my $files_dir = 'data';
    my $write_dir = 'data/processed';

    opendir (my $dir_handle, $files_dir)
        or die "failed to open directory '$files_dir'";
    while (my $filename = readdir($dir_handle)) {
        next unless -f $files_dir.'/'.$filename;
        print "Processing $filename\n";
        open my $fh_in, '<', $files_dir.'/'.$filename
            or die "failed to open '$filename' for read";
        open my $fh_out, '>', $write_dir.'/'.$filename
            or die "failed to open '$filename' for write";
        my $count=0;
        while (my $line = <$fh_in>) {
            # print to output file only non-whitespace lines
            print $fh_out $line unless $line =~ /^\s*\n$/;
            ++$count;
        }
        print "$count lines read from $filename\n";
    }
    The STDIN in the middle was hanging up the entire program. There was also an unnecessary chomp and assignment of $line to variable $text. I have commented out the 'use lib' line as you shouldn't need that either. Try this and let me know if it works.
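On the slash question, one option is to let the core File::Spec module join path components rather than concatenating with '/'. The following is only a sketch under stated assumptions: `strip_blank_lines` is a hypothetical helper name, and temporary directories created with File::Temp stand in for the real data folders.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $in_path to $out_path, skipping whitespace-only
# lines. Returns the number of lines read and the number written.
sub strip_blank_lines {
    my ($in_path, $out_path) = @_;
    open my $fh_in,  '<', $in_path  or die "failed to open '$in_path' for read: $!";
    open my $fh_out, '>', $out_path or die "failed to open '$out_path' for write: $!";
    my ($read, $written) = (0, 0);
    while (my $line = <$fh_in>) {
        ++$read;
        next unless $line =~ /\S/;    # skip empty and whitespace-only lines
        print {$fh_out} $line;
        ++$written;
    }
    close $fh_in;
    close $fh_out or die "failed to close '$out_path': $!";
    return ($read, $written);
}

# Temporary directories stand in for the real input and output folders.
my $files_dir = tempdir(CLEANUP => 1);
my $write_dir = tempdir(CLEANUP => 1);

# Create one small sample file to process.
my $sample = File::Spec->catfile($files_dir, 'sample.txt');
open my $fh, '>', $sample or die "cannot create '$sample': $!";
print {$fh} "a\n\n  \nb\n";
close $fh;

opendir my $dir_handle, $files_dir or die "cannot open '$files_dir': $!";
while (defined(my $filename = readdir $dir_handle)) {
    my $in_path = File::Spec->catfile($files_dir, $filename);
    next unless -f $in_path;          # skips '.', '..' and subdirectories
    my ($read, $written) = strip_blank_lines(
        $in_path, File::Spec->catfile($write_dir, $filename));
    print "$filename: $read lines read, $written written\n";
}
closedir $dir_handle;
```

Keeping the output directory separate from the input directory, as the original script does, means the input files are never modified.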
Re: Remove whitespace between lines
by sundialsvc4 (Abbot) on Feb 03, 2015 at 17:31 UTC

    My immediate thought is, “just use grep on the command line.”   You don’t need to write a custom program at all, if you don’t want to.   The regular-expression should identify the lines that you want to keep.

    An “awk one-liner” is also useful in cases like these.   Again, no programming per se.

    Within a program, there are many ways to do it ... some you’ve seen will “slurp” the entire file into memory, then manipulate it as a gigantic text-string.   However, I prefer to process line-at-a-time text files a line-at-a-time.
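To make the two in-program styles concrete, here is a sketch that removes blank lines once by slurping and once line by line. The sample text is hypothetical, and the slurp regex shown is just one possible choice, not something taken from the posts above.

```perl
use strict;
use warnings;

# Hypothetical sample text; every line here ends with a newline.
my $input = "keep 1\n\n \t\nkeep 2\n\n";

# Slurp style: treat the whole text as one string and delete each line
# that holds nothing but horizontal whitespace before its newline.
(my $slurped = $input) =~ s/^[^\S\n]*\n//mg;

# Line-at-a-time style: read one line, keep it if it has visible content.
open my $fh, '<', \$input or die "cannot read sample data: $!";
my $by_line = '';
while (my $line = <$fh>) {
    $by_line .= $line if $line =~ /\S/;
}
close $fh;

print "both styles agree\n" if $slurped eq $by_line;
print $by_line;
```

The line-at-a-time version holds only one line in memory at once, which matters for very large files. The two can differ on edge cases, such as a whitespace-only last line with no trailing newline.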

      Hmm, sorry, but your answer is off-topic. The OP is writing a Perl script to achieve a given result, and your answer is: "Don't do it! Use grep!" But the code presented by the OP is not just trying to grep one file. It is looking for files in a directory, modifying these files and writing the output to other files in another directory. It seems a bit difficult to do all this with a simple grep or even an awk one-liner. Of course, you could wrap the grep or the awk one-liner in a bash or sh or ksh script, but then the "no programming" argument is gone; you might as well do it in Perl, and it is likely to be more efficient in Perl.

      But there is an even more important factor: a Perl script, if well designed, is portable across different platforms. Even though the script shown by the OP has an (admittedly strange) "#! /usr/bin/perl -w" shebang line, the rest of the script strongly suggests that the OP is running her or his script under MS Windows, probably with Strawberry Perl. I pray you: how are you going to run a grep or an awk one-liner under Windows (unless you are using Cygwin, but the OP does not seem to be doing that)? (Well, awk may well have been ported to Windows, I do not know, but by far most Windows users don't have it.)

      Je suis Charlie.

        FYI, FWIW, several versions of awk have been ported to MS Windows. Until a few years ago, I used a version that was packaged along with make and a few other Unix tools as part of a code generation application we use with many of our projects. We used the make and the awk from that package in our build scripts because everyone in the SW department had the package installed. Now we use Perl instead of awk (but still use make).