 
PerlMonks  

Removing double carriage return

by dragooneye (Novice)
on Aug 20, 2011 at 00:35 UTC (#921345=perlquestion)
dragooneye has asked for the wisdom of the Perl Monks concerning the following question:

Hi, is there an easy way to remove double carriage returns in a string with a regex replacement? I've tried using the following line without success:

perl -i.bak -p -e 's/\n\n/\n/m' inputtext

This kind of works, in that it removes any blank lines. But if there are 3 or more carriage returns in a row, I'd like some extra spacing to remain in the output:

perl -i.bak -p -e 's/^\n$//' inputtext

Thanks

Re: Removing double carriage return
by GrandFather (Cardinal) on Aug 20, 2011 at 01:35 UTC

    I can't see how that could even "kind of works". According to the perlrun description of the -p switch your code provided on the command line is wrapped in:

    while (<>) {
        ...     # your program goes here
    } continue {
        print or die "-p destination: $!\n";
    }

    which deals with the input a line at a time, so your regex will never match. For such a multi-line regex to make sense, you need to slurp the whole file into a string and run the regex on that. The easiest way to do that is to write a small script rather than try to shoehorn it into a "one liner". However, that gets to be a whole lot more work, because you then have to handle things a file at a time instead of using Perl's -i and @ARGV magic in a simple fashion. With just a little work we can still use the magic, though:

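    use strict;
    use warnings;

    my @files = @ARGV;
    $^I = '.bak';              # same as -i.bak: edit in place, keep a backup

    for my $file (@files) {
        local $/;              # undefine the record separator: slurp each file whole
        @ARGV = $file;         # let <> (and $^I) process this one file
        while (<>) {
            s/\n\n/\n/gs;      # collapse each pair of newlines into one
            print;             # goes to the rewritten file, not to STDOUT
        }
    }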
    True laziness is hard work
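    To make the line-at-a-time point concrete, here is a small sketch (the sub name records_with_blank_line and the sample text are made up for illustration) that reads the same text through an in-memory filehandle, once with the default $/ and once with $/ undefined as in the script above, and counts the records that contain \n\n:

```perl
use strict;
use warnings;

# Hypothetical helper: count the records read from $text that contain "\n\n".
# When $slurp is true, $/ is undefined (as with "local $/;" above), so the
# whole text arrives as one record; otherwise the default $/ = "\n" is used.
sub records_with_blank_line {
    my ($text, $slurp) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = $slurp ? undef : "\n";
    my $count = 0;
    while (my $record = <$fh>) {
        $count++ if $record =~ /\n\n/;
    }
    close $fh;
    return $count;
}

my $text = "first\n\nsecond\n";
print records_with_blank_line($text, 0), "\n";   # line by line: prints 0
print records_with_blank_line($text, 1), "\n";   # slurped: prints 1
```

    With the default separator each record ends at its first \n, so no single record can ever contain two of them.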
      Thanks GrandFather. The code you posted works great!

      My "kind of works" statement was referring to the last line of code in my original post. I probably should have left this out as it is confusing.

Re: Removing double carriage return
by pvaldes (Chaplain) on Aug 20, 2011 at 01:40 UTC
    To match 3 or more carriage returns, maybe something like =~ m/\n{3,}/ ... so, expanding the script provided by GrandFather:
    while (<>) { s/\n\n/\n/gs; print; }
    while (<>) { next if $_ =~ m/\n{1}/; if ($_ =~ m/\n{2}){s/\n{2}/\n/gs} elsif ($_ =~ m/\n{3,}){s/\n{3,}/\n\s/gs} }
    or something like this...
      A program that reads a file line by line will never encounter more than one carriage return per line.

        This is why GrandFather uses "local $/;" in his script, which pvaldes' reply builds on.

        Update: I hadn't read pvaldes' comment before he updated it; thanks, Anonymous, for clarifying this issue.

      Thanks pvaldes!

      This is a great addition to GrandFather's code. Unfortunately I can't get it to work exactly as you wrote.

      Minor issues: my Perl complains about missing closing slashes in m/\n{2} and m/\n{3,}.

      My Perl system also could not interpret the \s escape char.

      After fixing the minor issues, the resulting file turns out blank (the loop never prints anything). However, this will jumpstart my attempts to get a working script. I want the following logic: if there is one carriage return, do nothing; if there are two carriage returns, substitute them with one; if there are three or more, substitute them all with two.

      I'll post it when I can get it working.

      My updated code that seems to work. Thanks again pvaldes for your help!

      pvaldes' code:
      while (<>) { next if $_ =~ m/\n{1}/; if ($_ =~ m/\n{2}){s/\n{2}/\n/gs} elsif ($_ =~ m/\n{3,}){s/\n{3,}/\n\s/gs} }
      Mine:
      while (<>) {
          if ($_ =~ m/\n{1}/) { }
          if ($_ =~ m/\n{2}/) { s/\n{2}/\n/gs; print; }
          elsif ($_ =~ m/\n{3,}/) { s/\n{3,}/\n/gs; print; }
      }
      The above does not work. The version below now works (probably a crude way of doing it; please reply if you have a more elegant way):
      while (<>) {
          if ($_ =~ m/\S\n{2}\S/) { s/(\S)\n{2}(\S)/$1\n$2/gs; print; }
          elsif ($_ =~ m/\n{3,}/) { s/\n{3,}/\n\n/gs; print; }
      }
        Yep, that's better than my code. Written like this, it also catches a last case that I was missing: a file without any \n. Your code is basically the same as this; it probably wouldn't hurt to add a final small else, just to catch these cases and help future reviews:
        while (<>) {
            if ($_ =~ m/\S\n{2}\S/) { s/(\S)\n{2}(\S)/$1\n$2/gs; print; }
            elsif ($_ =~ m/\n{3,}/) { s/\n{3,}/\n\n/gs; print; }
            else { print; }    # nothing to change (0 or 1 \n): print the record as-is
        }
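        One caveat about the if/elsif version: since the whole file is a single record, only one branch ever runs, so a file containing both a double and a triple newline gets only the first rule applied. Also, because (\S)\n{2}(\S) consumes the characters on both sides, back-to-back matches such as a\n\nb\n\nc only get the first pair collapsed. A sketch that sidesteps both problems applies the two rules one after the other and uses lookarounds so that an exactly-two run is matched without consuming its neighbours (the sub name collapse is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper applying the rules from this thread to a slurped string:
# exactly two newlines become one, three or more become two.
sub collapse {
    my ($text) = @_;
    # Exactly two: not preceded or followed by another newline. This has to run
    # first, otherwise the "\n\n" produced by the next rule would shrink again.
    $text =~ s/(?<!\n)\n\n(?!\n)/\n/g;
    # Three or more: squeeze the whole run down to two.
    $text =~ s/\n{3,}/\n\n/g;
    return $text;
}

print collapse("a\n\nb\n\nc\n\n\nd\n");   # prints "a\nb\nc\n\nd\n"
```

        Inside GrandFather's slurping loop, the body would then just be print collapse($_);.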
        From the command line, this replaces 3 or more newlines with 2 newlines or replaces exactly 2 newlines with 1 newline.

        perl -i.bak -0777 -pe 's/(\n{3,}|\n\n)/2 == length $1 ? "\n" : "\n\n"/eg' inputfile

        Notice that the alternation lists the longer match first, so that \n\n cannot grab the first two newlines of a run of 3 or more.
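        The ordering point can be checked directly. This sketch (the sub names longest_first and shortest_first are hypothetical) runs the one-liner's substitution with the alternatives in both orders on a run of five newlines:

```perl
use strict;
use warnings;

# The one-liner's replacement: a 2-newline match becomes 1, anything longer becomes 2.
sub longest_first {
    my ($text) = @_;
    $text =~ s/(\n{3,}|\n\n)/2 == length $1 ? "\n" : "\n\n"/eg;
    return $text;
}

# Same substitution with the alternatives swapped: "\n\n" now wins at every
# position, so a long run is chopped into pairs instead of being matched whole.
sub shortest_first {
    my ($text) = @_;
    $text =~ s/(\n\n|\n{3,})/2 == length $1 ? "\n" : "\n\n"/eg;
    return $text;
}

my $five  = "a\n\n\n\n\nb";
my $long  = longest_first($five);
my $short = shortest_first($five);
print "longest first:  ", ($long  =~ tr/\n//), " newlines\n";   # 2 newlines
print "shortest first: ", ($short =~ tr/\n//), " newlines\n";   # 3 newlines
```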

        Update: That could be simplified to:

        perl -i.bak -0777 -pe 's/\n+/2 < length $& ? "\n\n" : "\n"/ge' inputfile
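        One small note on $&: on perls older than 5.20, mentioning $& anywhere in a program made every successful match copy the matched string, which is why it was traditionally avoided (in a one-liner this hardly matters). A sketch of the same rule measuring a capture group instead (the sub name squeeze is hypothetical):

```perl
use strict;
use warnings;

# Same rule as the simplified one-liner, but measuring a capture group
# instead of $&: runs of 1 or 2 newlines become one, 3 or more become two.
sub squeeze {
    my ($text) = @_;
    $text =~ s/(\n+)/length($1) > 2 ? "\n\n" : "\n"/ge;
    return $text;
}

print squeeze("one\n\ntwo\n\n\n\nthree\nfour\n");   # "one\ntwo\n\nthree\nfour\n"
```

        The command-line form would then be perl -i.bak -0777 -pe 's/(\n+)/length($1) > 2 ? "\n\n" : "\n"/ge' inputfile, which should behave the same on any Perl version.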

Re: Removing double carriage return
by Cristoforo (Deacon) on Aug 20, 2011 at 02:03 UTC
    Use the -0 switch to slurp the whole file:

    perl -i.bak -0pe 's/\n\n/\n/g' inputtext

    Update: I chose -0 instead of -0777, thinking it would be clearer, but GrandFather may have a good point: a byte can only take values from 0 to 255, while octal 0777 is 511 in decimal. I guess I was also thinking that -00 (double zero) is the switch for paragraph-mode reading.

      actually:

      perl -i.bak -0777pe 's/\n\n/\n/g' inputtext

      may be slightly better, as no byte value matches octal 777, although that's rather nit-picking, and I'd not have noticed if I hadn't needed to look up perlrun to find out what -0 (the digit 0, btw) actually does. ;)
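      For reference, perlrun describes -0 with no digits as using the null character as the record separator, -00 as paragraph mode, and -0777 (any value 0400 or above) as slurping files whole. A small sketch (the sub name count_records and the sample strings are hypothetical) shows where -0 and -0777 differ: text that happens to contain a NUL byte is split into two records under the former but not the latter.

```perl
use strict;
use warnings;

# Count the records read from $text with a given input record separator.
sub count_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = $separator;
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

my $plain    = "a\n\nb\n";
my $with_nul = "a\n\0\nb\n";

print count_records($plain, "\0"), "\n";       # like -0:    1 record
print count_records($with_nul, "\0"), "\n";    # like -0:    2 records
print count_records($with_nul, undef), "\n";   # like -0777: 1 record
```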

Re: Removing double carriage return
by parv (Priest) on Aug 20, 2011 at 14:04 UTC

    A harder way would be to keep track of the kinds of consecutive lines seen while processing the file line by line, and to print the buffered array contents only when n consecutive non-blank lines (however "non-blank" is defined) have been seen.
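    One way to read this idea, as a sketch only: process the input line by line, count consecutive empty lines instead of storing them in an array (they are all identical), and decide how many to emit when the next non-empty line or end of input arrives. The sub name squeeze_blank_lines is hypothetical, and "blank" here means a line that is exactly "\n", so the output should match the slurping regexes above.

```perl
use strict;
use warnings;

# Hypothetical line-by-line version: copy $in to $out, applying the thread's
# rule (a run of exactly 2 newlines becomes 1, 3 or more become 2) without
# slurping. Empty lines are counted rather than stored.
sub squeeze_blank_lines {
    my ($in, $out) = @_;
    my $blank     = 0;    # empty lines seen since the last non-empty line
    my $seen_text = 0;    # has a non-empty line been printed yet?

    my $flush = sub {
        # After a text line, the newline run also includes that line's own "\n".
        my $run  = $seen_text ? $blank + 1 : $blank;
        my $keep = $run >= 3 ? 2 : $run >= 1 ? 1 : 0;
        $keep-- if $seen_text;    # that "\n" has already been printed
        print {$out} "\n" x $keep;
        $blank = 0;
    };

    while (my $line = <$in>) {
        if ($line eq "\n") {      # only truly empty lines, like the regexes
            $blank++;
            next;
        }
        $flush->();
        print {$out} $line;
        $seen_text = 1;
    }
    $flush->();                   # trailing empty lines at end of input
}
```

    It would be called as squeeze_blank_lines(\*STDIN, \*STDOUT), so memory use stays at one line regardless of file size, at the cost of more bookkeeping than the one-liners.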

Approved by toolic