
Removing empty line(s) with regex in a multiple strings in a variable

by monkfan (Curate)
on Sep 22, 2006 at 02:31 UTC ( [id://574298]=perlquestion )

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have this string:
my $str = '
AAGTAATCAAGTATTACAAGAAACAAAAATTCAAGTAAATAACAGATAAT
ATGTCAAAAGCTGTCGGTATTGATTTAGGTACAACATACTCGTGTGTTGC

>YDR256C CTA1 969666 970216
GGGAAGAACTAAGAGATGTTATGGCTCGGAGAGTTTTGAAAAGCGAAATA
GATTCGCTGCAAGTTTGTGAAGAAACCATCGACAAGAATTACAAGGTTAT
'; # there can be more than one empty line
And I want to remove the empty line in between so that it gives this:
$VAR = '
AAGTAATCAAGTATTACAAGAAACAAAAATTCAAGTAAATAACAGATAAT
ATGTCAAAAGCTGTCGGTATTGATTTAGGTACAACATACTCGTGTGTTGC
>YDR256C CTA1 969666 970216
GGGAAGAACTAAGAGATGTTATGGCTCGGAGAGTTTTGAAAAGCGAAATA
GATTCGCTGCAAGTTTGTGAAGAAACCATCGACAAGAATTACAAGGTTAT
';
Why doesn't my regex below work? What is the right solution?
use Data::Dumper;
$str =~ s/[\s]+//mgx;
print Dumper $str;

Regards,
Edward

Replies are listed 'Best First'.
Re: Removing empty line(s) with regex in a multiple strings in a variable
by graff (Chancellor) on Sep 22, 2006 at 03:13 UTC
    Your regex  s/[\s]+//mgx deletes all whitespace, and you don't want that -- you want to preserve the spaces that separate the four strings on the line that starts with ">".

    (I could also point out that in your regex, the square brackets and "m" and "x" modifiers could all be removed and it would do the same thing it does now -- including them has no effect at all in this case. Of course, this also means they do no harm, but if you don't understand why they do nothing in this case, it wouldn't hurt you to try reading the perlre manual page.)
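
The point about the brackets and modifiers can be checked with a small sketch. The sample string here is hypothetical, not the poster's data. The pattern contains no ^ or $, so /m changes nothing; it contains no literal whitespace or #, so /x changes nothing; and [\s] is the same character class as \s.

```perl
use strict;
use warnings;

# Hypothetical sample data, not the poster's actual string.
my $sample = "\nAAA\nBBB\n\n>ID NAME 1 2\nCCC\n";

# The original substitution, with brackets and the /m and /x modifiers.
(my $with_extras = $sample) =~ s/[\s]+//mgx;

# The same substitution with those extras removed.
(my $plain = $sample) =~ s/\s+//g;

# Both forms strip every whitespace character, including the
# spaces inside the ">" header line.
print "with extras: $with_extras\n";
print "plain:       $plain\n";
```

Both variables end up holding the same string with all whitespace gone, which is why the header fields run together.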

    As GrandFather points out, you only want to eliminate the extra "\n" characters. The specific output that you say you want (with the initial "\n" still intact) would come out with just this:

    s/\n\s*/\n/g; # every string of \n plus 0-or-more whitespace --> single \n
    (updated to emulate GrandFather's handling of "blank" lines that contain just spaces and/or tabs)
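
A minimal sketch of the substitution above on hypothetical sample data (the short strings are made up for illustration). It also shows one side effect worth knowing about: because \s* runs on past the newline, leading indentation on the following line is removed as well.

```perl
use strict;
use warnings;

# Hypothetical sample: a leading newline, one empty line, and one
# "blank" line that holds only a space and a tab.
my $sample = "\nAAA\nBBB\n\n \t\n>ID NAME 1 2\nCCC\n";

# Each newline plus any whitespace that follows it becomes one newline.
(my $cleaned = $sample) =~ s/\n\s*/\n/g;
print $cleaned;

# Side effect: indentation at the start of a line counts as whitespace
# after a newline, so it is removed too.
my $indented = "AAA\n  BBB\n";
(my $flattened = $indented) =~ s/\n\s*/\n/g;
print $flattened;
```

The initial newline and the spaces inside the header line survive; only the whitespace that directly follows a newline is collapsed.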
      I ran into the same problem, and the only approach I found that worked well is $buffer =~ s/^\s*\n+//mg; Thank you!
Re: Removing empty line(s) with regex in a multiple strings in a variable
by GrandFather (Saint) on Sep 22, 2006 at 02:43 UTC

    Delete empty lines, where "empty" includes lines that contain only white space:

    $str =~ s/(^|\n)[\n\s]*/$1/g;
    print $str;

    Prints:

    AAGTAATCAAGTATTACAAGAAACAAAAATTCAAGTAAATAACAGATAAT
    ATGTCAAAAGCTGTCGGTATTGATTTAGGTACAACATACTCGTGTGTTGC
    >YDR256C CTA1 969666 970216
    GGGAAGAACTAAGAGATGTTATGGCTCGGAGAGTTTTGAAAAGCGAAATA
    GATTCGCTGCAAGTTTGTGAAGAAACCATCGACAAGAATTACAAGGTTAT

    DWIM is Perl's answer to Gödel
      This expression has been very useful, but I am new to Perl and regex and would like to fully understand how it works. Would it be possible to ask if it can be broken down step by step? I am afraid I find the capture groups and newline characters somewhat confusing.

        A good place to start is with the perlre documentation. It is fairly extensive, but is likely to be much more useful to you in the long run than me decomposing the expression I gave above. Note that there is a very useful "See Also" section at the bottom of the documentation - in fact you might want to check that out first. There is a trick that may help you break the regex down into easier to understand parts though:

        $str =~ s/ (^|\n) [\n\s]* /$1/gx;

        The x switch ignores white space (including new lines) so we can use white space to break up the parts of the regex into units.
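
As a sketch of how far the x switch can be taken, the same regex can be spread over several lines with a comment on each part. The sample string is hypothetical. One caution: text in these comments is still scanned for the closing delimiter and for variables, so the comments below avoid the characters /, $ and @.

```perl
use strict;
use warnings;

# Hypothetical sample: two leading newlines, plus a run of blank lines
# in the middle, one of which contains a single space.
my $sample = "\n\nAAA\n\n \nBBB\n";

# Compact form, as posted above.
(my $compact = $sample) =~ s/(^|\n)[\n\s]*/$1/g;

# Same regex under x, one unit per line.
(my $spaced = $sample) =~ s/
    (^|\n)     # group 1: start of the string, or one newline, which is kept
    [\n\s]*    # any further run of newlines and other whitespace, dropped
/$1/gx;

print $spaced;
```

Both forms give the same result; only the layout of the pattern differs.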

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
        $str =~ s/(^|\n)[\n\s]*/$1/g;
        Would it be possible to ask if it can be broken down step by step.

        Here's one way to look at this regex: The replacement part is $1, so that means that whatever is matched by the first capture group (^|\n) is kept, while everything else ([\n\s]*) is removed. (^|\n) matches either:

        (1) ^, which means to match at the beginning of the string, so any newlines or whitespace [\n\s]* at the beginning of the string are removed, or
        (2) \n, which means that any newlines or whitespace [\n\s]* after that newline character are removed, but the first newline character is kept.

        This is how empty lines, which are usually just a sequence of two newline characters \n\n, are changed into a single newline character, meaning the empty line(s) are removed.

        If you're unsure of any of these things, then I can recommend perlretut.
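
The explanation above can be traced with a short sketch that records what each match keeps and what it drops. The sample string is hypothetical, and the second pair of parentheses is added here only so the dropped part can be inspected; it is not part of the regex as posted.

```perl
use strict;
use warnings;

# Hypothetical sample: two leading newlines, then a run of blank lines
# in the middle (one of them holding a single space).
my $sample = "\n\nAAA\n\n \nBBB\n";

# Walk through the matches one at a time. Group 1 is the part the
# substitution keeps; group 2 (added for inspection) is the part it drops.
my @trace;
while ($sample =~ /(^|\n)([\n\s]*)/g) {
    push @trace, { kept => $1, dropped => $2 };
}

for my $step (@trace) {
    # Show newlines visibly so that each match is readable on one line.
    (my $kept    = $step->{kept})    =~ s/\n/\\n/g;
    (my $dropped = $step->{dropped}) =~ s/\n/\\n/g;
    print "kept '$kept', dropped '$dropped'\n";
}

# The actual substitution from the thread, for comparison.
(my $cleaned = $sample) =~ s/(^|\n)[\n\s]*/$1/g;
print $cleaned;
```

The first match is case (1): ^ matches at the start with nothing captured, so the leading newlines are dropped entirely. The later matches are case (2): one newline is kept and whatever whitespace follows it is dropped.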

Re: Removing empty line(s) with regex in a multiple strings in a variable
by jwkrahn (Abbot) on Sep 22, 2006 at 03:04 UTC
    It looks like you want:
    $str =~ tr/\n//s;
      Does not work for me. But tr/\n//d or s/\n//g work.
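
The two tr forms mentioned here do different things, which a small sketch on hypothetical data can show. With /s and an empty replacement list, tr squeezes each run of adjacent newlines into one; with /d it deletes every newline and so joins all the lines. Whether a blank line containing spaces explains the report above is only an assumption, since the poster did not say.

```perl
use strict;
use warnings;

# Hypothetical sample: a run of three newlines, then a "blank" line
# that contains a single space.
my $sample = "AAA\n\n\nBBB\n \nCCC\n";

# /s squeezes runs of adjacent newlines; the line holding a space
# separates its two newlines, so that blank line is left in place.
(my $squeezed = $sample) =~ tr/\n//s;

# /d deletes every newline, so all lines are joined together.
(my $deleted = $sample) =~ tr/\n//d;

print "squeezed:\n$squeezed";
print "deleted: $deleted\n";
```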
Re: Removing empty line(s) with regex in a multiple strings in a variable
by ysth (Canon) on Sep 22, 2006 at 02:47 UTC
    Works for me, sort of. At least it gets rid of all the newlines leaving
    'AAGTAATCAAGTATTACAAGAAACAAAAATTCAAGTAAATAACAGATAATATGTCAAAAGCTGTCGGTATTGATTTAGGTACAACATACTCGTGTGTTGC>YDR256CCTA1969666970216GGGAAGAACTAAGAGATGTTATGGCTCGGAGAGTTTTGAAAAGCGAAATAGATTCGCTGCAAGTTTGTGAAGAAACCATCGACAAGAATTACAAGGTTAT';
    What are you seeing? If you just want to compress multiple newlines, you could use $str =~ y/\n//s; If there are space characters on the "blank" line, you'd want something like $str =~ s/^\s+//mg; (or $str =~ s/^\s+\n//mg; if there may be spaces at the start of a non-blank line).
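
A sketch comparing the line-anchored substitutions on hypothetical data that has an indented line, a truly empty line, and a blank line holding a space and a tab. The third form is the one from the follow-up reply earlier in this thread, included for comparison. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: an empty line, an indented line, and a
# "blank" line that contains a space and a tab.
my $sample = "AAA\n\n  BBB\n \t\nCCC\n";

# s/^\s+//mg removes blank lines but also strips the indentation.
(my $strip_leading = $sample) =~ s/^\s+//mg;

# s/^\s+\n//mg keeps the indentation and removes the blank line that
# holds spaces, but an isolated empty line has no whitespace before
# its newline, so it is left in place.
(my $spaces_then_newline = $sample) =~ s/^\s+\n//mg;

# s/^\s*\n+//mg (from the follow-up reply) makes the leading whitespace
# optional, so both kinds of blank line go and indentation stays.
(my $optional_spaces = $sample) =~ s/^\s*\n+//mg;

print "--- s/^\\s+//mg\n$strip_leading";
print "--- s/^\\s+\\n//mg\n$spaces_then_newline";
print "--- s/^\\s*\\n+//mg\n$optional_spaces";
```

Which form fits depends on whether leading whitespace on non-blank lines needs to be preserved.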
