PerlMonks  

Regex Question

by madhatter (Sexton)
on Jan 07, 2001 at 23:45 UTC (#50366=perlquestion)
madhatter has asked for the wisdom of the Perl Monks concerning the following question:

I've got some text with line breaks. I want to strip all line breaks, unless there are two next to each other.

Here is what I have:

Code:

    $text = "This is not good,\n you know what I mean?\n\nThis should be on a separate line.";
    $text =~ s/[^\n]\n[^\n]//g;
    print $text;

Run:

    C:\Perl>perl test.pl
    This is not goodyou know what I mean?

    This should be on a separate line.

Should Print:

    This is not good, you know what I mean?

    This should be on a separate line.

The characters matched by the [^\n]'s are being substituted out, too, which is not wanted. How would I get around this?

Thanks,
madhatter

Re: Regex Question
by ryddler (Monk) on Jan 07, 2001 at 23:55 UTC

    How about this?

    $text = "This is not good,\n you know what I mean?\n\nThis should be on a separate line.";
    $text =~ s/[^\n](?:\n)|(\n)\n[^\n]/$1/g;
    print $text;

    Which produces this:

    This is not good,you know what I mean?
    This should be on a separate line.


    ryddler
Re: Regex Question
by chipmunk (Parson) on Jan 07, 2001 at 23:57 UTC
    The easiest way to fix this substitution is to capture the extra characters: s/([^\n])\n([^\n])/$1$2/g;
    Although negative lookahead and/or lookbehind could be used instead: s/(?<!\n)\n(?!\n)//g;
    Lookahead and lookbehind assertions make sure that the subpattern matches (or doesn't match, for negative assertions), without using up those characters in the actual match. perlre explains it better. :)
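To make the difference between the two suggestions concrete, here is a minimal sketch that runs both on the string from the original question. The sub names are made up for illustration; only the two substitutions come from the post above.

```perl
use strict;
use warnings;

# Capture version: the neighbours are matched, then written back via $1 and $2.
sub strip_with_captures {
    my ($text) = @_;
    $text =~ s/([^\n])\n([^\n])/$1$2/g;
    return $text;
}

# Lookaround version: the neighbours are only inspected, never consumed.
sub strip_with_lookaround {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n(?!\n)//g;
    return $text;
}

my $sample = "This is not good,\n you know what I mean?\n\nThis should be on a separate line.";

print strip_with_captures($sample),   "\n";
print strip_with_lookaround($sample), "\n";
```

On this particular input both versions should print the same text, with the single break removed and the double break kept.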
      s/([^\n])\n([^\n])/$1$2/g;
      That fails on fred\nX\nY\nbarney, since the X will be sucked up while fixing the preceding newline, and won't be available to match for the following newline.

      Be very wary when matching right-side context. Passing it through to the "already seen" category means it won't be able to be left-side context for a later match.

      -- Randal L. Schwartz, Perl hacker
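The fred\nX\nY\nbarney case is easy to try out. This is only a sketch: the sub names and the show helper are invented here, and newlines are printed as a literal \n so they stay visible.

```perl
use strict;
use warnings;

# The capture-based substitution under discussion.
sub strip_with_captures {
    my ($text) = @_;
    $text =~ s/([^\n])\n([^\n])/$1$2/g;
    return $text;
}

# The lookaround-based alternative, which consumes only the newline itself.
sub strip_with_lookaround {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n(?!\n)//g;
    return $text;
}

# Hypothetical helper: make newlines visible as a backslash followed by n.
sub show {
    my ($text) = @_;
    $text =~ s/\n/\\n/g;
    return $text;
}

my $input = "fred\nX\nY\nbarney";

print show(strip_with_captures($input)),   "\n";   # fredX\nYbarney
print show(strip_with_lookaround($input)), "\n";   # fredXYbarney
```

The X is consumed as right-side context of the first match, so the newline after it has no left-side character left to match against.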

        That is fixable though with a lookahead:
        s/([^\n])\n(?!\n)/$1/g;
        Or the probably faster lookbehind:
        s/(?<!\n)\n([^\n])/$1/g;

        UPDATE
        chipmunk noticed a typo. I had not closed the second match. Oops.
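A small sketch of the two one-sided fixes above, run against the fred\nX\nY\nbarney input. The sub names are illustrative only.

```perl
use strict;
use warnings;

# Capture the left neighbour, only look ahead at the right one.
sub strip_lookahead {
    my ($text) = @_;
    $text =~ s/([^\n])\n(?!\n)/$1/g;
    return $text;
}

# Only look behind at the left neighbour, capture the right one.
sub strip_lookbehind {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n([^\n])/$1/g;
    return $text;
}

my $input = "fred\nX\nY\nbarney";

print strip_lookahead($input),  "\n";   # fredXYbarney
print strip_lookbehind($input), "\n";   # fredXYbarney
```

One difference to keep in mind: because each version still requires a real character on one side, strip_lookahead leaves a newline at the very start of the string alone, and strip_lookbehind leaves one at the very end alone.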

Re: Regex Question
by merlyn (Sage) on Jan 07, 2001 at 23:58 UTC
    A little bit of lookahead/lookbehind should do it:
    s/(?<!\n)\n(?!\n)//g;
    If that seems overly complex, then look for runs of newlines, and don't touch the ones that aren't isolated:
    s/(\n+)/$1 eq "\n" ? "" : $1/eg;

    -- Randal L. Schwartz, Perl hacker
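For anyone who finds the /e version easier to follow, here is a minimal sketch of it wrapped in a sub (the name strip_isolated is made up). Each run of newlines is captured as a whole, and only a run of exactly one is replaced with nothing.

```perl
use strict;
use warnings;

# Remove newlines that stand alone; leave runs of two or more untouched.
sub strip_isolated {
    my ($text) = @_;
    # /e evaluates the replacement as Perl code for every run that is found.
    $text =~ s/(\n+)/$1 eq "\n" ? "" : $1/eg;
    return $text;
}

print strip_isolated("one\ntwo\n\nthree\n\n\nfour"), "\n";
```

Since the whole run is matched in one go, there is no neighbouring character to consume, which is why this sidesteps the context problem without any lookaround.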

      A bit of benchmarking shows that the first is about twice as fast as the second, and about three times faster than my version...

      I really must get around to reading the regex book... it's been sitting on my desk for months now.

      Tony
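The benchmark itself was not posted, so the following is only a guess at how such a comparison could be set up with the core Benchmark module. The sample text, the iteration count, and the sub names are all assumptions, and the timings will vary with the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# merlyn's lookaround version.
sub strip_lookaround {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n(?!\n)//g;
    return $text;
}

# merlyn's run-matching version with /e.
sub strip_eval_runs {
    my ($text) = @_;
    $text =~ s/(\n+)/$1 eq "\n" ? "" : $1/eg;
    return $text;
}

# Tony's version; it also shrinks "\n\n" to "\n", so it is not equivalent.
sub strip_star_first {
    my ($text) = @_;
    no warnings 'uninitialized';    # $1 is undef when (\n)* matched zero times
    $text =~ s/(\n)*\n/$1/g;
    return $text;
}

# Made-up sample: 50 paragraphs, each containing two isolated newlines.
my $sample = join "\n\n", ("one line,\n another line,\n and a third") x 50;

# A positive count is a number of iterations; a negative one means CPU seconds.
cmpthese(5_000, {
    lookaround => sub { strip_lookaround($sample) },
    eval_runs  => sub { strip_eval_runs($sample) },
    star_first => sub { strip_star_first($sample) },
});
```

Bear in mind that the third candidate produces different output from the first two on double newlines, so the comparison is about speed only, not about interchangeable solutions.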

Re: Regex Question
by salvadors (Pilgrim) on Jan 07, 2001 at 23:58 UTC

    I've got some text with line breaks. I want to strip all line breaks, unless there are two next to each other.

    What do you want to happen if there's more than 2? This will remove single breaks, but reduce any greater number to a single one:

    $text =~ s/(\n)*\n/$1/g;
    If you want to reduce multiples to 1 less (i.e. \n\n\n would become \n\n) then just put the * inside the brackets:
    $text =~ s/(\n*)\n/$1/g;

    Tony
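Here is a quick sketch of how the two variants above treat runs of one, two, and three newlines. The sub names and the show helper are invented for the example; newlines are printed as a literal \n.

```perl
use strict;
use warnings;

# (\n)*\n : $1 holds only the last newline captured, so any run becomes one.
sub collapse_to_one {
    my ($text) = @_;
    no warnings 'uninitialized';    # $1 is undef for a single newline
    $text =~ s/(\n)*\n/$1/g;
    return $text;
}

# (\n*)\n : $1 holds everything except the final newline, so a run loses one.
sub drop_one {
    my ($text) = @_;
    $text =~ s/(\n*)\n/$1/g;
    return $text;
}

# Hypothetical helper: make newlines visible as a backslash followed by n.
sub show {
    my ($text) = @_;
    $text =~ s/\n/\\n/g;
    return $text;
}

for my $input ("a\nb", "a\n\nb", "a\n\n\nb") {
    printf "%-10s -> %-8s | %s\n",
        show($input), show(collapse_to_one($input)), show(drop_one($input));
}
```

Neither variant keeps a double newline intact, so both differ from what the original question asked for whenever a paragraph break is present.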

Re: Regex Question
by cat2014 (Monk) on Jan 08, 2001 at 00:03 UTC
    i think that you want something like:

    $text =~ s/([^\n])\n{1}([^\n])/$1$2/g;

    this should match exactly 1 newline which is surrounded by a non-newline character on either side. the {1} quantifier applies to the item before it & tells the regex to match it exactly 1 time. it's a pretty good thing to learn- basically, you do:
    m/x{1,}/ if you want to match at least 1
    m/x{1,5}/ if you want to match at least 1 and at most 5, etc.

    you should read the perlre page for more information.

    update:

    the above code, as merlyn pointed out, is wrong- you should use his example. but i still think that you should read the perlre page. (:
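The quantifier forms described above can be checked with a short sketch. The first_capture helper and the test string are made up for the example.

```perl
use strict;
use warnings;

# Return the text captured by the first group of $pattern, or undef on no match.
sub first_capture {
    my ($string, $pattern) = @_;
    my ($hit) = $string =~ $pattern;
    return $hit;
}

my $string = "aa" . ("x" x 7) . "b";    # seven x's in a row

print first_capture($string, qr/(x{1})/),   "\n";   # x
print first_capture($string, qr/(x{1,})/),  "\n";   # xxxxxxx
print first_capture($string, qr/(x{1,5})/), "\n";   # xxxxx

# \n{1} means the same as a plain \n, so it changes nothing about
# which neighbouring characters get consumed by the match.
print "same\n" if "a\nb" =~ /a\n{1}b/ && "a\nb" =~ /a\nb/;
```

This is also why adding {1} does not fix the original substitution: the problem there is the consumed neighbours, not the number of newlines matched.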

Re: Regex Question
by I0 (Priest) on Jan 08, 2001 at 00:41 UTC
    $text =~ s/\n(\n?)(\n*)/$1$1$2/g
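Since this one was posted without comment, here is a rough sketch of what it appears to do. The sub name and the show helper are invented here. The first newline is always dropped, but if a second one follows it is captured in $1 and written back twice, so any run of two or more comes out at its original length.

```perl
use strict;
use warnings;

# Drop isolated newlines; runs of two or more are rebuilt unchanged.
sub keep_paragraph_breaks {
    my ($text) = @_;
    # $1 is "" or "\n", $2 is the rest of the run; both always take part in
    # the match, so neither is ever undef.
    $text =~ s/\n(\n?)(\n*)/$1$1$2/g;
    return $text;
}

# Hypothetical helper: make newlines visible as a backslash followed by n.
sub show {
    my ($text) = @_;
    $text =~ s/\n/\\n/g;
    return $text;
}

for my $input ("a\nb", "a\n\nb", "a\n\n\nb") {
    print show($input), " -> ", show(keep_paragraph_breaks($input)), "\n";
}
```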

      s/\n(\n?)(\n*)/$1$1$2/g

      Hmmm... so, I just had to work out why that was so much quicker than mine at doing pretty much the same thing...

      And then I noticed that rather than matching this way around:

      s/(\n)*\n/$1/g;
      I should really check for the "one followed by none or more" rather than "none or more followed by one":
      s/\n(\n)*/$1/g;

      And, lo and behold, this runs over twice as fast as the one before, and much closer to the speed of I0's version (which also meets the original spec much better than mine! :))

      Now I'm *really* determined to learn more about how to optimise regexes..

      Tony

Re: Regex Question
by Coyote (Deacon) on Jan 08, 2001 at 00:45 UTC
    Remember that you are substituting the entire matched pattern with nothing. Try this:
    $text=~s/([^\n])\n([^\n])/$1$2/g;

    ---- Coyote (aka: Rich Anderson)
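To see the point about the entire matched pattern being replaced, it may help to print what the original pattern actually matches. A small sketch, with a made-up sub name and the first sentence of the question's text as input:

```perl
use strict;
use warnings;

# Return everything the original pattern matches on its first hit.
sub first_whole_match {
    my ($text) = @_;
    my ($matched) = $text =~ /([^\n]\n[^\n])/;
    return $matched;
}

my $text    = "This is not good,\n you know what I mean?";
my $matched = first_whole_match($text);

# Show the newline as a literal \n so all three characters are visible.
(my $visible = $matched) =~ s/\n/\\n/g;
print "matched: '$visible' (", length($matched), " characters)\n";
# matched: ',\n ' (3 characters)
```

All three characters are part of the match, so replacing the match with nothing removes the comma and the space along with the newline.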
