Regex Question

madhatter has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex Question by merlyn (Sage) on Jan 07, 2001 at 23:58 UTC
A little bit of lookahead/lookbehind should do it: `s/(?<!\n)\n(?!\n)//g;` If that seems overly complex, then look for runs of newlines, and don't touch the ones that aren't isolated: `s/(\n+)/$1 eq "\n" ? "" : $1/eg;` -- Randal L. Schwartz, Perl hacker
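A minimal sketch of the two substitutions above, run against the same sample string to check that they agree. The sub names `strip_lookaround` and `strip_runs` and the sample text are illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;

# Remove a newline only when it has no newline neighbour on either side.
sub strip_lookaround {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n(?!\n)//g;
    return $text;
}

# Match each run of newlines; drop the run only if it is exactly one newline.
sub strip_runs {
    my ($text) = @_;
    $text =~ s/(\n+)/$1 eq "\n" ? "" : $1/eg;
    return $text;
}

# Hypothetical sample: one isolated newline, one pair, one run of three.
my $sample = "one\ntwo\n\nthree\n\n\nfour";
print strip_lookaround($sample) eq strip_runs($sample) ? "same\n" : "different\n";
```

On this input both versions remove only the newline between "one" and "two" and leave the longer runs alone.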
Re: Re: Regex Question by salvadors (Pilgrim) on Jan 08, 2001 at 00:18 UTC
A bit of benchmarking shows that the first is about twice as fast as the second, and about three times faster than my version... I really must get around to reading the regex book... it's been sitting on my desk for months now. Tony
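One way such a comparison could be run is with the core `Benchmark` module. This is only a sketch: the generated input and the labels `lookaround` and `run_eval` are assumptions, and timings will vary with the input and the Perl version, so no figures are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input mixing single and double line breaks.
my $input = join '', map { "line $_\nmore text\n\n" } 1 .. 200;

# Each candidate works on a fresh copy so every iteration sees the same text.
my %candidates = (
    lookaround => sub { (my $t = $input) =~ s/(?<!\n)\n(?!\n)//g; $t },
    run_eval   => sub { (my $t = $input) =~ s/(\n+)/$1 eq "\n" ? "" : $1/eg; $t },
);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```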
Re: Regex Question by chipmunk (Parson) on Jan 07, 2001 at 23:57 UTC
The easiest way to fix this substitution is to capture the extra characters: `s/([^\n])\n([^\n])/$1$2/g;` Although negative lookahead and/or lookbehind could be used instead: `s/(?<!\n)\n(?!\n)//g;` Lookahead and lookbehind assertions make sure that the subpattern matches (or doesn't match, for negative assertions), without using up those characters in the actual match. perlre explains it better. :)
Re: Re: Regex Question by merlyn (Sage) on Jan 08, 2001 at 00:02 UTC
`s/([^\n])\n([^\n])/$1$2/g;` That fails on `fred\nX\nY\nbarney`, since the X will be sucked up while fixing the preceding newline, and won't be available to match for the following newline. Be very wary when matching right-side context. Passing it through to the "already seen" category means it won't be able to be left-side context for a later match. -- Randal L. Schwartz, Perl hacker
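The failure described above can be reproduced with a short script. The sub name `strip_capturing` is an illustrative assumption; the pattern and the test string are the ones quoted in the post.

```perl
use strict;
use warnings;

# Capture-both-sides version: each match consumes the character after the newline.
sub strip_capturing {
    my ($text) = @_;
    $text =~ s/([^\n])\n([^\n])/$1$2/g;
    return $text;
}

my $result = strip_capturing("fred\nX\nY\nbarney");

# Show any leftover newline as a visible backslash-n.
(my $visible = $result) =~ s/\n/\\n/g;
print "$visible\n";   # prints fredX\nYbarney
```

The newline between X and Y survives because the X was already consumed as right-side context of the first match, so the next match can only start at Y.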
Re (tilly) 3: Regex Question by tilly (Archbishop) on Jan 08, 2001 at 00:11 UTC
That is fixable though with a lookahead: `s/([^\n])\n(?!\n)/$1/g;` Or the probably faster lookbehind: `s/(?<!\n)\n([^\n])/$1/g;` UPDATE: chipmunk noticed a typo. I had not closed the second match. Oops.
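A small sketch checking both one-sided variants against the string from the previous reply. The sub names `strip_lookahead` and `strip_lookbehind` are illustrative assumptions.

```perl
use strict;
use warnings;

# Consume the character before the newline; only peek at what follows.
sub strip_lookahead {
    my ($text) = @_;
    $text =~ s/([^\n])\n(?!\n)/$1/g;
    return $text;
}

# Only peek at what precedes the newline; consume the character after it.
sub strip_lookbehind {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n([^\n])/$1/g;
    return $text;
}

for my $fix (\&strip_lookahead, \&strip_lookbehind) {
    print $fix->("fred\nX\nY\nbarney"), "\n";   # prints fredXYbarney for both
}
```

Because each variant still requires a real character on one side, a lone newline at the very end of the string is left in place by the lookbehind version, and one at the very start is left in place by the lookahead version. The two-sided lookaround form removes both.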
Re: Regex Question by ryddler (Monk) on Jan 07, 2001 at 23:55 UTC
How about this? `$text = "This is not good,\n you know what I mean?\n\nThis should be on a separate line."; $text =~ s/[^\n](?:\n)\|(\n)\n[^\n]/$1/g; print $text;` Which produces this: This is not good,you know what I mean? This should be on a separate line. ryddler
Re: Regex Question by salvadors (Pilgrim) on Jan 07, 2001 at 23:58 UTC
"I've got some text with line breaks. I want to strip all line breaks, unless there are two next to each other." What do you want to happen if there's more than 2? This will remove single breaks, but reduce any greater number to a single one: `$text =~ s/(\n)*\n/$1/g;` If you want to reduce multiples to 1 less (i.e. \n\n\n would become \n\n) then just put the * inside the brackets: `$text =~ s/(\n*)\n/$1/g;` Tony
Re: Regex Question by I0 (Priest) on Jan 08, 2001 at 00:41 UTC
`$text =~ s/\n(\n?)(\n*)/$1$1$2/g`
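A short sketch of how this substitution treats runs of different lengths. The sub name `strip_singles` and the test strings are illustrative assumptions; the pattern is the one posted above.

```perl
use strict;
use warnings;

# $1 holds the second newline of a run (or nothing), $2 holds the rest.
# Emitting $1 twice restores the first newline only when a second one exists.
sub strip_singles {
    my ($text) = @_;
    $text =~ s/\n(\n?)(\n*)/$1$1$2/g;
    return $text;
}

for my $n (1 .. 4) {
    my $out  = strip_singles("a" . ("\n" x $n) . "b");
    my $kept = ($out =~ tr/\n//);   # count the newlines that remain
    print "$n newline(s) in, $kept out\n";
}
```

A run of one newline comes out empty, while runs of two or more come out at their original length.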
Re: Re: Regex Question by salvadors (Pilgrim) on Jan 08, 2001 at 00:55 UTC
`s/\n(\n?)(\n*)/$1$1$2/g` Hmmm... so, I just had to work out why that was so much quicker than mine at doing pretty much the same thing... And then I noticed that rather than matching this way around: `s/(\n*)\n/$1/g;` I should really check for the "one followed by none or more" rather than "none or more followed by one": `s/\n(\n*)/$1/g;` And, lo and behold, this runs over twice as fast as the one before, and much closer to the speed of I0's version (which also meets the original spec much better than mine! :)) Now I'm really determined to learn more about how to optimise regexes... Tony
Re: Regex Question by cat2014 (Monk) on Jan 08, 2001 at 00:03 UTC
i think that you want something like: `$text =~ s/([^\n])\n{1}([^\n])/$1$2/g;` this should match exactly 1 newline which is surrounded by a non-newline character on either side. the {1} modifies the character before it & tells it that you have to match exactly 1. it's a pretty good thing to learn- basically, you do: `m/x{1,}/` if you want to match at least 1, `m/x{1,5}/` if you want to match at least 1 and at most 5, etc. you should read the perlre page for more information. update: the above code, as merlyn pointed out, is wrong- you should use his example. but i still think that you should read the perlre page. (:
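A small sketch of the three quantifier forms mentioned above, measuring how many characters each one takes from the start of a string. The helper `matched_length` and the test string are illustrative assumptions.

```perl
use strict;
use warnings;

# Return how many characters the pattern matches at the start of the string.
sub matched_length {
    my ($pattern, $string) = @_;
    return $string =~ /^($pattern)/ ? length $1 : 0;
}

my $string = "xxxxxxxx";   # eight x characters

# x{1} takes exactly one, x{1,} takes as many as it can, x{1,5} stops at five.
printf "%-7s matched %d\n", $_, matched_length($_, $string)
    for 'x{1}', 'x{1,}', 'x{1,5}';
```

Note that `\n{1}` matches exactly what a plain `\n` matches, so the quantifier in the substitution above does not change its behaviour.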
Re: Re: Regex Question by merlyn (Sage) on Jan 08, 2001 at 00:08 UTC
Nope. See my response at Re: Re: Regex Question. -- Randal L. Schwartz, Perl hacker
Re: Regex Question by Coyote (Deacon) on Jan 08, 2001 at 00:45 UTC
Remember that you are substituting the entire matched pattern with nothing. Try this: `$text=~s/([^\n])\n([^\n])/$1$2/g;` ---- Coyote (aka: Rich Anderson)
Re: Re: Regex Question by merlyn (Sage) on Jan 08, 2001 at 01:26 UTC
Nope... that's broken in the same way that these other two are! Amazing how easy it is to get wrong, eh? -- Randal L. Schwartz, Perl hacker

