regex replacement help

mkenney has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I've got a problem that I'm sure has a simple solution, yet my terrible (read: nonexistent) grasp of regexes makes it a real struggle. I need to break up some text that contains tildes, replacing each tilde with a line break. Example:

Data In: 
AAA~BB~CCCCC~DDD~

Data Out:
AAA
BB
CCCCC
DDD

Which worked fine in the past with this HORRIBLE code I wrote long ago:

sub add_line_break
{
    my (@out) = @_;
    for (@out) {
        s/~+/\n/;
        s/~+/\n/;
        s/~+/\n/;
        s/~+/\n/;
        ...
    }
    return wantarray ? @out : $out[0];
}

Now I need to do the same thing but leave the tilde on the end of each line. The lines have varying numbers of elements in them, so I'd love for it to handle every one of them without my having to duplicate the regex (which I'm sure I only had to do because of my poor grasp). Example:

Data In: 
AAA~BB~CCCCC~DDD~

Data Out:
AAA~
BB~
CCCCC~
DDD~

Can someone help me out? I'm hoping some real-world examples will help me finally grasp regexes. I've struggled with them for YEARS. I've tried different books and sites and I still can't close the loop for some reason. Thanks as always for your help!!! Mark


Replies are listed 'Best First'.
Re: regex replacement help by kennethk (Abbot) on Mar 29, 2011 at 18:43 UTC
The simplest modification to your code as it stands would be to use capturing parentheses (see Extracting matches and Backreferences in perlretut) to grab your tildes for your substitutions. I would also add the `g` modifier (see Modifiers in perlre) so you can replace all groups of tildes in one pass. Your code might then look like: `sub add_line_break { my (@out)=@_; for (@out) { s/(~+)/$1\n/g; } return wantarray ? @out :$out[0]; }` A logical change I would make (and a slight efficiency boost) would be to change that to a substitution applied to a location preceded by a tilde but not followed by a tilde: `s/(?<=~)(?!~)/\n/g;` That uses a positive look-behind and a negative look-ahead (see Looking ahead and looking behind in perlretut). As a side note, when you want to understand what a regular expression does, check out YAPE::Regex::Explain. It can be very useful when learning regular expressions and when dealing with unfamiliar or old code.
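A minimal sketch of the two substitutions from the reply above, run against the sample data from the question. The helper names `with_capture` and `with_lookaround` are made up for illustration; only the regexes come from the reply.

```perl
use strict;
use warnings;

# Hypothetical helper names; the regexes are the ones quoted in the reply.

# Capture each run of tildes and put it back, followed by a newline.
sub with_capture {
    my ($s) = @_;
    $s =~ s/(~+)/$1\n/g;
    return $s;
}

# Insert a newline at every position that is preceded by a tilde
# and not followed by another tilde. Nothing is consumed or re-inserted.
sub with_lookaround {
    my ($s) = @_;
    $s =~ s/(?<=~)(?!~)/\n/g;
    return $s;
}

my $in = 'AAA~BB~CCCCC~DDD~';
print with_capture($in);       # AAA~, BB~, CCCCC~, DDD~, one per line
print with_lookaround($in);    # same output
```

Both versions treat a run such as `~~` as one group, so the newline lands after the whole run.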
Re^2: regex replacement help by Eliya (Vicar) on Mar 29, 2011 at 19:22 UTC
`s/(~+)/$1\n/g;` Another variant would be to use the (relatively new) `\K` ("Keep the stuff left of the \K"), which avoids having to capture and re-insert the matched fragment: `s/~+\K/\n/g;`
Re^2: regex replacement help by mkenney (Beadle) on Mar 29, 2011 at 19:09 UTC
Thanks for all the info! I'm going to pull up those articles tonight!!!
Re: regex replacement help by lostjimmy (Chaplain) on Mar 29, 2011 at 18:41 UTC
You can remove the repeated substitutions by using the global modifier. If you want to have the tilde and a newline, just put that in your replacement: `s/(~+)/$1\n/g`
Re^2: regex replacement help by mkenney (Beadle) on Mar 29, 2011 at 19:08 UTC
That worked perfectly! Thanks!!!
Re: regex replacement help by wind (Priest) on Mar 29, 2011 at 18:51 UTC
Using a zero-width positive look-behind assertion will get you what you want. Read about it in perlre. `use Data::Dumper; use strict; my $str = 'AAA~BB~CCCCC~DDD~'; my @a = split /(?<=~)/, $str; print Dumper(\@a);`
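As a rough illustration of the `split` approach above, here is the same pattern with the resulting list spelled out and printed one field per line. The variable name `@fields` is only for this sketch.

```perl
use strict;
use warnings;

my $str = 'AAA~BB~CCCCC~DDD~';

# Split at every position that is preceded by a tilde. The look-behind
# consumes nothing, so each tilde stays attached to the field before it.
my @fields = split /(?<=~)/, $str;

# @fields is now ('AAA~', 'BB~', 'CCCCC~', 'DDD~')
print "$_\n" for @fields;
```

The match at the very end of the string would produce an empty trailing field, which `split` drops by default, so no extra element appears.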
Re^2: regex replacement help by mkenney (Beadle) on Mar 29, 2011 at 19:11 UTC
Love how many ways you can skin a cat! Thanks for the help!!!
Re: regex replacement help by jaimon (Sexton) on Mar 29, 2011 at 19:11 UTC
One easy way to handle your original problem (without the delimiter in the output) would be to use `@out = split /~/, $string`. Now, since you want the tilde as well, you could simply append ~ to the end of each substring returned by split: `@out = map {$_ .= "~"} split /~/, $string`. Of course, there would be some edge cases to take care of; I didn't test this all that much. I'm a big fan of regex btw, but TIMTOWTDI. UPDATE: Didn't look closely at the solution offered by wind. That's neat! - J
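To make the edge cases mentioned above concrete, here is a small sketch of the split-then-append idea. The helper name `split_and_append` and the two extra inputs are assumptions for illustration, and it uses plain concatenation rather than `.=`, which gives the same strings.

```perl
use strict;
use warnings;

# Hypothetical helper: split on tildes, then put one tilde back on each field.
sub split_and_append {
    my ($s) = @_;
    return map { $_ . '~' } split /~/, $s;
}

# The sample data comes out as wanted: 'AAA~', 'BB~', 'CCCCC~', 'DDD~'
my @ok = split_and_append('AAA~BB~CCCCC~DDD~');

# No trailing tilde in the input: the last field gains one it never had.
my @no_tail = split_and_append('AAA~BB');      # 'AAA~', 'BB~'

# Doubled tilde: the empty field between them becomes a lone '~'.
my @doubled = split_and_append('AAA~~BB~');    # 'AAA~', '~', 'BB~'

print join("\n", @ok), "\n";
```

Whether those two cases matter depends on the real data; the sample in the question always ends in a single tilde.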
Re^2: regex replacement help by samarzone (Pilgrim) on Mar 30, 2011 at 07:35 UTC
The following would be a better approach, as it keeps the splitting pattern as well: `@out = split /(~)/, $string` -- Regards - Samar
Re^3: regex replacement help by jaimon (Sexton) on Mar 30, 2011 at 17:06 UTC
Here, the splitting pattern would be a separate element - J
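The point about separate elements can be seen in a short sketch. The pairing loop at the end is not from the thread; it is just one possible way to glue each field back to its separator.

```perl
use strict;
use warnings;

my $string = 'AAA~BB~CCCCC~DDD~';

# Capturing parentheses in the split pattern return each separator
# as its own list element, between the fields it separated.
my @parts = split /(~)/, $string;
# @parts is ('AAA', '~', 'BB', '~', 'CCCCC', '~', 'DDD', '~')

# Assumed follow-up step: join each field with the separator after it.
my @out;
for (my $i = 0; $i < @parts; $i += 2) {
    my $sep = defined $parts[$i + 1] ? $parts[$i + 1] : '';
    push @out, $parts[$i] . $sep;
}
# @out is ('AAA~', 'BB~', 'CCCCC~', 'DDD~')

print "$_\n" for @out;
```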
Re^4: regex replacement help by samarzone (Pilgrim) on Mar 31, 2011 at 10:36 UTC
Re: regex replacement help by Cristoforo (Curate) on Mar 29, 2011 at 19:45 UTC
As of perl 5.10, you could use: `s/~\K/\n/g` (may be a trivial use of \K :-) )
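The thread offers two `\K` variants, `~+\K` and `~\K`. They agree on the sample data but differ when tildes are doubled, as this sketch with a made-up input shows.

```perl
use strict;
use warnings;
use 5.010;    # \K needs perl 5.10 or later

# Hypothetical input with a doubled tilde, to show where the patterns differ.
my $in = 'AAA~~BB~';

# Newline after each run of tildes.
(my $per_run = $in) =~ s/~+\K/\n/g;
# $per_run is "AAA~~\nBB~\n"

# Newline after every single tilde.
(my $per_tilde = $in) =~ s/~\K/\n/g;
# $per_tilde is "AAA~\n~\nBB~\n"

print $per_run, "--\n", $per_tilde;
```

The original code used `~+`, so `~+\K` is the closer match to its treatment of repeated tildes.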
Re: regex replacement help by furry_marmot (Pilgrim) on Mar 29, 2011 at 21:47 UTC
Instead of changing the tildes, use them as anchors to change the characters around them. `s/(\w)~(\w)/$1\n$2/g` --marmot UPDATE: Changed second $1 to $2. Stupid fingers... UPDATE2: Just realized it was wrong, as well. What was I thinking??? This should do it. `s/~/\n/g; chomp;`
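A short sketch of how the two attempts in the post above behave. The single-character input `A~B~C` is an assumption added to show one way the first regex falls short. Note that both attempts drop the tildes, so they correspond to the first "Data Out" in the question rather than the tilde-keeping one.

```perl
use strict;
use warnings;

my $in = 'AAA~BB~CCCCC~DDD~';

# First attempt: only replaces a tilde that sits between two word
# characters, so the trailing tilde is left alone.
(my $first = $in) =~ s/(\w)~(\w)/$1\n$2/g;
# $first is "AAA\nBB\nCCCCC\nDDD~"

# With single-character fields the character after one tilde is consumed
# by the match, so it cannot serve as the \w before the next tilde.
(my $short = 'A~B~C') =~ s/(\w)~(\w)/$1\n$2/g;
# $short is "A\nB~C"

# Final version: replace every tilde, then chomp the trailing newline.
(my $final = $in) =~ s/~/\n/g;
chomp $final;
# $final is "AAA\nBB\nCCCCC\nDDD"

print "$final\n";
```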
Re^2: regex replacement help by choroba (Cardinal) on Mar 31, 2011 at 11:06 UTC
$2

