PerlMonks  

regex replacement help

by mkenney (Beadle)
on Mar 29, 2011 at 18:32 UTC ( [id://896245]=perlquestion )

mkenney has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I've got a problem that I'm sure has a simple solution, yet my terrible (read: nonexistent) grasp of regexes makes it a real struggle. I need to break up some text that contains tildes, putting a line return where each tilde is. Example:
Data In:
AAA~BB~CCCCC~DDD~

Data Out:
AAA
BB
CCCCC
DDD
Which worked fine in the past with this HORRIBLE code I wrote long ago:
sub add_line_break {
    my (@out) = @_;
    for (@out) {
        s/~+/\n/;
        s/~+/\n/;
        s/~+/\n/;
        s/~+/\n/;
        ...
    }
    return wantarray ? @out : $out[0];
}
Now I need to do the same thing but leave the tilde on the end. The lines have varying numbers of elements in them, so I'd love it to handle every one of them without me having to duplicate the regex (which I'm sure I only had to do because of my poor grasp). Example:
Data In:
AAA~BB~CCCCC~DDD~

Data Out:
AAA~
BB~
CCCCC~
DDD~
Can someone help me out? I'm hoping some real-world examples will help me finally grasp regexes. I've struggled with them for YEARS. I've tried different books and sites and I still can't close the loop for some reason. Thanks as always for your help!!! Mark

Replies are listed 'Best First'.
Re: regex replacement help
by kennethk (Abbot) on Mar 29, 2011 at 18:43 UTC
    The simplest modification to your code as it stands would be using capturing parentheses (see Extracting matches and Backreferences in perlretut) to grab your tildes for your substitutions. I will also add the g modifier (see Modifiers in perlre) so you can replace all groups of tildes in one pass. Your code might then look like:

    sub add_line_break {
        my (@out) = @_;
        for (@out) {
            s/(~+)/$1\n/g;
        }
        return wantarray ? @out : $out[0];
    }

    A logical change I would make (and a slight efficiency boost) would be to change that to a substitution applied to a location preceded by a tilde but not followed by a tilde:

    s/(?<=~)(?!~)/\n/g;

    That uses a positive look-behind and a negative look-ahead (see Looking ahead and looking behind in perlretut).
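
A minimal sketch comparing the two substitutions above on the sample data from the question. The sub names and the second test string (with a doubled tilde) are made up for illustration. Both versions should put the newline after a whole run of tildes, so they are expected to agree:

```perl
use strict;
use warnings;

# Capture the run of tildes and put it back, followed by a newline.
sub with_capture {
    my ($s) = @_;
    $s =~ s/(~+)/$1\n/g;
    return $s;
}

# Zero-width match: a spot preceded by a tilde and not followed by one.
sub with_lookaround {
    my ($s) = @_;
    $s =~ s/(?<=~)(?!~)/\n/g;
    return $s;
}

# First string is from the question; the second is a made-up edge case.
for my $in ('AAA~BB~CCCCC~DDD~', 'AAA~~BB~') {
    print with_capture($in) eq with_lookaround($in) ? "same:\n" : "different:\n";
    print with_lookaround($in);
}
```

The look-around version matches an empty string, so nothing has to be captured and re-inserted; only the newline is added.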

    As a side note, when you want to understand what a regular expression does, check out YAPE::Regex::Explain. It can be very useful when learning regular expressions and when dealing with unfamiliar/old code.

          s/(~+)/$1\n/g;

      Another variant would be to use the (relatively new) \K ("Keep the stuff left of the \K"), which avoids having to capture/re-insert the matched fragment:

      s/~+\K/\n/g;
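
A short sketch of the \K variant next to the capturing version, using the sample string from the question. The variable names are illustrative only, and the `use 5.010` line is there because \K is not available in older perls:

```perl
use strict;
use warnings;
use 5.010;   # \K needs perl 5.10 or later

my $in = 'AAA~BB~CCCCC~DDD~';

# Everything matched to the left of \K is kept as-is;
# only the (empty) part after it is replaced by the newline.
(my $kept = $in) =~ s/~+\K/\n/g;

# Same effect with a capture group that is put back by hand.
(my $captured = $in) =~ s/(~+)/$1\n/g;

print $kept;
print "identical\n" if $kept eq $captured;
```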
      Thanks for all the info! I'm going to pull up those articles tonight!!!
Re: regex replacement help
by lostjimmy (Chaplain) on Mar 29, 2011 at 18:41 UTC
    You can remove the repeated substitutions by using the global modifier. If you want to have the tilde and a newline, just put that in your replacement: s/(~+)/$1\n/g
      That worked perfectly! Thanks!!!
Re: regex replacement help
by wind (Priest) on Mar 29, 2011 at 18:51 UTC
    Using a zero-width positive look-behind assertion will get you what you want. Read about it in perlre.

    use Data::Dumper;
    use strict;

    my $str = 'AAA~BB~CCCCC~DDD~';
    my @a = split /(?<=~)/, $str;
    print Dumper(\@a);
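
To get from the list that split returns to the output asked for in the question, the pieces can be joined with newlines. A minimal sketch, assuming the same sample string (the names `@pieces` and `$out` are made up here):

```perl
use strict;
use warnings;

my $str = 'AAA~BB~CCCCC~DDD~';

# Split at every position that has a tilde just before it.
# The look-behind consumes nothing, so each piece keeps its tilde.
my @pieces = split /(?<=~)/, $str;

# Joining with newlines gives one tilde-terminated piece per line.
my $out = join "\n", @pieces;
print "$out\n";
```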
      Love how many ways you can skin a cat! Thanks for the help!!!
Re: regex replacement help
by jaimon (Sexton) on Mar 29, 2011 at 19:11 UTC

    One easy way to handle your original problem (without the delim in the output) would be to use

     @out = split /~/, $string

    Now, since you want the tilde as well, you could simply append ~ to the end of each substring returned by split

     @out = map {$_ .= "~"} split /~/, $string

    Of course, there would be some edge cases to take care of; I didn't test this all that much. I'm a big fan of regex btw, but TIMTOWTDI

    UPDATE: Didn't look closely at solution offered by wind. That's neat!
    - J
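
One of the edge cases mentioned above, sketched under the assumption that some inputs may lack the trailing tilde. The helper name `split_and_append` and the second input are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the split-and-append idea.
sub split_and_append {
    my ($string) = @_;
    return map { $_ . '~' } split /~/, $string;
}

my @ok    = split_and_append('AAA~BB~CCCCC~DDD~');   # trailing tilde present
my @extra = split_and_append('AAA~BB');              # no trailing tilde in the input

print join("\n", @ok), "\n";
print join("\n", @extra), "\n";   # the last piece gains a tilde the input never had
```

If the input always ends in a tilde, as in the question's example, the two cases never differ.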

      The following would be a better approach to keep the splitting pattern as well.

      @out = split /(~)/, $string
      --
      Regards
      - Samar

        Here, the splitting pattern would be a separate element

        - J
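
A small sketch of what that means in practice: with capturing parentheses in the pattern, split returns each tilde as its own list element. The loop that glues the tildes back on is only an illustration and was not proposed in the thread; `@rest` and `@glued` are made-up names:

```perl
use strict;
use warnings;
use 5.010;   # for the // (defined-or) operator

my $string = 'AAA~BB~CCCCC~DDD~';

# With capturing parentheses, the separators are returned too,
# interleaved with the pieces between them.
my @out = split /(~)/, $string;
print scalar(@out), " elements: @out\n";

# Take the elements two at a time (piece, separator) and rejoin them.
my @rest = @out;
my @glued;
while (my ($piece, $sep) = splice @rest, 0, 2) {
    push @glued, $piece . ($sep // '');
}
print join("\n", @glued), "\n";
```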
Re: regex replacement help
by Cristoforo (Curate) on Mar 29, 2011 at 19:45 UTC
    As of perl 5.10, you could use: s/~\K/\n/g
    (may be a trivial use of \K :-) )
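
For the question's sample data, `~\K` and the `~+\K` form give the same result. They should only differ when tildes appear back to back, which the sketch below tries on a made-up input:

```perl
use strict;
use warnings;
use 5.010;   # \K is available from perl 5.10

my $doubled = 'AAA~~BB~';   # hypothetical input with two tildes in a row

(my $each = $doubled) =~ s/~\K/\n/g;    # newline after every single tilde
(my $run  = $doubled) =~ s/~+\K/\n/g;   # newline after each run of tildes

print "after every tilde:\n$each";
print "after each run:\n$run";
```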
Re: regex replacement help
by furry_marmot (Pilgrim) on Mar 29, 2011 at 21:47 UTC
    Instead of changing the tildes, use them as anchors to change the characters around them.
    s/(\w)~(\w)/$1\n$2/g
    --marmot

    UPDATE: Changed second $1 to $2. Stupid fingers...

    UPDATE2: Just realized it was wrong, as well. What was I thinking??? This should do it.

    s/~/\n/g; chomp;
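
A quick sketch of what that last version does to the sample string, with `$line` as a made-up variable name. Note that it replaces the tildes rather than keeping them, so the result corresponds to the first "Data Out" in the question, not the second:

```perl
use strict;
use warnings;

my $line = 'AAA~BB~CCCCC~DDD~';

for ($line) {     # alias $_ to $line so the bare s/// and chomp act on it
    s/~/\n/g;     # every tilde becomes a newline
    chomp;        # drop the newline that replaced the trailing tilde
}

print "$line\n";
```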
      $2

Node Type: perlquestion [id://896245]
Approved by kennethk