Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: Splitting string using two overlapping patterns

by kcott (Chancellor)
on Oct 25, 2013 at 05:50 UTC ( #1059584=note: print w/replies, xml ) Need Help??


in reply to Splitting string using two overlapping patterns

G'day kpr,

Welcome to the monastery.

"Is it possible to combine two patterns into normal split function or m/.../g pattern and capture the text from the middle?"

Yes, this is possible. There's also many ways to do it. I see a number of solutions have already been posted. Here's another one.

I've only included code that's been available since v5.8 or earlier. I've shown how to use the 'x' modifier to make your regex easier to read, easier to maintain and, generally, easier to deal with. Uncomment any of the "my $re = ..." lines to see that they all work the same (obviously, only uncomment one at a time). I've used test data that you posted as well as that provided by some of the monks who've already posted solutions.

Here's the code. Either "perlre - Perl regular expressions" or "perlvar: Variables related to regular expressions" should supply answers to any questions you have; if not, feel free to ask.

#!/usr/bin/env perl -l use strict; use warnings; my @lines = ( 'Iteration {Applied Field} {Total Energy} Mx', '{blaa blaa blaa}', 'Iteration {Applied Field} {Total Energy} Mx', 'Iteration {Applied Field} A {Total Energy} Mx F G {Third tes +t line}', 'Iteration {Applied Field} {Total Energy} Mx Fx {a} B C D {E F +} G', 'Iteration {Applied Field} {Total Energy} {Foo} { a b c d } + Mx', ); my $re_hard_to_read = qr/(?>[{]([^}]+)|\s*(?![{}])(\S+))/; my $re_easy_to_read = qr/(?> [{] ( [^}]+ ) | \s* (?! [{}] ) ( \S+ ) )/x; my $re_fully_annotated = qr/ (?> # start non-capturing, non-backtracking, alternati +on [{] # MATCH: exactly one literal left brace ( # start capture [^}]+ # CAPTURE: one or more of any character except rig +ht brace ) # end capture | # - OR - \s* # MATCH: zero or more whitespace (?! # start zero-width negative lookahead assertion [{}] # ASSERT: next character is not left or right brac +e ) # end zero-width negative lookahead assertion ( # start capture \S+ # CAPTURE: one or more non-whitespace characters ) # end capture ) # end non-capturing, non-backtracking, alternation /x; #my $re = $re_hard_to_read; #my $re = $re_easy_to_read; my $re = $re_fully_annotated; for (@lines) { # Print output heading print join "\n$_\n" => ('-' x 60) x 2; # Array to hold header names my @header_names; # Capture header names push @header_names => $+ while /$re/g; # Output header names print join "\n" => @header_names; }

The output starts like this:

------------------------------------------------------------ Iteration {Applied Field} {Total Energy} Mx ------------------------------------------------------------ Iteration Applied Field Total Energy Mx ------------------------------------------------------------ {blaa blaa blaa} ------------------------------------------------------------ blaa blaa blaa

Here's the rest:

-- Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1059584]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (2)
As of 2018-05-21 01:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?