PerlMonks  

Re: RegEx - Positive Look-ahead

by 7stud (Deacon)
on Feb 05, 2013 at 20:28 UTC ( #1017277 )


in reply to RegEx - Positive Look-ahead

Is this what you want?

    use strict;
    use warnings;
    use 5.012;

    use Text::Balanced qw( extract_tagged extract_multiple );

    my $text = <<'END_OF_STRING';
    {{Infobox
    text text text
    {{text text text text {{text text}} text}}
    {{{text {{text }} text }}}
    }}

    blah blah blah
    blah blah blah

    {{Infobox
    text1 text1 text1
    {{text1 text1 text1 text1 {{text1 text1}} text1}}
    {{{text1 {{text1 }} text1 }}}
    }}
    {{Infobox one}}
    END_OF_STRING

    my @infoboxes = extract_multiple( $text, [ \&my_extractor ], undef, 1 );

    sub my_extractor {
        extract_tagged( $text, "{{", "}}", );
    }

    for my $infobox (@infoboxes) {
        say $infobox;
        say '*' x 20;
    }

    --output:--
    {{Infobox
    text text text
    {{text text text text {{text text}} text}}
    {{{text {{text }} text }}}
    }}
    ********************
    {{Infobox
    text1 text1 text1
    {{text1 text1 text1 text1 {{text1 text1}} text1}}
    {{{text1 {{text1 }} text1 }}}
    }}
    ********************
    {{Infobox one}}
    ********************

Here's the same result using regexes via Regexp::Common:

    use strict;
    use warnings;
    use 5.012;

    use Regexp::Common qw( balanced );

    my $text = <<'END_OF_STRING';
    {{Infobox
    text text text
    {{text text text text {{text text}} text}}
    {{{text {{text }} text }}}
    }}

    blah blah blah
    blah blah blah

    {{Infobox
    text1 text1 text1
    {{text1 text1 text1 text1 {{text1 text1}} text1}}
    {{{text1 {{text1 }} text1 }}}
    }}
    {{Infobox one}}
    END_OF_STRING

    my $pattern = $RE{balanced}{ -begin => '{{' }{ -end => '}}' };

    while ($text =~ /($pattern)/gxms) {
        say $1;
        say '*' x 20;
    }

    --output:--
    {{Infobox
    text text text
    {{text text text text {{text text}} text}}
    {{{text {{text }} text }}}
    }}
    ********************
    {{Infobox
    text1 text1 text1
    {{text1 text1 text1 text1 {{text1 text1}} text1}}
    {{{text1 {{text1 }} text1 }}}
    }}
    ********************
    {{Infobox one}}
    ********************
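For comparison, Perl 5.10 and later can match balanced delimiters with a recursive pattern and no extra modules. What follows is only a minimal sketch and not part of the original post: the sample text is shortened, and the `$balanced` pattern and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Shortened sample text (an assumption, modelled on the data above).
my $text = <<'END_OF_STRING';
{{Infobox
text {{text {{text}} text}}
{{{text {{text }} text }}}
}}
blah blah blah
{{Infobox one}}
END_OF_STRING

# (?1) recurses into capture group 1, so nested {{ ... }} pairs
# have to balance before the outer closing }} can match.
my $balanced = qr/
    (                             # group 1: one balanced {{ ... }} chunk
        \{\{
        (?: (?1)                  # either a nested balanced chunk
          | (?! \{\{ | \}\} ) .   # or one char that does not start a delimiter
        )*
        \}\}
    )
/xs;

# In list context, a /g match returns every group-1 capture.
my @infoboxes = $text =~ /$balanced/g;

print "$_\n", '*' x 20, "\n" for @infoboxes;
```

The pattern is applied on its own rather than wrapped in a second pair of capturing parentheses, so that `(?1)` keeps referring to the balanced group.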

Re^2: RegEx - Positive Look-ahead
by tmharish (Friar) on Feb 06, 2013 at 07:35 UTC

    Works like a charm - Thank you very much.

    Also this was really helpful and this has been noted.

Re^2: RegEx - Positive Look-ahead
by tmharish (Friar) on Feb 07, 2013 at 14:26 UTC

    7stud

    Considering your other post ( which might or might not have stemmed from this ) I thought I would update this thread with the final solution that I used ( also for anyone else who might care ).

    I found that, because {{Infobox was not the only chunk I needed, I was taking a huge performance hit. To avoid this I switched to a single sweep of the (long) text, as follows. I have removed the other parts that I extract in the same sweep, so as to stick to the OP topic.

        use strict;
        use warnings;
        use Data::Dump qw( dump );

        my $text = <<'END_OF_STRING';
        {{Infobox
        text text text
        {{text text text text {{text text}} text}}
        {{{text {{text }} text }}}
        END}}

        blah blah blah
        blah blah blah

        {{Infobox
        text1 text1 text1
        {{text1 text1 text1 text1 {{text1 text1}} text1}}
        {{{text1 {{text1 }} text1 }}}
        }}
        {{Infobox one}}
        END_OF_STRING

        my $box_contents = _get_info_boxes( $text );
        dump( $box_contents );
        exit;

        # Collect each {{Infobox ... }} block in one pass over the lines.
        # Works on whole lines: any text sharing a line with the start or
        # end of a box is kept too.
        sub _get_info_boxes {
            my $text = shift;

            my @info_box_contents;
            my $in_info_box;
            my $this_info_box_content = "";
            my $bracket_count         = 0;

            foreach my $line ( split( /\n/, $text ) ) {
                unless ( $in_info_box ) {
                    # Braces escaped so they are always read as literals.
                    next unless ( $line =~ /\{\{Infobox/ );
                    $in_info_box = 1;
                }

                $this_info_box_content .= $line . "\n";

                # tr with an empty replacement list only counts characters.
                my $open_count  = ( $line =~ tr/{// );
                my $close_count = ( $line =~ tr/}// );
                $bracket_count += $open_count - $close_count;

                # Braces balance again, so the current box is complete.
                if ( $bracket_count == 0 ) {
                    push @info_box_contents, $this_info_box_content;
                    $this_info_box_content = "";
                    $in_info_box           = 0;
                    $bracket_count         = 0;
                }
            }

            return \@info_box_contents;
        }
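The line-by-line sweep above relies on each box starting and ending on a line boundary. If that cannot be guaranteed, a similar single pass can track depth per `{{` / `}}` token instead of per line. The sketch below is a hypothetical variant and not code from this thread: `get_info_boxes_by_token` and its sample input are made up for illustration, and it assumes that stray single braces can be ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass over the text, tracking {{ / }} nesting depth.
sub get_info_boxes_by_token {
    my ($text) = @_;

    my @boxes;
    my $depth = 0;
    my $start;

    while ( $text =~ /(\{\{|\}\})/g ) {
        if ( $1 eq '{{' ) {
            if ( $depth == 0 ) {
                # Only start recording at a top-level {{Infobox.
                my $at = pos($text) - 2;
                next unless substr( $text, $at, 9 ) eq '{{Infobox';
                $start = $at;
            }
            $depth++;
        }
        elsif ( $depth > 0 ) {
            # A }} that brings the depth back to zero closes the box.
            if ( --$depth == 0 ) {
                push @boxes, substr( $text, $start, pos($text) - $start );
            }
        }
    }

    return \@boxes;
}

my $boxes = get_info_boxes_by_token(
    "x {{Infobox a {{b}} c}} y {{Other z}} {{Infobox one}}"
);
print "$_\n" for @$boxes;
```

Other chunk types could be picked up in the same loop by checking for further prefixes at depth zero, which keeps the single-sweep property.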
