Re: RegEx - Positive Look-ahead

by 7stud (Deacon)
on Feb 05, 2013 at 20:28 UTC (#1017277)


in reply to RegEx - Positive Look-ahead

Is this what you want?

use strict;
use warnings;
use 5.012;

use Text::Balanced qw( extract_tagged extract_multiple );

my $text = <<'END_OF_STRING';
{{Infobox
text text text
{{text text text text {{text text}} text}}
{{{text {{text }} text }}}
}}

blah blah blah
blah blah blah

{{Infobox
text1 text1 text1
{{text1 text1 text1 text1 {{text1 text1}} text1}}
{{{text1 {{text1 }} text1 }}}
}}

{{Infobox one}}
END_OF_STRING

my @infoboxes = extract_multiple(
    $text,
    [ \&my_extractor ],
    undef,
    1,
);

sub my_extractor {
    extract_tagged( $text, "{{", "}}" );
}

for my $infobox (@infoboxes) {
    say $infobox;
    say '*' x 20;
}

--output:--
{{Infobox
text text text
{{text text text text {{text text}} text}}
{{{text {{text }} text }}}
}}
********************
{{Infobox
text1 text1 text1
{{text1 text1 text1 text1 {{text1 text1}} text1}}
{{{text1 {{text1 }} text1 }}}
}}
********************
{{Infobox one}}
********************

Here's the same result using regexes via Regexp::Common:

use strict;
use warnings;
use 5.012;

use Regexp::Common qw( balanced );

my $text = <<'END_OF_STRING';
{{Infobox
text text text
{{text text text text {{text text}} text}}
{{{text {{text }} text }}}
}}

blah blah blah
blah blah blah

{{Infobox
text1 text1 text1
{{text1 text1 text1 text1 {{text1 text1}} text1}}
{{{text1 {{text1 }} text1 }}}
}}

{{Infobox one}}
END_OF_STRING

my $pattern = $RE{balanced}{ -begin => '{{' }{ -end => '}}' };

while ($text =~ /($pattern)/gxms) {
    say $1;
    say '*' x 20;
}

--output:--
{{Infobox
text text text
{{text text text text {{text text}} text}}
{{{text {{text }} text }}}
}}
********************
{{Infobox
text1 text1 text1
{{text1 text1 text1 text1 {{text1 text1}} text1}}
{{{text1 {{text1 }} text1 }}}
}}
********************
{{Infobox one}}
********************
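If installing a module is not an option, core Perl (5.10 and later) has recursive patterns that can express the same kind of balanced match. The following is only a minimal sketch and not part of the original post: the sample string is shortened and made up for illustration, and it assumes the delimiters are always exactly {{ and }}.

```perl
use strict;
use warnings;

# Hypothetical sample text, shortened from the examples in this thread.
my $text = "{{Infobox a {{b {{c}} d}} e}} blah {{Infobox one}}";

# (?1) recurses into capture group 1, so nested pairs are matched to
# any depth. Recursive patterns need perl 5.10 or later.
my $balanced = qr/
    (                              # group 1: one balanced chunk
        \{\{                       # opening delimiter
        (?:
            (?1)                   # a nested balanced chunk, or
          | (?! \{\{ | \}\} ) .    # one char that does not begin a delimiter
        )*
        \}\}                       # closing delimiter
    )
/xs;

my @boxes;
while ( $text =~ /$balanced/g ) {
    push @boxes, $1;    # outermost chunk only; nested ones stay inside it
}

print "$_\n" for @boxes;
```

Only the outermost chunks are returned, because each /g match resumes after the end of the previous one.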


Re^2: RegEx - Positive Look-ahead
by tmharish (Friar) on Feb 06, 2013 at 07:35 UTC

    Works like a charm - Thank you very much.

    Also this was really helpful and this has been noted.

Re^2: RegEx - Positive Look-ahead
by tmharish (Friar) on Feb 07, 2013 at 14:26 UTC

    7stud

    Considering your other post (which may or may not have stemmed from this one), I thought I would update this thread with the final solution that I used (also for anyone else who might care).

    I found that, since {{Infobox was not the only chunk I needed, I was taking a huge performance hit. To avoid this I changed to a single sweep of the (long) text chunk, as follows. I have removed the other parts that I extract in the same sweep so as to stick to the OP's topic.

    use strict ;
    use warnings ;

    use Data::Dump qw( dump ) ;

    my $text = <<'END_OF_STRING';
    {{Infobox
    text text text
    {{text text text text {{text text}} text}}
    {{{text {{text }} text }}}
    END}}

    blah blah blah
    blah blah blah

    {{Infobox
    text1 text1 text1
    {{text1 text1 text1 text1 {{text1 text1}} text1}}
    {{{text1 {{text1 }} text1 }}}
    }}

    {{Infobox one}}
    END_OF_STRING

    my $box_contents = _get_info_boxes( $text ) ;
    dump( $box_contents ) ;
    exit;

    sub _get_info_boxes {
        my $text = shift ;

        my @info_box_contents ;
        my $in_info_box ;
        my $this_info_box_content = "" ;
        my $bracket_count         = 0 ;

        foreach my $line ( split( /\n/, $text ) ) {
            # Skip lines until an infobox starts.
            unless( $in_info_box ) {
                # Braces are escaped so the pattern is unambiguous.
                next unless( $line =~ /\{\{Infobox/ ) ;
                $in_info_box = 1 ;
            }

            $this_info_box_content .= $line . "\n" ;

            # tr with an empty replacement list only counts the characters.
            my $open_count  = ( $line =~ tr/{// ) ;
            my $close_count = ( $line =~ tr/}// ) ;
            $bracket_count  = $bracket_count + $open_count - $close_count ;

            # Braces balance again: the infobox ends on this line.
            if( $bracket_count == 0 ) {
                push @info_box_contents, $this_info_box_content ;
                $this_info_box_content = "" ;
                $in_info_box           = 0 ;
                $bracket_count         = 0 ;
            }
        }

        return \@info_box_contents ;
    }
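One thing to keep in mind with the sweep above: it counts single braces per line, so it can only close an infobox at a line boundary, and anything after the closing }} on that line is included. If the input does not guarantee that, a token-based pass may be worth considering. This is just a sketch under that assumption, not the code used above; get_info_boxes_by_token is a hypothetical name, and runs of three or more braces are tokenized naively from left to right.

```perl
use strict;
use warnings;

# Hypothetical helper: a single left-to-right pass that tracks the
# nesting depth of {{ and }} tokens instead of per-line brace counts.
sub get_info_boxes_by_token {
    my ($text) = @_;
    my @boxes;
    my $depth = 0;
    my $start;

    while ( $text =~ /(\{\{|\}\})/g ) {
        my $token       = $1;
        my $token_start = pos($text) - 2;    # both tokens are 2 chars long

        if ( $token eq '{{' ) {
            if ( $depth == 0 ) {
                # Start collecting only at a {{Infobox seen while not
                # already inside one; other templates are skipped.
                next unless substr( $text, $token_start, 9 ) eq '{{Infobox';
                $start = $token_start;
            }
            $depth++;
        }
        elsif ( $depth > 0 ) {
            $depth--;
            if ( $depth == 0 ) {
                # Depth is back to zero: the infobox ends at this token.
                push @boxes, substr( $text, $start, pos($text) - $start );
            }
        }
    }
    return \@boxes;
}

# Usage with a made-up one-line sample:
my $found = get_info_boxes_by_token(
    "x {{Infobox a {{b}} c}} y {{Other}} {{Infobox one}}"
);
print "$_\n" for @$found;
```

Like the original, this returns a reference to an array of infobox strings, so it could be dumped with Data::Dump in the same way.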
