PerlMonks  

Text::Balanced not extracting

by cormanaz (Chaplain)
on Oct 16, 2010 at 19:17 UTC (#865713=perlquestion)
cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Howdy Bros. I am trying to use Text::Balanced to extract some balanced parens, brackets, etc. from a piece of text. In this example:
use Text::Balanced qw( extract_bracketed );
my $text = q/%%% rick ran errands . sem(1, [ word(1001,rick), word(1002,ran), word(1003,errands), word(1004,'.') ], [ pos(1001,'NN'), pos(1002,'VBD'), pos(1003,'NNS'), pos(1004,'.') ] )./;
my ($extracted,$remainder) = extract_bracketed($text,'()');
$extracted comes out undef, and $remainder contains the entire string. I thought I would get the contents of the outermost set of parens. What am I doing wrong?
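A minimal sketch for seeing why the call fails, using a shortened copy of the string above (the shortening is only for illustration). It relies on the documented Text::Balanced failure behaviour: in list context a failed extraction returns undef plus the untouched input, and $@ carries the diagnostic in $@->{error} and the offset in $@->{pos}.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

# Shortened copy of the question's string (illustrative only).
my $text = q/%%% rick ran errands . sem(1, [ word(1001,rick) ] )./;

# No third argument, so the prefix defaults to optional whitespace:
# the text must begin (after any whitespace) with '('.
my ($extracted, $remainder) = extract_bracketed($text, '()');

unless (defined $extracted) {
    # On failure, list context returns (undef, original text) and $@ holds
    # the diagnostic: $@->{error} is the message, $@->{pos} the offset.
    print "extract failed at offset $@->{pos}: $@->{error}\n";
    print "remainder is the whole input: ",
        ($remainder eq $text ? "yes" : "no"), "\n";
}
```

Because the string starts with %%% rather than an opening paren, the default whitespace prefix cannot reach a '(' and the match fails at the very start.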

TIA

Steve

Re: Text::Balanced not extracting
by kcott (Abbot) on Oct 16, 2010 at 20:58 UTC

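    You need to specify a prefix. extract_bracketed only matches a bracketed group at the start of the text, after skipping a prefix pattern, and that prefix defaults to optional whitespace. Your string starts with %%% rather than an opening paren, so the match fails.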

    Change

    extract_bracketed($text,'()')

    to

    extract_bracketed($text,'()', '%%% rick ran errands . sem')

    and you'll get your expected output.
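    A self-contained sketch of that fix, assuming the question's string with the PerlMonks line-wrap '+' markers removed. Text::Balanced ships with core Perl. In list context extract_bracketed returns the extracted text (delimiters included), the remainder, and the prefix it skipped; the prefix argument is treated as a regex pattern, so the '.' in it matches any character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

my $text = q/%%% rick ran errands . sem(1, [ word(1001,rick), word(1002,ran), word(1003,errands), word(1004,'.') ], [ pos(1001,'NN'), pos(1002,'VBD'), pos(1003,'NNS'), pos(1004,'.') ] )./;

# The third argument is a regex that must match (and is skipped)
# immediately before the opening bracket.
my ($extracted, $remainder, $prefix) =
    extract_bracketed($text, '()', '%%% rick ran errands . sem');

print "extracted: $extracted\n";   # outermost (...) group, parens included
print "remainder: $remainder\n";   # whatever follows the closing ')'
print "prefix:    $prefix\n";      # the text the prefix pattern skipped
```

    Only '(' and ')' count as brackets here, so the square brackets inside the group are simply passed over.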

    This is explained in the Text::Balanced documentation.

    -- Ken

      Yeah, I saw that, but I wasn't too sure what they meant by a prefix. I wonder why they built the module that way. It seems like there would be cases where you'd want to use something like this and the "prefix" wouldn't be known in advance.

        I haven't used this module before today so I can't offer any deep insight into the ways things have been set up.

        I did wonder (exactly as you have) why it's set up like this.

        I've just successfully tried the following:

        my ($extracted,$remainder) = extract_bracketed($text,'()', ($text =~  m{ \A ( [^(]+ ) }msx)[0]);

        You might choose to use something like that in all cases that might have an unknown prefix.
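        Since the prefix argument is itself a regex pattern, another option is to pass a pattern rather than the literal leading text. This is a minimal sketch, assuming a shortened copy of the question's string; the pattern '[^(]*' is my own choice, not something from the module docs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

# Shortened copy of the question's string (illustrative only).
my $text = q/%%% rick ran errands . sem(1, [ word(1001,rick) ], [ pos(1001,'NN') ] )./;

# The prefix argument is a regex, so '[^(]*' skips everything up to the
# first '(' without knowing the leading text in advance.
my ($extracted, $remainder, $prefix) = extract_bracketed($text, '()', '[^(]*');

print "extracted: $extracted\n";
print "remainder: $remainder\n";
print "skipped:   $prefix\n";
```

        This always takes the group that begins at the first '(' in the text, so a tighter pattern is needed if the leading text might contain its own parens. If you capture the leading text literally instead, as in the one-liner above, passing it through quotemeta keeps characters such as '+', '*' or '[' from being read as regex metacharacters.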

        I haven't checked out the solution below (by Khen1950fx) - that may be preferable.

        -- Ken

Re: Text::Balanced not extracting
by Khen1950fx (Canon) on Oct 16, 2010 at 22:31 UTC
    Another way:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Balanced qw(:ALL);
    use Data::Dumper::Concise;

    # Read the first line of __DATA__ (the q/.../ line below).
    my $text = <DATA>;

    # extract_multiple splits $text into fields using the given extractors;
    # extract_quotelike matches Perl quote-like constructs such as q/.../.
    my ($extracted, $remainder) = extract_multiple( $text, [ \&extract_quotelike ]);
    print Dumper($extracted, $remainder);

    __DATA__
    q/%%% rick ran errands . sem( [ word(1001, 'rick'), word(1002, 'ran'), word(1003, 'errands'), word(1004, '.') ] , [ pos(1001, 'NN') , pos(1002, 'VBD') , pos(1003, 'NNS'), pos(1004, '.') ] )/;

Node Type: perlquestion [id://865713]
Approved by GrandFather