PerlMonks  

Text::Balanced not extracting

by cormanaz (Chaplain)
on Oct 16, 2010 at 19:17 UTC ( #865713=perlquestion )
cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Howdy Bros. I am trying to use Text::Balanced to extract some balanced parens, brackets, etc. from a piece of text. In this example:

use Text::Balanced qw( extract_bracketed );
my $text = q/%%% rick ran errands . sem(1, [ word(1001,rick), word(1002,ran), word(1003,errands), word(1004,'.') ], [ pos(1001,'NN'), pos(1002,'VBD'), pos(1003,'NNS'), pos(1004,'.') ] )./;
my ($extracted,$remainder) = extract_bracketed($text,'()');

$extracted comes out null, and $remainder contains the entire string. I thought I would get the contents of the outermost set of parens. What am I doing wrong?

TIA

Steve

Re: Text::Balanced not extracting
by kcott (Abbot) on Oct 16, 2010 at 20:58 UTC

    You need to specify a prefix.

    Change

    extract_bracketed($text,'()')

    to

    extract_bracketed($text,'()', '%%% rick ran errands . sem')

    and you'll get your expected output.

    This is explained in the Text::Balanced documentation.

    -- Ken
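
A minimal sketch of what the prefix argument changes, using a shortened version of the text from the question (the shortened string is an illustration, not the original data). As far as the Text::Balanced documentation goes, the third argument is a pattern to skip before the opening bracket, and it defaults to optional whitespace, so a string that starts with anything else fails to match. In list context a failed call returns undef for the extracted text and the whole input as the remainder, which is what the original post describes.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

# Shortened, illustrative version of the text from the question.
my $text = q/%%% rick ran . sem(1, [ word(1001,rick), word(1002,ran) ], [ pos(1001,'NN') ] )./;

# Default prefix is optional whitespace. The text starts with '%', not '(',
# so nothing is extracted and the remainder is the whole string.
my ($no_match, $all) = extract_bracketed($text, '()');

# With a prefix pattern covering everything before the first '(',
# the balanced parenthesised chunk is returned, delimiters included.
# The third return value is the prefix that was skipped.
my ($extracted, $remainder, $skipped) =
    extract_bracketed($text, '()', '%%% rick ran \. sem');

print defined $no_match ? "default: matched\n" : "default: no match\n";
print "extracted: $extracted\n";
print "remainder: $remainder\n";
print "skipped:   $skipped\n";
```

Note that the prefix is used as a regex, so the dot is escaped here to match a literal '.'; an unescaped dot also matches, since it matches any character.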

      Yeah, I saw that but wasn't too sure what they meant by a prefix. I wonder why they built the module that way. Seems like there would be cases you'd want to use something like this where the "prefix" wouldn't be known in advance.

        I haven't used this module before today so I can't offer any deep insight into the way things have been set up.

        I did wonder (exactly as you have) why it's set up like this.

        I've just successfully tried the following:

        my ($extracted,$remainder) = extract_bracketed($text,'()', ($text =~  m{ \A ( [^(]+ ) }msx)[0]);

        You might choose to use something like that in all cases that might have an unknown prefix.

        I haven't checked out the solution below (by Khen1950fx) - that may be preferable.

        -- Ken
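
Since the prefix argument is treated as a pattern rather than a literal string, an unknown prefix can probably be handled without a separate match: a character class that skips everything up to the first opening paren should do. A minimal sketch, assuming the first '(' in the text is the one you want; the helper name first_parens and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

# Hypothetical helper: return the first balanced (...) chunk in a string,
# or undef if there is none. The prefix pattern '[^(]*' skips any run of
# characters that are not an opening paren.
sub first_parens {
    my ($text) = @_;
    my ($extracted) = extract_bracketed($text, '()', '[^(]*');
    return $extracted;
}

my $found = first_parens(q/anything at all foo(a, (b, c), d) trailing/);
my $none  = first_parens(q/no parens here/);

print "found: $found\n";
print defined $none ? "none: $none\n" : "none: undef\n";
```

This only looks at the first opening paren; if that one is unbalanced, the call fails rather than moving on to a later one.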

Re: Text::Balanced not extracting
by Khen1950fx (Canon) on Oct 16, 2010 at 22:31 UTC
    Another way:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Balanced qw(:ALL);
    use Data::Dumper::Concise;

    my $text = <DATA>;
    my ($extracted, $remainder) = extract_multiple( $text, [ \&extract_quotelike ]);
    print Dumper($extracted, $remainder);
    __DATA__
    q/%%% rick ran errands . sem( [ word(1001, 'rick'), word(1002, 'ran'), word(1003, 'errands'), word(1004, '.') ] , [ pos(1001, 'NN') , pos(1002, 'VBD') , pos(1003, 'NNS'), pos(1004, '.') ] )/;
