PerlMonks  

unexpected behaviour of text::balanced

by Anonymous Monk
on Nov 24, 2012 at 13:23 UTC (#1005372, perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The following prints

    et:a)(b
    eb:(a)(b)
    eteb1:a)(b
    eteb2:

I expected it to print

    et:a)(b
    eb:(a)(b)
    eteb1:a)(b
    eteb2:(a)(b)

Were my expectations unexpected?

    use strict;
    use warnings;
    use File::Slurp;
    use Text::Balanced qw/extract_multiple extract_bracketed extract_tagged/;

    my $data = '(a)(b)';
    et($data);
    eb($data);
    eteb($data);

    sub et {
        my $data = shift;
        my @array = extract_multiple(
            $data,
            [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
            undef,
            1
        );
        display('et', @array)
    }

    sub eb {
        my $data = shift;
        my @array = extract_multiple(
            $data,
            [ sub{extract_bracketed($_[0], '()')}, ],
            undef,
            1
        );
        display('eb', @array)
    }

    sub eteb {
        my $data = shift;
        my @array = extract_multiple(
            $data,
            [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
            undef,
            1
        );
        display('eteb1', @array);
        @array = extract_multiple(
            $data,
            [ sub{extract_bracketed($_[0], '()')}, ],
            undef,
            1
        );
        display('eteb2', @array);
    }

    sub display {
        my $sub = shift;
        print "$sub:";
        print $_ for @_;
        print "\n";
    }

Re: unexpected behaviour of text::balanced
by roboticus (Chancellor) on Nov 24, 2012 at 13:41 UTC

    That was surprising. It appears that Text::Balanced alters some of the magic innards of the variable. I changed your code a bit:

        sub eteb {
            my $data = shift;
            my $orig = $data;
            my @array = extract_multiple(
                $data,
                [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
                undef,
                1
            );
            print "data='$data'\n";
            display('eteb1', @array);
            @array = extract_multiple(
                $orig,
                [ sub{extract_bracketed($_[0], '()')}, ],
                undef,
                1
            );
            display('eteb2', @array);
        }

    And get the desired results. The funny thing is, I was expecting that $data would be empty after the call or something, but was surprised to see that the value looked unchanged. I then changed the second extract_multiple to:

        @array = extract_multiple(
            $data."",
            [ sub{extract_bracketed($_[0], '()')}, ],
            undef,
            1
        );

    and it worked as you expect. I haven't read the Text::Balanced docs to see if it's expected behaviour or not. But if it isn't, you may want to file a bug report on it.
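A minimal sketch of the same copy-based workaround, packaged as a helper. The name `bracketed_fields` and the hand-set `pos($data) = 5` are illustrative assumptions, not part of the original post. The helper takes the text by value, so whatever match position (the `pos` described in the docs quoted further down) an earlier call left on the caller's variable is neither used nor changed.

```perl
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;

# Hypothetical helper (not from the thread): works on a private copy,
# so extract_multiple only ever touches the pos() of that copy.
sub bracketed_fields {
    my ($text) = @_;    # copying a string does not copy its pos()
    return extract_multiple(
        $text,
        [ sub { extract_bracketed($_[0], '()') } ],
        undef,
        1,              # skip text that no extractor matched
    );
}

my $data = '(a)(b)';
pos($data) = 5;         # simulate a position left behind by an earlier call

my @fields = bracketed_fields($data);
print "fields:", @fields, "\n";
print "caller pos:", pos($data), "\n";
```

Copying per call costs one string copy, which seems a fair trade when the same variable has to be scanned more than once from the start.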

    Update: I remember a module (Devel::Peek) that lets you look at the magic goo inside of variables, so I changed your program to look at the $data variable before and after the call:

        use Devel::Peek;    # provides Dump()

        sub eteb {
            my $data = shift;
            my $orig = $data;
            Dump($data);
            my @array = extract_multiple(
                $data,
                [ sub{extract_tagged($_[0], 'a', 'b', undef,)}, ],
                undef,
                1
            );
            Dump($data);
            $data = $data."";
            Dump($data);
            print "data='$data'\n";
            display('eteb1', @array);
            @array = extract_multiple(
                $data,
                [ sub{extract_bracketed($_[0], '()')}, ],
                undef,
                1
            );
            display('eteb2', @array);
        }

    And sure enough, some stuff inside changed:

        $ perl 1005372.pl
        et:a)(b
        eb:(a)(b)
        SV = PV(0x8458478) at 0x84fe160
          REFCNT = 1
          FLAGS = (PADMY,POK,pPOK)
          PV = 0x83c47b8 "(a)(b)"\0
          CUR = 6
          LEN = 8
        SV = PVMG(0x83ef3a8) at 0x84fe160
          REFCNT = 7
          FLAGS = (PADMY,SMG,POK,pPOK)
          IV = 0
          NV = 0
          PV = 0x83c47b8 "(a)(b)"\0
          CUR = 6
          LEN = 8
          MAGIC = 0x83c2ec0
            MG_VIRTUAL = &PL_vtbl_mglob
            MG_TYPE = PERL_MAGIC_regex_global(g)
            MG_LEN = 5
        SV = PVMG(0x83ef3a8) at 0x84fe160
          REFCNT = 7
          FLAGS = (PADMY,SMG,POK,pPOK)
          IV = 0
          NV = 0
          PV = 0x83c47b8 "(a)(b)"\0
          CUR = 6
          LEN = 8
          MAGIC = 0x83c2ec0
            MG_VIRTUAL = &PL_vtbl_mglob
            MG_TYPE = PERL_MAGIC_regex_global(g)
            MG_LEN = -1
        data='(a)(b)'
        eteb1:a)(b
        eteb2:(a)(b)

    After seeing this, I reviewed the docs for Text::Balanced, and noticed this:

    Note that in a list context, the contents of the original input text (the first argument) are not modified in any way. However, if the input text was passed in a variable, that variable's pos value is updated to point at the first character after the extracted text. That means that in a list context the various subroutines can be used much like regular expressions. For example:

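    In short, it's supposed to do that: the first call leaves pos($data) set, so the second call on the same variable starts from that position instead of from the beginning of the string. That way it's ready to pull out the *next* bits of balanced text for you. Appending an empty string ($data."") sidesteps this, because the expression yields a new value with no pos set, and assigning that value back to $data resets its pos (the MG_LEN = -1 in the last dump above).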
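The "ready to pull out the next bits" behaviour can be sketched with a single extractor. This is a hedged illustration rather than the example from the Text::Balanced docs; the variable names are made up for the sketch, and rewinding with `pos($text) = 0` is one possible alternative to the `$data.""` trick used above.

```perl
use strict;
use warnings;
use Text::Balanced qw/extract_bracketed/;

my $text = '(a)(b)';

# List context: $text itself is not modified, only pos($text) moves.
my ($first) = extract_bracketed($text, '()');
my $pos_after_first = pos($text);

# The second call resumes from that position and finds the next group.
my ($second) = extract_bracketed($text, '()');

# Rewind by hand to scan the same variable again from the start.
pos($text) = 0;
my ($again) = extract_bracketed($text, '()');

print "first=$first pos=$pos_after_first second=$second again=$again\n";
print "text is still '$text'\n";
```

Whether to rewind the position or work on a copy is mostly a matter of taste; a copy leaves the caller's variable completely alone, while an explicit rewind makes the restart visible at the call site.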

    Sigh! Had I read the docs before playing with the code, I'd've saved myself a little time. Ah, well...

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thanks for the explanation.
