http://www.perlmonks.org?node_id=1005374


in reply to unexpected behaviour of text::balanced

That was surprising. It appears that Text::Balanced alters some of the magic innards of the variable. I changed your code a bit:

    sub eteb {
        my $data = shift;
        my $orig = $data;
        my @array = extract_multiple(
            $data,
            [ sub { extract_tagged($_[0], 'a', 'b', undef) }, ],
            undef, 1
        );
        print "data='$data'\n";
        display('eteb1', @array);
        @array = extract_multiple(
            $orig,
            [ sub { extract_bracketed($_[0], '()') }, ],
            undef, 1
        );
        display('eteb2', @array);
    }

And got the desired results. The funny thing is, I was expecting that $data would be empty after the call or something, but was surprised to see that the value looked unchanged. I then changed the second extract_multiple to:

    @array = extract_multiple(
        $data."",
        [ sub { extract_bracketed($_[0], '()') }, ],
        undef, 1
    );

and it worked as you expect. I haven't read the Text::Balanced docs to see if it's expected behaviour or not. But if it isn't, you may want to file a bug report on it.

Update: I remembered a module (Devel::Peek) that lets you look at the magic goo inside of variables, so I changed your program to look at the $data variable before and after the call:

    # Dump() comes from Devel::Peek ("use Devel::Peek;" at the top of the script)
    sub eteb {
        my $data = shift;
        my $orig = $data;
        Dump($data);
        my @array = extract_multiple(
            $data,
            [ sub { extract_tagged($_[0], 'a', 'b', undef) }, ],
            undef, 1
        );
        Dump($data);
        $data = $data."";
        Dump($data);
        print "data='$data'\n";
        display('eteb1', @array);
        @array = extract_multiple(
            $data,
            [ sub { extract_bracketed($_[0], '()') }, ],
            undef, 1
        );
        display('eteb2', @array);
    }

And sure enough, some stuff inside changed:

    $ perl 1005372.pl
    et:a)(b
    eb:(a)(b)
    SV = PV(0x8458478) at 0x84fe160
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x83c47b8 "(a)(b)"\0
      CUR = 6
      LEN = 8
    SV = PVMG(0x83ef3a8) at 0x84fe160
      REFCNT = 7
      FLAGS = (PADMY,SMG,POK,pPOK)
      IV = 0
      NV = 0
      PV = 0x83c47b8 "(a)(b)"\0
      CUR = 6
      LEN = 8
      MAGIC = 0x83c2ec0
        MG_VIRTUAL = &PL_vtbl_mglob
        MG_TYPE = PERL_MAGIC_regex_global(g)
        MG_LEN = 5
    SV = PVMG(0x83ef3a8) at 0x84fe160
      REFCNT = 7
      FLAGS = (PADMY,SMG,POK,pPOK)
      IV = 0
      NV = 0
      PV = 0x83c47b8 "(a)(b)"\0
      CUR = 6
      LEN = 8
      MAGIC = 0x83c2ec0
        MG_VIRTUAL = &PL_vtbl_mglob
        MG_TYPE = PERL_MAGIC_regex_global(g)
        MG_LEN = -1
    data='(a)(b)'
    eteb1:a)(b
    eteb2:(a)(b)

After seeing this, I reviewed the docs for Text::Balanced, and noticed this:

Note that in a list context, the contents of the original input text (the first argument) are not modified in any way. However, if the input text was passed in a variable, that variable's pos value is updated to point at the first character after the extracted text. That means that in a list context the various subroutines can be used much like regular expressions. For example:
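The example from the docs isn't reproduced in this post. As a rough stand-in (my own sketch, not the one from the Text::Balanced docs), here is the same idea using extract_bracketed on the string from this thread. The variable names $text, $next and @found are just for illustration:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '(a)(b)';
my @found;

# List context: $text itself is left alone and only pos($text) moves,
# so each call continues where the previous one stopped.
while ( defined( my $next = (extract_bracketed($text, '()'))[0] ) ) {
    last if $next eq '';    # defensive: stop if nothing was extracted
    push @found, $next;
    printf "got '%s', pos is now %d\n", $next, pos($text);
}

print "text is still '$text'\n";
```

The loop ends when the extraction fails, because in list context the first element of the returned list is then undefined.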

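In short, it's supposed to do that. The first extract_multiple call leaves pos($data) set, so the second call starts looking from there rather than from the start of the string. That way it's ready to pull out the *next* bits of balanced text for you. Appending an empty string simply resets that position: $data."" is a fresh copy with no pos, and assigning the result back to $data clears its pos as well (MG_LEN goes from 5 to -1 in the dump above).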
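If you do want the second extraction to start over from the beginning, here is a minimal sketch of two ways to do it, assuming a plain extract_bracketed call rather than the extract_multiple wrapper from the original program. The names $copy, $from_copy and $from_reset are mine:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $data = '(a)(b)';

# One list-context extraction leaves pos($data) just past '(a)'.
my ($first) = extract_bracketed($data, '()');
my $pos_after = pos($data);

# Option 1: hand the next call a fresh copy; a copy starts with no pos.
my $copy = $data . "";
my ($from_copy) = extract_bracketed($copy, '()');

# Option 2: clear the position on the original explicitly.
pos($data) = undef;
my ($from_reset) = extract_bracketed($data, '()');

print "pos after first call: $pos_after\n";
print "from copy:  $from_copy\n";
print "from reset: $from_reset\n";
```

Assigning to pos() says what you mean more directly than the $data."" trick, though both should end up in the same place.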

Sigh! Had I read the docs before playing with the code, I'd've saved myself a little time. Ah, well...

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: unexpected behaviour of text::balanced
by Anonymous Monk on Nov 24, 2012 at 14:39 UTC
    Thanks for the explanation.