in reply to Text::Balanced woes..
Everybody expects the extract_* subroutines to act like:
$text =~ /extractor/ # i.e. match anywhere in the string
even though they're clearly documented (and fully intended) to act like:
$text =~ /\G extractor/gc # i.e. match at the current pos() in the string
If you want to match heterogeneous input, what you really want is the extract_multiple subroutine. Like this:
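use strict;
use warnings;
use Text::Balanced ':ALL';    # exports extract_multiple, extract_tagged, etc.
use Data::Dumper 'Dumper';
my $text = "this is a test <B>for</B> tags! \n";
# Split $text into tagged fields plus the plain text between them
my @data = extract_multiple( $text, [ \&extract_tagged ] );
print Dumper \@data;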
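Here's a small sketch of the difference between those two kinds of matching. It uses a plain regex for the unanchored and \G-anchored cases, then calls extract_tagged itself; the variable names are just for illustration. The point is that extract_tagged fails when the text at the current pos() isn't a tag (leading whitespace aside), and succeeds once pos() is moved onto the tag.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

my $text = "this is a test <B>for</B> tags! \n";

# Unanchored: the regex engine scans forward until it finds a tag anywhere.
my $anywhere = ($text =~ /<B>.*?<\/B>/) ? 1 : 0;

# Anchored with \G: only tries at pos($text), which is 0 here.
# The /c flag keeps pos() where it was when the match fails.
pos($text) = 0;
my $at_pos = ($text =~ /\G<B>.*?<\/B>/gc) ? 1 : 0;

# extract_tagged behaves like the anchored version: the text doesn't start
# with a tag, so in list context the extracted string comes back undef.
my ($from_start) = extract_tagged($text);

# Move pos() onto the tag and the same call succeeds.
pos($text) = index($text, '<B>');
my ($from_tag) = extract_tagged($text);

print "anywhere=$anywhere at_pos=$at_pos\n";
print 'from start: ', (defined $from_start ? $from_start : 'undef'), "\n";
print "from tag:   $from_tag\n";
```

extract_multiple does this pos() bookkeeping for you, which is why it's the right tool for mixed input.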
Your original example had <B>for</b>, which extract_tagged wouldn't have recognized anyway: its default closing tag is built from the opening tag (so it looks for </B>), and that match is case-sensitive.
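A minimal sketch of that mismatch follows. The explicit '(?i:<b>)' and '(?i:</b>)' patterns are just one possible workaround I'm assuming here (extract_tagged's second and third arguments are treated as opening and closing tag patterns); they aren't something the Text::Balanced docs specifically prescribe.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

my $mixed = "<B>for</b> tags!";

# Default tags: after seeing <B>, extract_tagged looks for </B>,
# and the lowercase </b> doesn't match it, so extraction fails.
my ($default) = extract_tagged($mixed);

# Hypothetical workaround: explicit open/close patterns with an inline
# (?i:...) modifier, so either case is accepted.
my ($nocase) = extract_tagged($mixed, '(?i:<b>)', '(?i:</b>)');

print 'default: ', (defined $default ? $default : 'undef'), "\n";
print "nocase:  $nocase\n";
```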
The other monks are correct: you'd be much better off with one of the many HTML:: modules on CPAN (HTML::TreeBuilder is my personal favorite).
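For comparison, here's a minimal HTML::TreeBuilder sketch, assuming the module is installed from CPAN, that pulls the bold text out of the same kind of string. Because the parser lowercases tag names, the mismatched <B>for</b> needs no special handling.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = "this is a test <B>for</b> tags! \n";

# Parse the fragment into an element tree (wrapped in <html><body>).
my $tree = HTML::TreeBuilder->new_from_content($html);

# Tag names are stored in lowercase, so 'b' finds <B>...</b>.
my @bold = map { $_->as_text } $tree->look_down(_tag => 'b');

print "bold: @bold\n";

# HTML::Element trees should be freed explicitly.
$tree->delete;
```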