http://www.perlmonks.org?node_id=1142850


in reply to [Solved]: Query about regular expression

I would use a parser for that instead. Here's some code to get you started should you choose this path:

use strict;
use warnings;
use Data::Dumper;
use HTML::TokeParser::Simple;

my $p = HTML::TokeParser::Simple->new( *DATA );

my $count = 0;
my @array;
while ( my $token = $p->get_token ) {
    # skip ahead to the fourth <li>, which holds the data rows
    next unless $token->is_start_tag('li');
    next unless ++$count > 3;

    # collect every token up to the closing </li>, dropping the <br /> tags
    while ( $token = $p->get_token ) {
        last if $token->is_end_tag('li');
        my $text = $token->as_is;
        $text =~ s/^\s*//;
        $text =~ s/\s*$//;
        push @array, $text unless $token->is_tag('br');
    }
    last;
}

print Dumper \@array;

__DATA__
<ul>
<li><strong>site_user</strong></li>
<ul>
<li>user1</li>
</ul>
<li><strong>compare_hidden</strong></li>
<ul>
<li>average_speed_answer 25 60 30 60 ^M<br />
 calls_waiting 300 500 300 500 ^M<br />
 many more rows here
 post_ivr_calls_handled Wisconsin 50 100 50 100 ^M<br />
 post_ivr_calls_handled Wyoming 50 100 50 100 ^M<br />
</li>
<li><strong>calls_waiting_good_high</strong></li>
<ul>
<li>300</li>
</ul>
<li><strong>calls_waiting_warning_low</strong></li>
<ul>

jeffa


Re^2: Query about regular expression (HTML::TreeBuilder::XPath)
by Anonymous Monk on Sep 24, 2015 at 00:08 UTC
    Too much work :) HTML::TreeBuilder::XPath is less work
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use HTML::TreeBuilder::XPath;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->ignore_unknown(0);
    $tree->implicit_tags(0);
    $tree->no_expand_entities(1);
    $tree->ignore_ignorable_whitespace(0);
    $tree->no_space_compacting(1);
    $tree->store_comments(1);
    $tree->store_pis(1);
    $tree->parse(q{
    <ul>
    <li><strong>site_user</strong></li>
    <ul>
    <li>user1</li>
    </ul>
    <li><strong>compare_hidden</strong></li>
    <ul>
    <li>average_speed_answer 25 60 30 60 ^M<br />
     calls_waiting 300 500 300 500 ^M<br />
     many more rows here
     post_ivr_calls_handled Wisconsin 50 100 50 100 ^M<br />
     post_ivr_calls_handled Wyoming 50 100 50 100 ^M<br />
    </li>
    <li><strong>calls_waiting_good_high</strong></li>
    <ul>
    <li>300</li>
    </ul>
    <li><strong>calls_waiting_warning_low</strong></li>
    <ul>});
    $tree->eof;

    my @li = $tree->findnodes( q{ //li[ contains( ., 'average' ) ] } );
    for my $ll ( @li ){
        $ll->dump;
        print $ll->as_text, "\n";
    }
    __END__
    <li> @0.1.7.1
      "average_speed_answer 25 60 30 60 ^M"
      <br /> @0.1.7.1.1
      "\x0a calls_waiting 300 500 300 500 ^M"
      <br /> @0.1.7.1.3
      "\x0a many more rows here\x0a post_ivr_calls_handled Wisconsin 50 100 50..."
      <br /> @0.1.7.1.5
      "\x0a post_ivr_calls_handled Wyoming 50 100 50 100 ^M"
      <br /> @0.1.7.1.7
      "\x0a"
    average_speed_answer 25 60 30 60 ^M
     calls_waiting 300 500 300 500 ^M
     many more rows here
     post_ivr_calls_handled Wisconsin 50 100 50 100 ^M
     post_ivr_calls_handled Wyoming 50 100 50 100 ^M