PerlMonks  

Matching text between tags

by Anonymous Monk
on Apr 14, 2016 at 15:13 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I am trying to match the values inside these "<u>...</u>" tags and, of course, I'm not doing the right thing. Any help?

#!/usr/bin/perl
use strict;
use warnings;

my $lines = "Aliquam vitae ipsum id felis finibus congue. Ut molestie scelerisque purus,
sit amet rhoncus leo aliquet ac. In eu lobortis quam. Maecenas auctor semper enim,
ut convallis sapien dictum eu. Sed arcu ex, ornare et porttitor vitae, interdum a mi.
Mauris rutrum luctus rhoncus. Quisque velit quam, convallis vel est at, tincidunt accumsan velit.
Fusce ut <u>metus ut which may either exceed $1,000.00 or OK. G. LAT,
semper nunc, in dictum magna.</u>
Aliquam ac vestibulum dolor. Praesent in magna nisi. Cras nec viverra ligula. Suspendisse
efficitur imperdiet eros, <u>sed rhoncus sapien euismod cursus. Vestibulum a posuere</u> elit,
eget tristique eros. Etiam et lectus venenatis, aliquet dui vitae, posuere lectus.";

if( $lines =~ /<u>(.*?)<\/u>/ig ){
    print "\n $1\n";
}

Thanks for looking!

Replies are listed 'Best First'.
Re: Matching text between tags
by hippo (Bishop) on Apr 14, 2016 at 16:12 UTC

    Your initial string suffers from interpolation. Changing to single quotes, using the /s modifier and testing gives us the desired results:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Test::More;

    my $lines = 'Aliquam vitae ipsum id felis finibus congue. Ut molestie scelerisque purus,
    sit amet rhoncus leo aliquet ac. In eu lobortis quam. Maecenas auctor semper enim,
    ut convallis sapien dictum eu. Sed arcu ex, ornare et porttitor vitae, interdum a mi.
    Mauris rutrum luctus rhoncus. Quisque velit quam, convallis vel est at, tincidunt accumsan velit.
    Fusce ut <u>metus ut which may either exceed $1,000.00 or OK. G. LAT,
    semper nunc, in dictum magna.</u>
    Aliquam ac vestibulum dolor. Praesent in magna nisi. Cras nec viverra ligula. Suspendisse
    efficitur imperdiet eros, <u>sed rhoncus sapien euismod cursus. Vestibulum a posuere</u> elit,
    eget tristique eros. Etiam et lectus venenatis, aliquet dui vitae, posuere lectus.';

    my ($first, $second) = ($lines =~ /<u>(.*?)<\/u>/sg);
    is ($first, 'metus ut which may either exceed $1,000.00 or OK. G. LAT,
    semper nunc, in dictum magna.', 'First match');
    is ($second, 'sed rhoncus sapien euismod cursus. Vestibulum a posuere', 'Second match');
    done_testing ();

    $ perl 1160410.pl
    ok 1 - First match
    ok 2 - Second match
    1..2

    If that doesn't fix your problem you'll need to be a lot more specific about how it fails for you.
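The two fixes mentioned above (single quotes and the /s modifier) can be seen in isolation with a much shorter string. This is only a minimal sketch: the sample text and the helper name first_underlined are made up for illustration and are not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single quotes keep '$1,000.00' literal; inside double quotes Perl would
# try to interpolate the capture variable $1 at that point.
# (Hypothetical sample text, with a newline inside the tags.)
my $text = 'before <u>costs $1,000.00 or
more</u> after';

# Hypothetical helper: return the first <u>...</u> body, matching
# either with or without the /s modifier.
sub first_underlined {
    my ($string, $dot_matches_newline) = @_;
    my $re = $dot_matches_newline ? qr{<u>(.*?)</u>}s : qr{<u>(.*?)</u>};
    return $string =~ $re ? $1 : undef;
}

my $without_s = first_underlined($text, 0);
my $with_s    = first_underlined($text, 1);

print 'without /s: ', defined $without_s ? "'$without_s'" : 'no match', "\n";
print 'with /s:    ', defined $with_s    ? "'$with_s'"    : 'no match', "\n";
```

Without /s the dot cannot cross the newline between the tags, so the pattern fails on this string; with /s the whole body is captured, dollar sign included.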

      Why can't this code get the second match?

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @lines = ("Aliquam vitae ipsum id felis finibus congue. Ut molestie scelerisque purus,
      sit amet rhoncus leo aliquet ac. In eu lobortis quam. Maecenas auctor semper enim,
      ut convallis sapien dictum eu. Sed arcu ex, ornare et porttitor vitae, interdum a mi.
      Mauris rutrum luctus rhoncus. Quisque velit quam, convallis vel est at, tincidunt accumsan velit.
      Fusce ut <u>metus ut which may either exceed \$1,000.00 or OK. G. LAT,
      semper nunc, in dictum magna.</u>
      Aliquam ac vestibulum dolor. Praesent in magna nisi. Cras nec viverra ligula. Suspendisse
      efficitur imperdiet eros, <u>XXsed rhoncus sapien euismod cursus. Vestibulum a posuereYY</u> elit,
      eget tristique eros. Etiam et lectus venenatis, aliquet dui vitae, posuere lectus.");

      #while (defined( my $lines = shift @lines)){
      #foreach my $lines (@lines){
      for my $lines (@lines){
          if( $lines =~ /<u>(.*?)<\/u>/sg ){
              print "\n $1\n";
          }
      }
        Why this code can't get the second match ...

        Because in scalar context (which is what an if or while condition supplies), an m//g match matches only once per evaluation. The if block executes at most once, when its condition is true. The while loop re-evaluates the match each time around and keeps executing until the condition is no longer true.

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $s = qq{foo <u> match \n the first </u> bar <u> second \n match </u> baz};
         print qq{[[$s]] \n};
         ;;
         if ($s =~ m{ <u> (.*?) </u> }xmsg) {
           print qq{if: '$1'};
           }
         print '';
         ;;
         pos $s = 0;
         while ($s =~ m{ <u> (.*?) </u> }xmsg) {
           print qq{while: '$1'};
           }
        "
        [[foo <u> match
         the first </u> bar <u> second
         match </u> baz]]

        if: ' match
         the first '

        while: ' match
         the first '
        while: ' second
         match '

        Note: The pos $s = 0; statement is needed in the example because each string keeps track of its own match position, and that match position is used in /g matching. Without it, the while loop would resume where the if left off and find only the second match. Try eliminating the statement from the code and see what happens. Also try printing pos at various strategic points in execution.

        Update: Also look in Regexp Quote-Like Operators in perlop for discussion of  m/PATTERN/msixpodualgc and look for the phrase 'In scalar context, each execution of "m//g" ...'     (Update: See also Global matching in perlretut.)
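Following up on the suggestion to print pos at strategic points, here is one possible sketch. The sample string and the helper name matches_with_pos are invented for this illustration; the pattern is the one used throughout the thread.

```perl
use strict;
use warnings;

# Hypothetical sample string with two tagged sections.
my $s = 'foo <u>first</u> bar <u>second</u> baz';

# Hypothetical helper: collect each capture together with the offset
# at which the next scalar-context m//g attempt will start.
sub matches_with_pos {
    my ($string) = @_;
    my @seen;
    while ($string =~ m{<u>(.*?)</u>}sg) {
        push @seen, [ $1, pos($string) ];
    }
    return @seen;
}

for my $pair (matches_with_pos($s)) {
    my ($capture, $position) = @$pair;
    print "matched '$capture', next search starts at offset $position\n";
}
```

Each pass through the while condition picks up at the recorded offset, which is why the loop reaches the second pair of tags while a single if does not.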


        Give a man a fish:  <%-{-{-{-<

Re: Matching text between tags
by GrandFather (Saint) on Apr 15, 2016 at 01:14 UTC

    For stuff that looks like HTML, use an HTML parser:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $lines = <<'LINES';
    Aliquam vitae ipsum id felis finibus congue. Ut molestie scelerisque purus,
    sit amet rhoncus leo aliquet ac. In eu lobortis quam. Maecenas auctor semper enim,
    ut convallis sapien dictum eu. Sed arcu ex, ornare et porttitor vitae, interdum a mi.
    Mauris rutrum luctus rhoncus. Quisque velit quam, convallis vel est at, tincidunt accumsan velit.
    Fusce ut <u>metus ut which may either exceed $1,000.00 or OK. G. LAT,
    semper nunc, in dictum magna.</u>
    Aliquam ac vestibulum dolor. Praesent in magna nisi. Cras nec viverra ligula. Suspendisse
    efficitur imperdiet eros, <u>sed rhoncus sapien euismod cursus. Vestibulum a posuere</u> elit,
    eget tristique eros. Etiam et lectus venenatis, aliquet dui vitae, posuere lectus.";
    LINES

    my $tree = HTML::TreeBuilder->new_from_content($lines);

    for my $node ($tree->guts()) {
        next if !ref $node || $node->tag() ne 'u';
        print $node->as_text(), "\n\n";
    }

    Prints:

    metus ut which may either exceed $1,000.00 or OK. G. LAT, semper nunc, in dictum magna.

    sed rhoncus sapien euismod cursus. Vestibulum a posuere

    Premature optimization is the root of all job security
Re: Matching text between tags
by LanX (Saint) on Apr 14, 2016 at 15:16 UTC
    Can't test (being mobile), but where is the loop? Did you post the whole code?

    Please try replacing if with while.

    edit

    Or use list context: @underlined = $text =~ //gs

    Btw, the /s will help . match newlines.
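As a rough illustration of the list-context idea, the sketch below fills in the thread's <u>...</u> pattern and uses a made-up sample string; neither the string nor the exact pattern comes from this reply.

```perl
use strict;
use warnings;

# Hypothetical sample text; the second tagged section spans a newline.
my $text = "one <u>alpha</u> two <u>beta\ngamma</u> three";

# In list context, m//g returns every capture at once, so no loop is
# needed; /s lets '.' cross the newline inside the second pair of tags.
my @underlined = $text =~ m{<u>(.*?)</u>}gs;

print scalar(@underlined), " matches\n";
print "[$_]\n" for @underlined;
```

The array assignment is what supplies list context here; the same match inside an if condition would be evaluated in scalar context instead.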

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      Sorry, it matches, but only the first <u>...</u>. There are two lines in the sample text.

      foreach my $lines (@lines){
          if( $lines =~ /<u>(.*?)<\/u>/igs ){
              print "\n $1\n";
          }
      }
        Did you try what I suggested?

        if has scalar context and will only match once per line!

        The /g will only remember the position for the next match.

        (i.e. two ifs will match twice and so on)
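The "two ifs" remark can be tried directly. This is just a small sketch with an invented one-line string, not code from the original question.

```perl
use strict;
use warnings;

# Hypothetical one-line sample with two tagged sections.
my $line = '<u>first</u> and <u>second</u>';
my @found;

# Each scalar-context m//g resumes from pos($line), i.e. from where the
# previous successful match ended.
if ($line =~ m{<u>(.*?)</u>}g) { push @found, $1 }   # finds 'first'
if ($line =~ m{<u>(.*?)</u>}g) { push @found, $1 }   # finds 'second'
if ($line =~ m{<u>(.*?)</u>}g) { push @found, $1 }   # fails: nothing left

print join(', ', @found), "\n";
```

A while loop does the same thing without having to know in advance how many tagged sections there are.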

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!
