
Re: Repeating the same command in different portions of input

by Kenosis (Priest)
on Jan 15, 2013 at 16:07 UTC


in reply to Repeating the same command in different portions of input

tobyink has provided an excellent solution. For your future reference--and in case the need arises again--there are Perl modules that can be used for parsing the kind of text you have. Here's an example that uses Mojo::DOM to parse your <a> tags:

use strict;
use warnings;
use Mojo::DOM;

my $text = <<END;
<a>
word1
word2
word3
</a>
<a>
word4
word5
</a>
<a>
word6
word7
</a>
END

my $dom = Mojo::DOM->new($text);
my $i   = 1;

for my $chunk ( $dom->find('a')->each ) {
    print 'Chunk ' . $i++ . ': ' . $chunk->text . "\n";
}

Output:

Chunk 1: word1 word2 word3
Chunk 2: word4 word5
Chunk 3: word6 word7

Thus, each group that you need to analyze is available as $chunk->text within the for loop.

Hope this helps!

Replies are listed 'Best First'.
Re^2: Repeating the same command in different portions of input
by albascura (Novice) on Jan 15, 2013 at 20:50 UTC

    It really helps, thanks.

    I see that $chunk->text doesn't preserve the newline at the end of each word. Since I need to check things line by line (I simplified my code a little in the previous example), I was wondering if I could do something like this:

    for my $chunk ( $dom->find('s')->each ) {
        my @values = split('\n', $chunk);
        foreach my $line (@values) {
            # do stuff on every line
        }
    }

    I'm trying it right now. I hope it works.

    Thanks again!

      Yes, splitting the 'chunk' is a good solution! However, since you've noticed the chunk lacks newlines, change:

      my @values = split('\n', $chunk);

      to

      my @values = split /\s+/, $chunk;
      • This splits on whitespace
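      • It uses a regex rather than a string literal, which is the idiomatic form (split compiles its first argument as a pattern either way, so the single-quoted '\n', which holds a backslash and an n rather than a newline character, would still have matched a newline)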
      • Parentheses are optional
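
The points above can be checked with core Perl alone. The following is a minimal sketch, not the poster's code: the string $text is a hypothetical stand-in for one chunk's text with one word per line, and the array names are made up for illustration. In the loop from the thread, the string would presumably come from $chunk->text, since $chunk itself is a Mojo::DOM object that stringifies to the element's markup, tags included.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one chunk's text, one word per line
# (an assumption; in the thread this would come from $chunk->text).
my $text = "\nword1\nword2\nword3\n";

# A single-quoted '\n' is still compiled as a pattern, so it matches newlines.
my @by_newline = split '\n', $text;

# /\s+/ splits on any run of whitespace; the leading newline
# leaves an empty first field.
my @by_regex = split /\s+/, $text;

# The special pattern ' ' (a single space string) also splits on
# whitespace runs, but skips leading whitespace first.
my @by_space = split ' ', $text;

print "newline: ", scalar(@by_newline), " fields\n";
print "regex:   ", scalar(@by_regex),   " fields\n";
print "space:   ", scalar(@by_space),   " fields\n";

# Per-item processing, as in the reply's inner loop.
for my $word (@by_space) {
    print "item: $word\n";
}
```

If the text may start with whitespace, split ' ' is probably the safer choice, because it avoids the empty leading field that /\s+/ produces.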
