PerlMonks  

Repeating the same command in different portions of input

by albascura (Novice)
on Jan 15, 2013 at 10:06 UTC (#1013342)
albascura has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone. I'm fairly new to Perl and programming in general, and I don't understand how to do the following. If you would be kind enough to explain it to me, I would be really grateful.

I searched around the forum but didn't find an answer, so pointing me to an analogous thread would be fine too.

Basically I have a text input like the following:

<a> word1 word2 word3 </a> <a> word4 word5 </a> <a> word6 word7 </a>

What I would like to do is read the file but analyze each <a></a> block separately from the others. Basically, I have to perform some kind of analysis on

<a> word1 word2 word3 </a>
and then do the very same analysis on
<a> word4 word5 </a>
and
<a> word6 word7 </a>

I have absolutely no clue how to do it, I confess. So any kind of help (readings, suggestions, other topics covering similar problems) would be really appreciated.

Re: Repeating the same command in different portions of input
by Anonymous Monk on Jan 15, 2013 at 10:35 UTC
      My level of knowledge is close to 0. But I'm reading the material you provided. And I want to thank you!
Re: Repeating the same command in different portions of input
by tobyink (Abbot) on Jan 15, 2013 at 10:37 UTC

    What you want is a loop. Perl has several different kinds of loop: for, foreach, while, until, etc. They all boil down to the same thing, but depending on what you're doing, one type of loop might be more convenient than the other.

    use v5.12;
    use strict;
    use warnings;

    # Obscure but easy way of reading a file into a string to match against...
    $_ = do { local (@ARGV, $/) = 'loop.txt'; <> };

    # Extract text we're interested in...
    my @matches = m{<a>(.+?)</a>}sg;

    # Loop through the matches...
    foreach my $match (@matches) {
        # This block gets executed for each match!
        say "GOT: $match";
    }
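
For anyone who wants to try the loop without creating loop.txt first, here is a minimal sketch that runs the same regex against an in-memory string and performs a stand-in "analysis" (a word count) on each block. The sub name extract_blocks and the word count itself are illustrative assumptions, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside every <a>...</a> block.
sub extract_blocks {
    my ($text) = @_;
    # /s lets . match newlines; /g in list context returns every capture.
    return $text =~ m{<a>(.+?)</a>}sg;
}

my $input = '<a> word1 word2 word3 </a> <a> word4 word5 </a> <a> word6 word7 </a>';

for my $block ( extract_blocks($input) ) {
    # Stand-in analysis: count the whitespace-separated words in this block.
    my @words = split ' ', $block;
    printf "%d words: %s\n", scalar @words, join( ',', @words );
}
```

Whatever analysis you actually need goes inside the for loop, in place of the word count; it runs once per block.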
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Thanks!
Re: Repeating the same command in different portions of input
by aitap (Chaplain) on Jan 15, 2013 at 12:38 UTC
Re: Repeating the same command in different portions of input
by Kenosis (Priest) on Jan 15, 2013 at 16:07 UTC

    tobyink has provided an excellent solution. For your future reference--and in case the need arises again--there are Perl modules that can be used for parsing the kind of text you have. Here's an example that uses Mojo::DOM to parse your <a> tags:

    use strict;
    use warnings;
    use Mojo::DOM;

    my $text = <<END;
    <a> word1 word2 word3 </a>
    <a> word4 word5 </a>
    <a> word6 word7 </a>
    END

    my $dom = Mojo::DOM->new($text);
    my $i = 1;

    for my $chunk ( $dom->find('a')->each ) {
        print 'Chunk ' . $i++ . ': ' . $chunk->text . "\n";
    }

    Output:

    Chunk 1: word1 word2 word3
    Chunk 2: word4 word5
    Chunk 3: word6 word7

    Thus, each group that you need to analyze is contained by $chunk->text within the for loop.

    Hope this helps!

      It really helps, thanks.

      I see that $chunk->text doesn't preserve the newline at the end of each word. Since I need to check things that are on separate lines (I simplified my code a little in the previous example), I was wondering if I could do something like this:

      for my $chunk ( $dom->find('s')->each ) {
          my @values = split('\n', $chunk);
          foreach my $line (@values) {
              # do stuff on every line
          }
      }

      I'm trying it right now. I hope it works.

      Thanks again!

        Yes, splitting the 'chunk' is a good solution! However, since you've noticed the chunk lacks newlines, change:

        my @values = split('\n', $chunk);

        to

        my @values = split /\s+/, $chunk;
        • This splits on whitespace
        • It uses a regex, not a string literal (split treats its first argument as a pattern either way, so '\n' in single quotes would still match a newline, but writing a regex makes the intent explicit)
        • Parentheses are optional
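
A quick way to see how these separators behave is to run split on a small multi-line string. The sketch below uses core Perl only; the sample string and the sub names are made up for illustration. It also shows one thing worth watching for: if the text starts with whitespace, a regex separator yields an empty first field, while the special single-space form of split skips leading whitespace.

```perl
use strict;
use warnings;

# Sample text shaped like one chunk's content, one word per line (assumed).
my $chunk_text = "\nword1\nword2\nword3\n";

# Split on newlines only: the leading newline yields an empty first field.
sub by_newline    { my ($s) = @_; return split /\n/, $s }

# Split on any run of whitespace: also an empty first field here.
sub by_whitespace { my ($s) = @_; return split /\s+/, $s }

# Special case: a single-space string skips leading whitespace.
sub by_single_space { my ($s) = @_; return split ' ', $s }

my @cases = (
    [ '/\n/'  => \&by_newline ],
    [ '/\s+/' => \&by_whitespace ],
    [ q{' '}  => \&by_single_space ],
);

for my $case (@cases) {
    my ( $label, $splitter ) = @$case;
    my @fields = $splitter->($chunk_text);
    printf "%-6s gives %d fields: [%s]\n", $label, scalar @fields, join( '|', @fields );
}
```

Note that $chunk in the snippets above is a Mojo::DOM object, so you would most likely pass $chunk->text (or the stringified markup) to split; that reading of the intent is an assumption.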
