PerlMonks  

Repeating the same command in different portions of input

by albascura (Novice)
on Jan 15, 2013 at 10:06 UTC (#1013342, perlquestion)
albascura has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone. I'm fairly new to Perl and to programming in general, and I don't understand how to do the following. If you would be kind enough to explain it to me, I would be really grateful.

I searched around the forum but didn't find anything, so pointing me to a similar thread would be fine too.

Basically I have a text input like the following:

<a> word1 word2 word3 </a> <a> word4 word5 </a> <a> word6 word7 </a>

What I would like to do is read the file, but analyze each <a></a> block separately from the others. Basically, I have to perform some kind of analysis on

<a> word1 word2 word3 </a>
and then do the very same analysis on
<a> word4 word5 </a>
and
<a> word6 word7 </a>

I have absolutely no clue on how to do it, I confess. So any kind of help (readings, suggestions, any other topics referring to similar problems) would be really appreciated.

Replies are listed 'Best First'.
Re: Repeating the same command in different portions of input
by tobyink (Abbot) on Jan 15, 2013 at 10:37 UTC

    What you want is a loop. Perl has several different kinds of loop: for, foreach, while, until, etc. They all boil down to the same thing, but depending on what you're doing, one type of loop might be more convenient than the others.

    use v5.12;
    use strict;
    use warnings;

    # Obscure but easy way of reading a file into a string to match against...
    $_ = do { local (@ARGV, $/) = 'loop.txt'; <> };

    # Extract text we're interested in...
    my @matches = m{<a>(.+?)</a>}sg;

    # Loop through the matches...
    foreach my $match (@matches) {
        # This block gets executed for each match!
        say "GOT: $match";
    }
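To make the "same analysis on every block" idea concrete, here is a minimal sketch that works on a string instead of a file. The helper names `extract_blocks` and `analyze_block` are hypothetical, and counting words is only a stand-in for whatever analysis is actually needed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside each <a>...</a> block.
sub extract_blocks {
    my ($text) = @_;
    # Non-greedy match; /s lets . cross newlines, /g returns every capture
    # when the sub is called in list context.
    return $text =~ m{<a>(.+?)</a>}sg;
}

# Hypothetical stand-in for the real analysis: count the words in a block.
sub analyze_block {
    my ($block) = @_;
    my @words = split ' ', $block;    # split ' ' ignores leading whitespace
    return scalar @words;
}

my $input = '<a> word1 word2 word3 </a> <a> word4 word5 </a> <a> word6 word7 </a>';

my $n = 0;
for my $block ( extract_blocks($input) ) {
    $n++;
    printf "Block %d has %d words\n", $n, analyze_block($block);
}
```

Putting the analysis in its own sub keeps the loop body short: the loop decides which blocks get processed, and the sub decides what happens to each one.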
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Thanks!
Re: Repeating the same command in different portions of input
by aitap (Deacon) on Jan 15, 2013 at 12:38 UTC
Re: Repeating the same command in different portions of input
by Anonymous Monk on Jan 15, 2013 at 10:35 UTC
      My level of knowledge is close to 0. But I'm reading the material you provided. And I want to thank you!
Re: Repeating the same command in different portions of input
by Kenosis (Priest) on Jan 15, 2013 at 16:07 UTC

    tobyink has provided an excellent solution. For your future reference--and in case the need arises again--there are Perl modules that can be used for parsing the kind of text you have. Here's an example that uses Mojo::DOM to parse your <a> tags:

    use strict;
    use warnings;
    use Mojo::DOM;

    my $text = <<END;
    <a> word1 word2 word3 </a>
    <a> word4 word5 </a>
    <a> word6 word7 </a>
    END

    my $dom = Mojo::DOM->new($text);
    my $i   = 1;

    for my $chunk ( $dom->find('a')->each ) {
        print 'Chunk ' . $i++ . ': ' . $chunk->text . "\n";
    }

    Output:

    Chunk 1: word1 word2 word3
    Chunk 2: word4 word5
    Chunk 3: word6 word7

    Thus, each group that you need to analyze is contained by $chunk->text within the for loop.

    Hope this helps!

      It really helps, thanks.

      I was wondering. I see that $chunk->text doesn't preserve the newline at the end of each word. Since I need to check things that are on separate lines (I simplified my code a little in the previous example), could I do something like this:

      for my $chunk ( $dom->find('s')->each ) {
          my @values = split('\n', $chunk);
          foreach my $line (@values) {
              # do stuff on every line
          }
      }

      I'm trying it right now. I hope it works.

      Thanks again!

        Yes, splitting the 'chunk' is a good solution! However, since you've noticed the chunk lacks newlines, change:

        my @values = split('\n', $chunk);

        to

        my @values = split /\s+/, $chunk;
        • This splits on whitespace
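        • It uses a regex rather than a string literal (in single quotes, '\n' is a backslash followed by n, not a newline character; split still compiles its first argument as a pattern, so it would match a newline, but the regex form states the intent directly)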
        • Parentheses are optional
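The differences between these split forms are easy to check on a plain string. This is a small sketch using core Perl only. The sample text and the helper names `split_lines`, `split_ws` and `split_words` are made up for illustration, and it assumes the text being split still contains its newlines.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the split variants discussed above.
sub split_lines { my ($t) = @_; return split /\n/,  $t; }   # one item per line
sub split_ws    { my ($t) = @_; return split /\s+/, $t; }   # any run of whitespace
sub split_words { my ($t) = @_; return split ' ',   $t; }   # like /\s+/, but skips leading whitespace

my $text = " word1\nword2 word3\n";   # made-up sample with a leading space

my @lines = split_lines($text);
my @ws    = split_ws($text);
my @words = split_words($text);

# A string argument is compiled as a pattern too, so '\n' in single
# quotes still splits on newlines.
my @quoted = split '\n', $text;

printf "lines=%d ws=%d words=%d quoted=%d\n",
    scalar @lines, scalar @ws, scalar @words, scalar @quoted;
```

With leading whitespace in the input, `split /\s+/` produces an empty first field while `split ' '` does not, which matters when counting or indexing the resulting words.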
