PerlMonks  

counting string in a text file

by nurulnad (Acolyte)
on Jan 13, 2011 at 03:20 UTC (id://882024)

nurulnad has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have some text that looks like this (one cell per line):

<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="green" height="10"></td>
<td bgcolor="green" height="10"></td>
<td bgcolor="green" height="10"></td>
<td bgcolor="green" height="10"></td>
<td bgcolor="green" height="10"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="green" height="10"></td>
<td bgcolor="green" height="10"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>
<td bgcolor="white"></td>

I need to count the "green" lines and report where each run of them starts and ends. For the example above, my output should look something like:

First green = line 6 to 10
Second green = line 15 to 16

I've already written code that works when "green" occurs as a single run. My code is a bit silly, and I can't really explain it without being wordy, so I'll just post it here:
#!/usr/bin/perl
use warnings;

open (GREEN,"data.txt");
open (WHITE,"data.txt");
$/ = "</td>";
$counter_white = 0;
$counter_green = 0;

#----------------count number of green---------------------
while ($line_green = <GREEN>) {
    if ($line_green =~ /<td bgcolor="green" height="10"><\/td>/){$counter_green++;}
}

#------count number of white (before any green occur)-------------
while ($line_white = <WHITE>) {
    if ($line_white =~ /<td bgcolor="white"><\/td>/){$counter_white++;}
    if ($line_white =~ /<td bgcolor="green" height="10"><\/td>/){last;} # escape when green starts
}
#---------------------------------------
print "white = ".$counter_white."\n";
print "green = ".$counter_green."\n";

$beginning = $counter_white + 1;
$end = $counter_white + $counter_green;
print $beginning."\n"; # start of green
print $end."\n"; # end of green
print "The result starts from ".$beginning." to ".$end."\n";

close (GREEN);
close (WHITE);

Re: counting string in a text file
by ELISHEVA (Prior) on Jan 13, 2011 at 04:58 UTC

    Is this homework? (You would never have <td> tags in isolation like this in a real HTML file, since they are part of a larger table definition.) A few hints to get you back on track:

    • You don't need two separate filehandles to print the start and length of the first green run and the first white run. In fact, it can't be done with two separate streams, one counting only white and one counting only green, because you need to know where the green stops and the white begins, and vice versa. That requires reading a single sequence of lines, so you can tell when the color changes from green to white and back again.

    • It is all in the variables. Try thinking about how you could find the start of a color run and count its length with one file handle and the following variables: $startOfRunLineNumber, $currentColor, and $currentLineNumber.

    • What changes to tell you that a run has ended? The color, of course. So inside your loop you need an if...elsif...else or if...else that compares the color of the current line against the value of $currentColor stored while reading the previous line.

    • Please get in the habit of using the three-argument open: open MYDATA, "<", "data.txt", where the open mode (argument 2) is separate from the file name (argument 3). It pays to be explicit about what you want Perl to do with the file. With the two-argument open, Perl has to guess whether the string contains a file name, an open mode, or both. If the guess is wrong, you could get burned.

    • If this is not an exercise and you are parsing live HTML, then please, please, please learn how to use an HTML parser package like HTML::Parser. Real-world HTML can't be parsed reliably with regular expressions because of its repetitive, nested structure.
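To make those hints concrete, here is a minimal sketch (not ELISHEVA's code) that follows them: a single filehandle opened with the three-argument open, and the variables $startOfRunLineNumber, $currentColor and $currentLineNumber. It assumes one <td bgcolor="..."> cell per line, as in the question; the sub name color_runs, the extra $endOfRunLineNumber variable and the in-memory sample are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one [color, first line, last line] entry per run.
# Assumes each line holds at most one <td bgcolor="..."> cell.
sub color_runs {
    my ($fh) = @_;
    my @runs;
    my ($currentColor, $startOfRunLineNumber, $endOfRunLineNumber);
    my $currentLineNumber = 0;

    while (my $line = <$fh>) {
        $currentLineNumber++;
        next unless $line =~ /bgcolor="([^"]+)"/;    # skip lines without a cell
        my $color = $1;

        if (defined $currentColor && $color eq $currentColor) {
            # same color as the previous cell: the run continues
            $endOfRunLineNumber = $currentLineNumber;
        }
        else {
            # color changed (or first cell): close the previous run, start a new one
            push @runs, [$currentColor, $startOfRunLineNumber, $endOfRunLineNumber]
                if defined $currentColor;
            $currentColor         = $color;
            $startOfRunLineNumber = $endOfRunLineNumber = $currentLineNumber;
        }
    }
    # the last run has no color change after it, so close it here
    push @runs, [$currentColor, $startOfRunLineNumber, $endOfRunLineNumber]
        if defined $currentColor;
    return @runs;
}

# Sample rebuilt from the question: 5 white, 5 green, 4 white, 2 green, 3 white.
my @sample = (('white') x 5, ('green') x 5, ('white') x 4, ('green') x 2, ('white') x 3);
my $text = join '', map {
    $_ eq 'green' ? qq{<td bgcolor="green" height="10"></td>\n} : qq{<td bgcolor="white"></td>\n}
} @sample;

# Three-argument open; an in-memory file stands in for "data.txt" here.
open my $fh, '<', \$text or die "Cannot open input: $!";
my $n = 0;
for my $run (grep { $_->[0] eq 'green' } color_runs($fh)) {
    $n++;
    print "Green run $n = line $run->[1] to $run->[2]\n";
}
close $fh;
```

To read the real file instead, pass "data.txt" rather than \$text to open. Because the comparison is against the stored color rather than a hard-coded "green", the same loop also reports the white runs.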

Re: counting string in a text file
by jwkrahn (Abbot) on Jan 13, 2011 at 07:30 UTC
    $ echo '<td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    ' | perl -e'
    while ( <> ) {
        $start = $. if m?"green"?;
        if ( !/"green"/ && $start ) {
            print "$start-", $. - 1, "\n";
            reset;
            $start = undef;
        }
    }
    '
    6-10
    15-16
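One thing to watch with the one-liner: it only prints a run when a non-green line follows it, so if data.txt ended on a green cell, that last run would never be printed. A minimal script version that also covers that case might look like the sketch below. It swaps the match-once m?...?/reset pair for a plain defined check (the //= operator needs Perl 5.10 or later); the sub name green_ranges and the in-memory sample are illustrative, not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "start-end" strings, one per run of green lines.
sub green_ranges {
    my ($fh) = @_;
    my ($start, @ranges);
    while (my $line = <$fh>) {
        if ($line =~ /"green"/) {
            $start //= $.;                          # first line of a green run
        }
        elsif (defined $start) {
            push @ranges, "$start-" . ($. - 1);     # run ended on the previous line
            undef $start;
        }
    }
    # a run that lasts until the final line has no non-green line after it
    push @ranges, "$start-" . $. if defined $start;
    return @ranges;
}

# Sample rebuilt from the question: 5 white, 5 green, 4 white, 2 green, 3 white.
my @sample = (('white') x 5, ('green') x 5, ('white') x 4, ('green') x 2, ('white') x 3);
my $text = join '', map {
    $_ eq 'green' ? qq{<td bgcolor="green" height="10"></td>\n} : qq{<td bgcolor="white"></td>\n}
} @sample;

open my $fh, '<', \$text or die "Cannot open input: $!";
print "$_\n" for green_ranges($fh);
close $fh;
```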
Re: counting string in a text file
by eff_i_g (Curate) on Jan 13, 2011 at 05:09 UTC
    use warnings;
    use strict;
    use Set::IntSpan;

    # collect the line numbers on which each color appears
    my %color;
    while (<DATA>) {
        push @{$color{$1}}, $. if /\sbgcolor="([^"]+)"/;
    }

    # let Set::IntSpan collapse each list of line numbers into ranges
    for (sort keys %color) {
        my $span = Set::IntSpan->new(@{$color{$_}});
        print "$_: ", $span->run_list, "\n";
    }

    __DATA__
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="green" height="10"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>
    <td bgcolor="white"></td>

    green: 6-10,15-16
    white: 1-5,11-14,17-19
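If installing Set::IntSpan from CPAN isn't an option, the collapsing step is short enough to write by hand. The sketch below is a hypothetical core-Perl stand-in for run_list (not the module's own code) that produces the same comma-separated "first-last" format shown above. It assumes the line numbers arrive sorted and without duplicates, which holds here because they are pushed in the order the lines are read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Set::IntSpan's run_list: collapses a sorted,
# duplicate-free list of integers into "first-last" ranges joined by commas.
sub run_list {
    my @nums = @_;
    my @runs;
    while (@nums) {
        my $first = shift @nums;
        my $last  = $first;
        # extend the range while the next number is consecutive
        $last = shift @nums while @nums && $nums[0] == $last + 1;
        push @runs, $first == $last ? $first : "$first-$last";
    }
    return join ',', @runs;
}

# Sample rebuilt from the question: 5 white, 5 green, 4 white, 2 green, 3 white.
my @sample = (('white') x 5, ('green') x 5, ('white') x 4, ('green') x 2, ('white') x 3);
my $text = join '', map {
    $_ eq 'green' ? qq{<td bgcolor="green" height="10"></td>\n} : qq{<td bgcolor="white"></td>\n}
} @sample;

# Same collection loop as above, reading an in-memory copy of the data.
my %color;
open my $fh, '<', \$text or die "Cannot open input: $!";
while (<$fh>) {
    push @{ $color{$1} }, $. if /\sbgcolor="([^"]+)"/;
}
close $fh;
print "$_: ", run_list(@{ $color{$_} }), "\n" for sort keys %color;
```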
Re: counting string in a text file
by PeterPeiGuo (Hermit) on Jan 13, 2011 at 04:49 UTC

    You don't need to count. All you need is to find the boundaries, i.e. where the color changes. Loop through the file and store the boundaries in, for example, a hash, using the line number as the key and the color as the value. This way your code does not depend on the number of colors, which makes it more extensible.
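A minimal sketch of the boundary idea, assuming one <td bgcolor="..."> cell per line: only lines where the color differs from the previous cell go into the hash, and each run is then recovered as "from this boundary up to the line before the next one". Perl hashes are unordered, so the keys are sorted numerically to get the runs back in file order. The sub names find_boundaries and runs_from_boundaries are hypothetical, not Peter's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of line number => color for every
# line whose color differs from the previous cell, plus the number of lines read.
sub find_boundaries {
    my ($fh) = @_;
    my (%boundary, $previous);
    my $lines = 0;
    while (my $line = <$fh>) {
        $lines++;
        next unless $line =~ /bgcolor="([^"]+)"/;
        my $color = $1;
        $boundary{$lines} = $color if !defined $previous || $color ne $previous;
        $previous = $color;
    }
    return (\%boundary, $lines);
}

# Each boundary starts a run that ends just before the next boundary,
# or on the last line for the final run.
sub runs_from_boundaries {
    my ($boundary, $last_line) = @_;
    my @starts = sort { $a <=> $b } keys %$boundary;
    my @runs;
    for my $i (0 .. $#starts) {
        my $end = $i < $#starts ? $starts[$i + 1] - 1 : $last_line;
        push @runs, "$boundary->{$starts[$i]}: $starts[$i]-$end";
    }
    return @runs;
}

# Sample rebuilt from the question: 5 white, 5 green, 4 white, 2 green, 3 white.
my @sample = (('white') x 5, ('green') x 5, ('white') x 4, ('green') x 2, ('white') x 3);
my $text = join '', map {
    $_ eq 'green' ? qq{<td bgcolor="green" height="10"></td>\n} : qq{<td bgcolor="white"></td>\n}
} @sample;

open my $fh, '<', \$text or die "Cannot open input: $!";
my ($boundary, $last_line) = find_boundaries($fh);
close $fh;
print "$_\n" for runs_from_boundaries($boundary, $last_line);
```

Nothing here mentions "green" or "white" explicitly, so a third color in the data would simply show up as more boundaries.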

    Peter (Guo) Pei

Re: counting string in a text file
by sundialsvc4 (Abbot) on Jan 13, 2011 at 13:10 UTC

    The most thoroughly engineered solution would use an HTML parser to build an XML tree, and XPath expressions to pluck what you need out of that tree.

Node Type: perlquestion [id://882024]
Approved by ELISHEVA