http://www.perlmonks.org?node_id=650671

new@perl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,
I am completely new to Perl programming.
I am trying to count the occurrences of the string "abc" in a flat file (data.txt) with the following structure, and to return the count and the matching lines.

_data.txt_
abc:AB CD:100
def:DE FG:101
ghi:GH IJ:102
abc:AB CD:100
ghi:GH IJ:103

I am able to grep the matching lines for the string "abc" using the code below, although I read that using grep is not advisable for huge files.

my $file='source_data\data.txt';
open FILE, $file or die "FILE $file NOT FOUND - $!\n";
while (<FILE>){
    chop $_;
    my ($a, $b, $c) = split /:/, $_;
    my $r = [$a, $b, $c];
    my @arr = $r->[0];
    my @matches = grep $_ eq "abc", $arr[0];
    foreach my $i (@matches) {
        print $_."\n";
    }
}
_output_
abc:AB CD:100
abc:AB CD:100

How can I get the count of matching lines?
Seeking your kind advice.
--new@perl

Replies are listed 'Best First'.
Re: find a string and count of its occurence in a text file
by GrandFather (Saint) on Nov 14, 2007 at 05:29 UTC

    There is a lot of needless code in your sample, and a couple of foibles. Assuming you want to do something other than just printing out the contents of the matching line, something like the following sample may be what you are after:

    use strict;
    use warnings;

    my $fileContent = <<DATA;
    abc:AB CD:100
    def:DE FG:101
    ghi:GH IJ:102
    abc:AB CD:100
    ghi:GH IJ:103
    DATA

    open FILE, '<', \$fileContent;

    while (<FILE>) {
        chomp; # chomp not chop. $_ is default so omit
        my @elements = split /:/, $_;
        next unless $elements[0] eq 'abc';

        print "Matched: ", join ('|', @elements), "\n";
    }

    close FILE;

    Prints:

    Matched: abc|AB CD|100
    Matched: abc|AB CD|100

    Note that for purposes of the sample the "file" is actually just a string, although Perl allows it to be opened and manipulated as a file.

    Generally it is a bad idea to rely on the contents of $_ remaining unaltered over more than a couple of lines of code. You are better to use an explicit variable in such cases so that the intent of the code is clearer and so that the value doesn't get altered in unexpected ways.

    Use the three-parameter open to make the intent clearer and the call safer (what happens if the file name starts with '>' in your sample?).

    grep on a single element can be replaced with an if.
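
The three points above (an explicit loop variable, three-parameter open, and a plain comparison instead of grep on one element) can be put together in a minimal sketch. The helper name matching_lines and the in-memory test data are assumptions made for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines whose first colon-separated
# field equals $wanted, reading from an already opened filehandle.
sub matching_lines {
    my ($fh, $wanted) = @_;
    my @matches;
    while (my $line = <$fh>) {          # explicit variable instead of $_
        chomp $line;
        my ($first) = split /:/, $line;
        # a plain comparison replaces grep on a single element
        push @matches, $line if defined $first && $first eq $wanted;
    }
    return @matches;
}

# In-memory "file" so the sketch runs without data.txt on disk.
my $content = "abc:AB CD:100\ndef:DE FG:101\nghi:GH IJ:102\n"
            . "abc:AB CD:100\nghi:GH IJ:103\n";

# Three-parameter open with a lexical filehandle.
open my $fh, '<', \$content or die "Cannot open in-memory file: $!\n";
my @matches = matching_lines($fh, 'abc');
close $fh;

print "Count: ", scalar(@matches), "\n";
print "$_\n" for @matches;
```

Evaluating the array in scalar context gives the count the original question asked for, so no separate counter is needed here.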


    Perl is environmentally friendly - it saves trees
Re: find a string and count of its occurence in a text file
by ysth (Canon) on Nov 14, 2007 at 06:19 UTC
    Do not do this:
    $/="abc"; my $count = chomp(@/=<FILE>)/$/=~y///c;
      So I ran a perldoc chomp and saw this (in a page that is a good read in its entirety):

      If you chomp a list, each element is chomped, and the total number of characters removed is returned.

      and then I ran perldoc transliterate, searched inside the page for "Transliterates" and saw these:

      • Options: c => Complement the SEARCHLIST.
      • It returns the number of characters replaced or deleted

      I figure, @/ is an ordinary array, just like @records, say. We already have a glob, */, and we are even using its special-purpose scalar portion in this example, so why not use its array slot, too? I wondered whether I could use chomp(()=<FILE>) instead, but no, it doesn't work. The assignment to the empty list probably succeeds in executing <FILE> in list context, but then throws away the results and does not provide chomp() an lvalue to work with.

      The $/ =~ y///c bit, then, counts the number of characters in $/, the input-record-separator, by replacing everything, all chars in the input-record-separator (here described as the complement of nothing) with nothing and returning the number of chars thus replaced. You could just replace that whole expression with 3 in this particular case, as that's the number of characters in "abc", the value of the input-record-separator, but the counting makes the code portable to other input-record-separators.

      The program is probably memory-hungry. Although it is not in slurp mode, all the lines seem to get stored in the @/ array, before chomp(LIST) has a chance to work on them.

      In any case, chomp() cuts off all trailing occurrences of "abc" in @/ and returns the number of chars it thus cut off. Dividing that by three (that is, by $/=~y///c) then gives you how many times "abc" occurs in the file.

      Is that right, monks?

        The $/ =~ y///c bit, then, counts the number of characters in $/, the input-record-separator, by replacing everything, all chars in the input-record-separator (here described as the complement of nothing) with nothing and returning the number of chars thus replaced.
        Not quite; since the REPLACEMENTLIST defaults to the (post-complementing) SEARCHLIST (except with /d), all the chars are replaced with themselves, not nothing. So $/ is unchanged. (Actually, tr aka y recognizes when it's only being used to count and can even be used on readonly strings then.)
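
For anyone who wants to check the explanation in this sub-thread, here is a spelled-out sketch of the same trick. It is not a recommendation (ysth did say "Do not do this"). The variable names and the in-memory data are assumptions for illustration.

```perl
use strict;
use warnings;

# tr (alias y) with an empty SEARCHLIST and /c counts every character
# and leaves the string unchanged.
my $sep     = "abc";
my $sep_len = ($sep =~ tr///c);

# Same data as in the question, held in a string.
my $content = "abc:AB CD:100\ndef:DE FG:101\nghi:GH IJ:102\n"
            . "abc:AB CD:100\nghi:GH IJ:103\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!\n";

my $count;
{
    local $/ = $sep;                 # records now end at each "abc"
    my @records = <$fh>;             # every record is held in memory here
    my $removed = chomp(@records);   # total number of characters removed
    $count = $removed / $sep_len;
}
close $fh;

print "separator length: $sep_len\n";
print "occurrences of '$sep': $count\n";
```

The array still holds the whole file, so the memory concern raised above applies to this version as well.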
Re: find a string and count of its occurence in a text file
by narainhere (Monk) on Nov 14, 2007 at 05:27 UTC
    This would do the trick
    use strict;
    use warnings;
    sub retriver();
    my @lines;
    my $lines_ref;
    my $count;
    $lines_ref=retriver();
    @lines=@$lines_ref;
    $count=@lines;
    print "Count :$count\nLines\n";
    print join "\n",@lines;
    sub retriver()
    {
        my $file='source_data\data.txt';
        open FILE, $file or die "FILE $file NOT FOUND - $!\n";
        my @contents=<FILE>;
        my @filtered=grep(/abc:/,@contents);
        return \@filtered;
    }

    The world is so big for any individual to conquer

      Thanks a lot narainhere :-) I need some explanation though, if you please.

      what is the function of -->
      sub retriver() in line number 3;

        It's function prototyping. That's needed because retriver() is not yet defined at the point where it is called. If you remove the prototype (line 3), you have to use &retriver() when calling the function, which tells the compiler to look for the definition somewhere below.
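
A small sketch of the same structure without prototypes: assuming the sub is defined without the empty prototype (), a parenthesised call placed before the definition should run without a forward declaration or a leading &. The name retrieve_lines and the fixed list of lines are hypothetical stand-ins for retriver() and the data file.

```perl
use strict;
use warnings;

# No forward declaration: the parenthesised call is resolved at run time,
# by which point the definition further down has been compiled.
my $lines_ref = retrieve_lines();
my $count     = @$lines_ref;
print "Count: $count\n";

# Hypothetical stand-in for retriver(); filters a fixed list, not a file.
sub retrieve_lines {
    my @contents = ("abc:AB CD:100\n", "def:DE FG:101\n", "abc:AB CD:100\n");
    my @filtered = grep { /^abc:/ } @contents;
    return \@filtered;
}
```

The forward declaration mainly matters when the sub has a prototype or is called without parentheses.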

        The world is so big for any individual to conquer

Re: find a string and count of its occurence in a text file
by oha (Friar) on Nov 14, 2007 at 10:26 UTC
    As you did, you must open the file; then, for every line of the file, check whether it starts with abc, and increment a variable or print out what you need.

    open FILE, $file or die "can't open $file: $!\n";
    while(<FILE>) {
        next unless /^abc:/;
        $counter++;
        chomp;
        print "$_\n"; # whatever you need
    }
    close FILE;

    Doing it this way, you never load all of the file's lines at once; you parse them one by one.
    As someone noticed, grep works on lists, so to use it you must load all the lines into one array (@array = <FILE>), which leads to memory issues if the file is big.

    Oha

    PS: Perl has the poetry of next unless, which is so much more beautiful than if(! COND) { continue } that I can't avoid posting it! :)

Re: find a string and count of its occurence in a text file
by Anonymous Monk on Nov 14, 2007 at 05:38 UTC
    i read that using grep is not advisable for huge files

    Someone lied to you.

    You probably want something like this:

    while ( <FILE> ) { print if 'abc' eq ( split /:/ )[ 0 ]; }

    Or possibly this:

    while ( <FILE> ) { print if /^abc:/; }
      And to chuck in an (incredibly primitive) idea for a line count as well:
      _data.txt_
      abc:AB CD:100
      def:DE FG:101
      ghi:GH IJ:102
      abc:AB CD:100
      ghi:GH IJ:103

      my $count;
      while ( <FILE> ) {
          print if /^abc:/;
          $count++ if /^abc/;
      }
      print "Matched $count times\n";

      _output_
      abc:AB CD:100
      abc:AB CD:100
      Matched 2 times
      Update: I'm silly; added condition for incrementing $count.
      i read that using grep is not advisable for huge files

      Someone lied to you.


      Maybe "lied" is a bit strong; they were probably thinking of this:
      grep /fred/,<FILE>;
      which loads the entire file into memory (or at least, tries to).
        I thought he meant grep(1) instead of perldoc -f grep    :-)
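
To make the difference discussed above concrete, here is a rough sketch that counts the same lines both ways. Both give the same count; the difference is that grep over <$fh> reads every line into a list first, while the loop holds one line at a time. The in-memory data and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $content = "abc:AB CD:100\ndef:DE FG:101\nghi:GH IJ:102\n"
            . "abc:AB CD:100\nghi:GH IJ:103\n";

# grep over the filehandle: <$fh> is evaluated in list context, so all
# lines are read before filtering. In scalar context grep returns the count.
open my $fh_grep, '<', \$content or die "Cannot open in-memory file: $!\n";
my $grep_count = grep { /^abc:/ } <$fh_grep>;
close $fh_grep;

# Line-by-line loop: only the current line is held in memory.
open my $fh_loop, '<', \$content or die "Cannot open in-memory file: $!\n";
my $loop_count = 0;
while (my $line = <$fh_loop>) {
    $loop_count++ if $line =~ /^abc:/;
}
close $fh_loop;

print "grep: $grep_count, loop: $loop_count\n";
```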
Re: find a string and count of its occurence in a text file
by TheForeigner (Initiate) on Nov 14, 2007 at 15:22 UTC
    It looks like you have plenty to work with, but here's my solution:
    open my $file, 'in.txt' or die "Couldn't open file: $!\n"; #better way to make file handles
    foreach (<$file>){ #for each line
        push @matches, $_ if (/abc/); #keep them in an array if they match
    }
    print @matches; #print the matches
    print "Total matches: ",scalar(@matches),"\n"; #print the number of matches
Re: find a string and count of its occurence in a text file
by sundialsvc4 (Abbot) on Nov 15, 2007 at 03:28 UTC

    Do not overlook any opportunity for using all of the tools that may be available to you. For instance, this particular requirement might be easily met by awk, without the use of any Perl programming at all!

    And if that be the case... "cool!"

      For instance, this particular requirement might be easily met by awk

      In fact, there's a general Linux recipe for just these things, generally introduced right around the time whatever book/tute/etc. decides to introduce pipes:

      $ egrep ^abc <filename> | wc -l

      Disclaimer: This wouldn't work if the OP had, say, wanted to find the total number of occurrences of a given string that happened to occur more than once per line in the data; as it is, the OP doesn't (or at least, the data doesn't contain the sort of case that would prevent this from working) :)
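
For the case the disclaimer mentions, where a string can occur several times on one line, a per-line match count has to be summed instead of counting matching lines. A minimal sketch in Perl follows; the helper name count_occurrences and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: total number of non-overlapping occurrences of a
# literal string across all lines read from a filehandle.
sub count_occurrences {
    my ($fh, $needle) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        # A /g match in list context returns every match; assigning that
        # list to () and then to a scalar yields how many there were.
        my $n = () = $line =~ /\Q$needle\E/g;
        $total += $n;
    }
    return $total;
}

# Made-up data: the first line contains "abc" three times.
my $content = "abc:abc abc:100\ndef:DE FG:101\nabc:AB CD:100\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!\n";
my $total = count_occurrences($fh, 'abc');
close $fh;

print "Total occurrences: $total\n";
```

\Q...\E quotes the search string so that characters such as . or * are matched literally.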

Re: find a string and count of its occurence in a text file
by arasu (Initiate) on Apr 23, 2012 at 11:21 UTC
    open FH, "inputDatafile.txt" or die "can't open inputDatafile.txt: $!\n";
    $/ = undef; ## input record separator: undef reads the whole file at once
    my $line = <FH>;
    close (FH);
    $count = $line =~ s/(abc)/$1/g;
    print "count is : $count\n";
    "There is a solution for all problems but we need to find a direction"