find a string and count of its occurence in a text file

new@perl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,
I am completely new to Perl programming.
I am trying to count the occurrences of the string "abc" in a flat file (data.txt) with the following structure, and to return the count along with the matching lines.

_data.txt_
abc:AB CD:100
def:DE FG:101
ghi:GH IJ:102
abc:AB CD:100
ghi:GH IJ:103

I am able to grep the matching lines for the string "abc" using the code below, although I read that using grep is not advisable for huge files.

my $file='source_data\data.txt';
open FILE, $file or die "FILE $file NOT FOUND - $!\n";

while (<FILE>){
        chop $_;
        my ($a, $b, $c) = split /:/, $_;
        my $r = [$a, $b, $c];
        my @arr = $r->[0];
       
        my @matches = grep $_ eq "abc", $arr[0];
        
        foreach my $i (@matches) {
        print $_."\n";
        }
}

_output_
abc:AB CD:100
abc:AB CD:100

How can I get the count of matching lines?
Seeking your kind advice.
--new@perl


Replies are listed 'Best First'.
Re: find a string and count of its occurence in a text file by GrandFather (Saint) on Nov 14, 2007 at 05:29 UTC
There is a lot of needless code in your sample, and a couple of foibles. Assuming you want to do something other than just print the contents of the matching line, something like the following sample may be what you are after:

    use strict;
    use warnings;

    my $fileContent = <<DATA;
    abc:AB CD:100
    def:DE FG:101
    ghi:GH IJ:102
    abc:AB CD:100
    ghi:GH IJ:103
    DATA

    open FILE, '<', \$fileContent;

    while (<FILE>) {
        chomp; # chomp not chop. $_ is default so omit
        my @elements = split /:/, $_;
        next unless $elements[0] eq 'abc';
        print "Matched: ", join ('|', @elements), "\n";
    }

    close FILE;

Prints:

    Matched: abc|AB CD|100
    Matched: abc|AB CD|100

Note that for the purposes of the sample the "file" is actually just a string, although Perl allows it to be opened and manipulated as a file.

Generally it is a bad idea to rely on the contents of $_ remaining unaltered over more than a couple of lines of code. You are better off using an explicit variable in such cases, so that the intent of the code is clearer and the value doesn't get altered in unexpected ways. Use the three-parameter open to make the intent clearer and the call safer (what happens if the file name starts with '>' in your sample?). A grep on a single element can be replaced with an if.

Perl is environmentally friendly - it saves trees
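A minimal sketch of how the count asked for in the question could be added to this kind of loop. It assumes the same colon-separated layout as data.txt and reads the sample data from an in-memory string; the helper name `matching_lines` is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the lines whose
# first colon-separated field equals $wanted.
sub matching_lines {
    my ($fh, $wanted) = @_;
    my @matches;
    while (my $line = <$fh>) {        # explicit variable instead of $_
        chomp $line;
        my ($first) = split /:/, $line;
        next unless defined $first && $first eq $wanted;
        push @matches, $line;         # only matching lines are kept
    }
    return @matches;
}

# Sample data from the question, opened as an in-memory file.
my $data = "abc:AB CD:100\ndef:DE FG:101\nghi:GH IJ:102\n"
         . "abc:AB CD:100\nghi:GH IJ:103\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";
my @matches = matching_lines($fh, 'abc');
close $fh;

print "$_\n" for @matches;
print "Count: ", scalar(@matches), "\n";
```

An array used in scalar context gives its element count, so no separate counter is needed when the matching lines are collected anyway. For a real file, the in-memory string would be replaced by a file name in the three-argument open.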
Re: find a string and count of its occurence in a text file by ysth (Canon) on Nov 14, 2007 at 06:19 UTC
Do not do this: `$/="abc"; my $count = chomp(@/=<FILE>)/$/=~y///c;`
Re^2: find a string and count of its occurence in a text file by fenLisesi (Priest) on Nov 14, 2007 at 12:20 UTC
So I ran a perldoc chomp and saw this (in a page that is a good read in its entirety):

> If you chomp a list, each element is chomped, and the total number of characters removed is returned.

and then I ran perldoc transliterate, searched inside the page for "`Transliterates`" and saw these:

> Options: c => Complement the SEARCHLIST.
>
> It returns the number of characters replaced or deleted

I figure `@/` is an ordinary array, just like `@records`, say. We already have a glob, `*/`, and we are even using its special-purpose scalar portion in this example, so why not use its array slot, too? I wondered whether I could use `chomp(()=<FILE>)` instead, but no, it doesn't work. The assignment to the empty list probably succeeds in executing `<FILE>` in list context, but then throws away the results and does not provide `chomp()` an lvalue to work with.

The `$/ =~ y///c` bit, then, counts the number of characters in `$/`, the input-record-separator, by replacing everything, all chars in the input-record-separator (here described as the complement of nothing) with nothing and returning the number of chars thus replaced. You could just replace that whole expression with `3` in this particular case, as that's the number of characters in `"abc"`, the value of the input-record-separator, but the counting makes the code portable to other input-record-separators.

The program is probably memory-hungry. Although it is not in slurp mode, all the records seem to get stored in the `@/` array before `chomp(LIST)` has a chance to work on them. In any case, `chomp()` cuts the trailing `"abc"` off each element of `@/` and returns the number of chars it thus cut off. Dividing that by three (that is, by `$/=~y///c`) then gives you how many times `"abc"` occurs in the file. Is that right, monks?
Re^3: find a string and count of its occurence in a text file by ysth (Canon) on Nov 15, 2007 at 06:49 UTC
> The `$/ =~ y///c` bit, then, counts the number of characters in `$/`, the input-record-separator, by replacing everything, all chars in the input-record-separator (here described as the complement of nothing) with nothing and returning the number of chars thus replaced.

Not quite; since the REPLACEMENTLIST defaults to the (post-complementing) SEARCHLIST (except with /d), all the chars are replaced with themselves, not nothing. So $/ is unchanged. (Actually, tr aka y recognizes when it's only being used to count and can even be used on readonly strings then.)
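The two building blocks discussed here can be tried in isolation. The following is only a sketch: the record list is made-up sample data rather than the contents of a file, and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $sep = "abc";

# tr/// with an empty SEARCHLIST and /c matches every character. With an
# empty REPLACEMENTLIST and no /d, each character is replaced by itself,
# so the string is left unchanged and only the count is returned.
my $sep_length = ($sep =~ tr///c);
print "Separator length: $sep_length\n";
print "Separator after counting: $sep\n";

# chomp on a list removes one trailing copy of $/ from each element and
# returns the total number of characters removed.
my @records = ("xxabc", "yyabc", "zz");   # made-up records
my $removed;
{
    local $/ = $sep;                      # restore $/ when the block ends
    $removed = chomp(@records);
}
print "Characters removed: $removed\n";
print "Occurrences: ", $removed / $sep_length, "\n";
```

Using `local` keeps the change to `$/` confined to the block, so later reads in the same program still see the default record separator.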
Re: find a string and count of its occurence in a text file by narainhere (Monk) on Nov 14, 2007 at 05:27 UTC
This would do the trick:

    use strict;
    use warnings;
    sub retriver();
    my @lines;
    my $lines_ref;
    my $count;
    $lines_ref=retriver();
    @lines=@$lines_ref;
    $count=@lines;
    print "Count :$count\nLines\n";
    print join "\n",@lines;

    sub retriver()
    {
        my $file='source_data\data.txt';
        open FILE, $file or die "FILE $file NOT FOUND - $!\n";
        my @contents=<FILE>;
        my @filtered=grep(/abc:/,@contents);
        return \@filtered;
    }

The world is so big for any individual to conquer
Re^2: find a string and count of its occurence in a text file by new@perl (Initiate) on Nov 14, 2007 at 05:59 UTC
Thanks a lot narainhere :-) I need some explanation though, if you please. What is the function of `sub retriver()` in line number 3?
Re^3: find a string and count of its occurence in a text file by narainhere (Monk) on Nov 14, 2007 at 06:51 UTC
It's a forward declaration with an (empty) prototype. It's there because `retriver()` is called before it has been defined. If you remove the declaration (line 3), the call still runs, but under `use warnings` Perl complains that `retriver()` was called too early to check its prototype; calling the function as `&retriver()` bypasses the prototype check and avoids that warning. The definition itself can sit anywhere below.
The world is so big for any individual to conquer
Re^4: find a string and count of its occurence in a text file by Anonymous Monk on Nov 14, 2007 at 12:58 UTC
Re: find a string and count of its occurence in a text file by oha (Friar) on Nov 14, 2007 at 10:26 UTC
As you did, you must open the file; then, for every line of the file, check whether it starts with abc, and increment a variable or print out whatever you need.

    open FILE, $file or die "can't open $file: $!\n";
    while(<FILE>) {
        next unless /^abc:/;
        $counter++;
        chomp;
        print "$_\n"; # whatever you need
    }
    close FILE;

Done this way, you never load all the lines of the file at once but parse them one by one. As someone noticed, grep works on lists, so to use it you must load all the lines into an array, `@array = <FILE>`, which leads to memory issues if the file is big.

Oha

PS: Perl has the poetry of `next unless`, which is so much more beautiful than `if(! COND) { continue }` that I can't avoid posting it! :)
Re: find a string and count of its occurence in a text file by Anonymous Monk on Nov 14, 2007 at 05:38 UTC
> i read that using grep is not advisable for huge files

Someone lied to you. You probably want something like this:

    while ( <FILE> ) {
        print if 'abc' eq ( split /:/ )[ 0 ];
    }

Or possibly this:

    while ( <FILE> ) {
        print if /^abc:/;
    }
Re^2: find a string and count of its occurence in a text file by dwu (Monk) on Nov 14, 2007 at 05:53 UTC
And to chuck in an (incredibly primitive) idea for a line count as well:

    _data.txt_
    abc:AB CD:100
    def:DE FG:101
    ghi:GH IJ:102
    abc:AB CD:100
    ghi:GH IJ:103

    my $count = 0;
    while ( <FILE> ) {
        print if /^abc:/;
        $count++ if /^abc:/;
    }
    print "Matched $count times\n";

    _output_
    abc:AB CD:100
    abc:AB CD:100
    Matched 2 times

Update: I'm silly; added condition for incrementing $count.
Re^2: find a string and count of its occurence in a text file by cdarke (Prior) on Nov 14, 2007 at 09:27 UTC
> > i read that using grep is not advisable for huge files
>
> Someone lied to you.

Maybe "lied" is a bit strong; they were probably thinking of this: `grep /fred/,<FILE>;` which loads the entire file into memory (or at least, tries to).
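The difference between the two styles can be sketched side by side. This is an illustration only: it reads made-up sample data from an in-memory string rather than a real file, and it compares the counts, not the memory use.

```perl
use strict;
use warnings;

my $data = "abc:AB CD:100\ndef:DE FG:101\nabc:AB CD:100\n";

# Approach 1: grep over the whole file. <$fh> in list context reads every
# line into memory before grep sees the first one. In scalar context grep
# returns the number of matching elements.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";
my $grep_count = grep { /^abc:/ } <$fh>;
close $fh;

# Approach 2: read one line at a time, so only the current line is held.
open $fh, '<', \$data or die "Cannot open in-memory file: $!\n";
my $loop_count = 0;
while (my $line = <$fh>) {
    $loop_count++ if $line =~ /^abc:/;
}
close $fh;

print "grep count: $grep_count\n";
print "loop count: $loop_count\n";
```

Both give the same count; the line-by-line loop is the one whose memory use does not grow with the size of the file.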
Re^3: find a string and count of its occurence in a text file by Anonymous Monk on Nov 14, 2007 at 18:44 UTC
I thought he meant `grep(1)` instead of `perldoc -f grep` :-)
Re: find a string and count of its occurence in a text file by TheForeigner (Initiate) on Nov 14, 2007 at 15:22 UTC
It looks like you have plenty to work with, but here's my solution:

    open my $file, '<', 'in.txt' or die "Couldn't open file: $!\n"; # better way to make file handles
    my @matches;
    foreach (<$file>) {                  # for each line
        push @matches, $_ if (/abc/);    # keep them in an array if they match
    }
    print @matches;                      # print the matches
    print "Total matches: ", scalar(@matches), "\n"; # print the number of matches
Re: find a string and count of its occurence in a text file by sundialsvc4 (Abbot) on Nov 15, 2007 at 03:28 UTC
Do not overlook any opportunity for using all of the tools that may be available to you. For instance, this particular requirement might be easily met by `awk`, without the use of any Perl programming at all! And if that be the case... "cool!"
Re^2: find a string and count of its occurence in a text file by dwu (Monk) on Nov 15, 2007 at 03:56 UTC
> For instance, this particular requirement might be easily met by `awk`

In fact, there's a general Linux recipe for just these things, generally introduced right around the time whatever book/tute/etc. decides to introduce pipes:

    $ egrep ^abc <filename> | wc -l

Disclaimer: This wouldn't work if OP had, say, wanted to find the total number of occurrences of a given string that happened to occur more than once per line in the data; as it is, OP doesn't (or at least, the data doesn't contain the sort of case that would prevent this working) :)
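For the case this disclaimer describes, where a string may occur more than once per line, a Perl sketch could count matching lines and total occurrences separately. The helper name `count_occurrences` and the sample data are assumptions for illustration, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (number of lines containing $needle,
# total number of occurrences of $needle).
sub count_occurrences {
    my ($fh, $needle) = @_;
    my ($lines, $total) = (0, 0);
    while (my $line = <$fh>) {
        # A /g match in list context returns every match; assigning that
        # list to () and then to a scalar yields the number of matches.
        my $hits = () = $line =~ /\Q$needle\E/g;
        next unless $hits;
        $lines++;
        $total += $hits;
    }
    return ($lines, $total);
}

# Made-up data in which the first line contains "abc" three times.
my $data = "abc:abc abc:100\ndef:DE FG:101\nabc:AB CD:100\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";
my ($lines, $total) = count_occurrences($fh, 'abc');
close $fh;

print "Matching lines: $lines, total occurrences: $total\n";
```

`\Q...\E` quotes the search string so that characters such as `.` or `*` in it are matched literally rather than as regex metacharacters.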
Re: find a string and count of its occurence in a text file by arasu (Initiate) on Apr 23, 2012 at 11:21 UTC
    open FH, "inputDatafile.txt";
    undef $/; ## undefine the input record separator to read the whole file at once
    my $line = <FH>;
    close (FH);
    my $count = $line =~ s/(abc)/$1/g;
    print "count is : $count\n";

"There is a solution for all problems but we need to find a direction"
