PerlMonks  

Reading through a file and checking for a specific string

by vihar (Acolyte)
on Aug 19, 2013 at 19:42 UTC ( [id://1050075]=perlquestion )

vihar has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I am pretty new to coding and Perl. I am not sure what I am doing wrong in this code.

My goal for this script is to read through text files line by line and check if a match is found at a certain place in each line. If a match is found, update a count and then print it out later.

The first part is where I am storing the names of all the files I am supposed to be reading in an array.

I have array1 - @types, which holds these 2-letter strings (which I am supposed to check for in each line of each file; this value is found in each line at the 41st character), and array2 - @counts, where I am trying to store the count if a match is found.

Then I am initializing all elements in array2 to 0 since they are all going to be integers. Then I have code to go through each file and each line and check for a match.

use Data::Dumper;
use List::Util qw(sum);

# Grab text files from archive directory with glob function
@files = glob ('/export/home/date_file*);
$arrCount = scalar(@files);

my @types = ("AB", "AC", "AD", "AE", "FG");
my @counts = ($AB, $AC, $AD, $AE, $FG);

for($i = 0; $i < (@counts) ; $i++ ) {
    @counts[$i] = 0;
}

for($i = 0; $i < $arrCount; $i++) {
    $file = @files[$i];
    open(FILE, $file) or die "Can't open `$file': $!";
    @lines = <FILE>;
    close FILE;
    foreach $line (@lines) {
        $str = $line;
        $var = substr($str, 41, 2);
        for( $i=0; $i<(@types); $i++ ) {
            if ( $var eq "@types[$i]" ){
                @counts[$i]++;
            }
        }
    }
}

my $sum = 0;
for ( @counts) {
    $sum += $_;
}

for( $j=0; $j<(@types); $j++ ) {
    print "@types[$j]\t: @sums[$j] \n";
    $j = $j + 1;
}
print "Total \t: $sum";


This is what my output should look like(with counts next to each)
AB :
AC :
AD :
AE :
FG :
Total :


I am pretty sure there is something wrong with the logic and I can't figure it out. Please help. Thanks in advance!

Replies are listed 'Best First'.
Re: Reading through a file and checking for a specific string
by roboticus (Chancellor) on Aug 19, 2013 at 19:49 UTC

    vihar:

    I'd suggest putting in a print statement to show the values as you process them. For example, just after extracting your small string, you could print the line number, the bit you extracted and the complete line:

    $var = substr($str, 41, 2);
    print "$.: $var: $str\n";

    Then make a small test file, and run your code against the small test file to see if you get what you're looking for. That ought to help you find the problem in short order.
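As a rough illustration of that approach, the sketch below reads an invented two-line test input through an in-memory filehandle and prints what substr extracts. Keep in mind that substr offsets are zero-based, so offset 41 is the 42nd character; if the code really sits at the 41st character, the offset would be 40. The sample lines are made up purely to show that difference.

```perl
use strict;
use warnings;

# Invented test data: in line 1 the code "AB" starts at offset 40
# (the 41st character); in line 2 the code "FG" starts at offset 41.
my $data = ( '.' x 40 ) . "AB...\n" . ( '.' x 41 ) . "FG...\n";

open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @seen;
while ( my $str = <$fh> ) {
    chomp $str;
    my $var = substr $str, 41, 2;
    push @seen, $var;
    print "$.: $var: $str\n";    # line number, extracted bit, whole line
}
close $fh;
```

Only the second line yields a clean two-letter code at offset 41; the first comes out shifted by one character, which is the kind of thing the debug print makes visible.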

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thanks for your reply. That was the first thing I did to figure out what's going on, but I couldn't get anywhere with it. It was printing out a bunch of values and they were not correct.
Re: Reading through a file and checking for a specific string
by toolic (Bishop) on Aug 19, 2013 at 20:02 UTC
    The code you posted doesn't compile because you are missing some semicolons. Please update your post with your actual code.
    # Grab files from archive directory with glob function
    @files = "/export/home/date_file*.txt";
    You are not using glob here. @files is an array with one element: /export/home/date_file*.txt. Prove this to yourself (Tip #4 from the Basic debugging checklist).
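To see the difference for yourself, here is a minimal sketch. It builds a throwaway directory with File::Temp (the directory and the file names are made up for the example) and compares assigning the pattern string with calling glob on it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory and file names, just for this demonstration.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(date_file1.txt date_file2.txt other.txt)) {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $fh or die "Can't close $dir/$name: $!";
}

my $pattern = "$dir/date_file*.txt";

# Plain assignment: the array holds the pattern itself, as one element.
my @as_string = ($pattern);

# glob expands the wildcard into the matching file names.
my @as_glob = glob $pattern;

print scalar(@as_string), " element from assignment\n";
print scalar(@as_glob),   " elements from glob\n";
print "$_\n" for @as_glob;
```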
      Sorry about that. There were a couple lines I added manually while posting this question. That's why.
Re: Reading through a file and checking for a specific string
by 2teez (Vicar) on Aug 19, 2013 at 20:16 UTC

    Scanning through your code, there are several things wrong.
    Please add

    use warnings;
    use strict;
    to start with; then you will discover that several of your statements don't end with a ;.
    You said you are using the glob function, but there is none in your code, so you only have the string "/export/home/date_file*.txt" in your array variable "@files".
    Update:
    I have array1 - @types that has these 2 letter strings(which I am supposed to check for in each file in each line, This variable is found in each line at 41st character.) and array2 - @counts where I am trying to store count if that match is found.
    Instead of using two arrays, why don't you use a hash, with the strings you are looking for as the keys, initialized to zero like so:
    my %type;
    @type{qw(AB AC AD AE FG)} = (0) x 5;
    ## then you have something like this:
    $VAR1 = {
        'AC' => 0,
        'AE' => 0,
        'FG' => 0,
        'AB' => 0,
        'AD' => 0
    };
    then later you can do... *
    while(...){
        ...
        $type{$_}++; ## increase the count each time you see a wanted string
        ...
    }
    *NOTE:
    I don't expect the pseudo-code to work, since the OP didn't show any dataset.
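A runnable version of the idea above might look like the sketch below. Since no data set was posted, the sample records are invented: each is padded so that the two-letter code starts at offset 41, the same offset the original substr call uses.

```perl
use strict;
use warnings;

# Initialize every code we care about to zero with a hash slice.
my %type;
@type{qw(AB AC AD AE FG)} = (0) x 5;

# Invented sample records: 41 filler characters, then the code.
my @lines = map { ( 'x' x 41 ) . $_ . " rest of record\n" } qw(AB FG AB ZZ AD);

for my $line (@lines) {
    next if length $line < 43;    # too short to hold a code at offset 41
    my $code = substr $line, 41, 2;
    $type{$code}++ if exists $type{$code};    # ignore codes we are not tracking
}

print "$_ => $type{$_}\n" for sort keys %type;
```

The exists test keeps unlisted codes such as ZZ from being added to the hash as a side effect of counting.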

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      Hi, I just updated it in my code. There were a few lines I added manually when I was writing this question. I forgot to put in the semicolons and the glob function, but I did that now. Thanks

        Yes, you have, but there are still some problems there.
        You still have a single quote without its matching closing quote here

        @files = glob ('/export/home/date_file*);
                       ^                       ^<-- not there
        Then instead of these
        $counts[$i] ...
        $files[$i] ...
        $counts[$i]++ ...
        you are writing
        @counts[$i] ...
        @files[$i] ...
        @counts[$i]++ ...
        which in this case is not correct.
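The sigil follows what you get back, not how the variable was declared: one element is a scalar, so it takes $. A short sketch of the difference (the array contents are arbitrary):

```perl
use strict;
use warnings;

my @counts = ( 0, 0, 0 );

# $counts[1] is a single element (a scalar).
$counts[1]++;

# @counts[0, 2] is a slice: a list of several elements at once.
my @picked = @counts[ 0, 2 ];

# A one-element slice such as @counts[1] often still runs, but under
# "use warnings" Perl reports:
#   Scalar value @counts[1] better written as $counts[1]
print "element: $counts[1]\n";
print "slice:   @picked\n";
```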

        If I may give you a heads-up (it might not be the best suited for you, but it might point you in the right direction):
        I have a few text files in a directory, and I am reading each file to see how many times certain words were used, like this:
        use warnings;
        use strict;
        use Data::Dumper;

        my %type;
        @type{qw(is a an at)} = (0) x 4;

        for my $file ( glob "*.txt" ) {
            open my $fh, '<', $file or die $!;
            while (<$fh>) {
                for (split) {
                    $type{$_}++ if exists $type{$_};
                }
            }
        }

        print Dumper \%type;

        my $total = 0;
        $total += $type{$_} for keys %type;
        print $total, $/;

        If you tell me, I'll forget.
        If you show me, I'll remember.
        if you involve me, I'll understand.
        --- Author unknown to me
Re: Reading through a file and checking for a specific string
by jwkrahn (Abbot) on Aug 19, 2013 at 23:03 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;
    use List::Util qw(sum);

    # Grab text files from archive directory with glob function
    @ARGV = glob '/export/home/date_file*';

    my %counts = (
        AB => 0,
        AC => 0,
        AD => 0,
        AE => 0,
        FG => 0,
    );

    while ( my $line = <> ) {
        my $var = substr $line, 41, 2;
        $counts{ $var }++ if exists $counts{ $var };
    }

    my $sum = sum values %counts;

    for my $type ( keys %counts ) {
        print "$type\t: $counts{$type}\n";
    }
    print "Total \t: $sum";
      Nice rewrite. Although the connection between
      @ARGV = glob '/export/home/date_file*';
      and
      while ( my $line = <> ) {
      is not exactly obvious and could lead to some serious head-scratching. Especially if the program grows a bit further and actually uses command line arguments. Same as above, with explicit file open / close.
      #!/usr/bin/perl
      use warnings;
      use strict;
      use List::Util qw(sum);

      # Grab text files from archive directory with glob function
      my @files = glob '/export/home/date_file*';

      my %counts = (
          AB => 0,
          AC => 0,
          AD => 0,
          AE => 0,
          FG => 0,
      );

      for my $file (@files) {
          open my $fh, '<', $file or die "$file: $!";
          while ( my $line = <$fh> ) {
              my $var = substr $line, 41, 2;
              $counts{ $var }++ if exists $counts{ $var };
          }
          close $fh or die "$file: $!";
      }

      my $sum = sum values %counts;

      for my $type ( keys %counts ) {
          print "$type\t: $counts{$type}\n";
      }
      print "Total \t: $sum\n";
        Thanks! This works flawlessly.

        I just had one more question. How do I avoid printing out values that are not found? Currently it prints out all 5 even if one of the strings is not found.
        AB : 135172
        FG : 248782
        AD : 64
        AE : 0
        AC : 0

        I would like to avoid AE and AC in this case since they are not found.

        Also, two more requirements just came up. There is a one word description attached to each string I am supposed to find and they are supposed to be printed alphabetically.

        Would I need to make a separate array to print out description next to each string I am printing? This is what the result should look like:
        AB(ABYUSID) : 135172
        FG(FGIUIO) : 248782
        AD(ADHGUT) : 64
        AE(AERUTOT) : 0
        AC(ACVHGTI) : 0
        Would I need to store these one-word descriptions in a separate array and print from there? I tried doing it that way, but then it becomes an infinite loop and keeps printing these values every time they are found.
        Thanks for all the help!
        Do you know if there is a way to print out the two letter string I am looking for if it doesn't exist in the file? So for example, if I have another string in file "FD" that doesn't match anything, could I still print it out just so I know which ones I am missing. Thanks!
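These follow-up questions did not get a posted answer in the thread, so here is one possible sketch rather than a definitive solution. It assumes the descriptions are the ones shown in the sample output above, uses invented in-memory records in place of the real files, and introduces a second hash, %unknown (a name made up here), to collect codes that are not in the list.

```perl
use strict;
use warnings;

# Code => description, taken from the sample output in the question.
my %desc = (
    AB => 'ABYUSID',
    AC => 'ACVHGTI',
    AD => 'ADHGUT',
    AE => 'AERUTOT',
    FG => 'FGIUIO',
);

my %counts = map { $_ => 0 } keys %desc;
my %unknown;    # hypothetical helper: codes seen in the data but not listed

# Invented records standing in for the lines read from the files.
my @lines = map { ( 'x' x 41 ) . $_ . "\n" } qw(AB FG AB FD AD FG FD);

for my $line (@lines) {
    next if length $line < 43;    # too short to hold a code at offset 41
    my $code = substr $line, 41, 2;
    if ( exists $counts{$code} ) { $counts{$code}++ }
    else                         { $unknown{$code}++ }
}

# Sorted alphabetically, skipping codes that were never found.
for my $code ( sort keys %counts ) {
    next unless $counts{$code};
    print "$code($desc{$code})\t: $counts{$code}\n";
}

# Report anything that did not match a known code.
print "Unlisted $_\t: $unknown{$_}\n" for sort keys %unknown;
```

Keeping the descriptions in a hash keyed by the code means no second array has to stay in step with the first, and sort keys gives the alphabetical order.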
      Thanks for your suggestion guys!
Re: Reading through a file and checking for a specific string
by poj (Abbot) on Aug 19, 2013 at 20:34 UTC
    Your counts are in @counts not @sums
    for( $j=0; $j<@types; $j++ ) {
        print "$types[$j]\t: $counts[$j] \n";
    }
    print "Total\t: $sum";
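Two further things in the posted loops look worth checking, although this is a reading of the posted code and not something raised in the thread: the inner loop reuses $i, which is also the index of the outer file loop, and the print loop increments $j twice per pass. Declaring each loop variable with my, or iterating without an index at all, avoids both. The file names below are placeholders and nothing is opened.

```perl
use strict;
use warnings;

my @files = qw(a.txt b.txt c.txt);    # placeholder names, nothing is opened
my @types = qw(AB AC AD AE FG);

# Shared index, as in the original: the inner loop leaves $i at 5,
# so in this sketch the outer loop stops after the first "file".
my @visited_shared;
my $i;
for ( $i = 0; $i < @files; $i++ ) {
    push @visited_shared, $files[$i];
    for ( $i = 0; $i < @types; $i++ ) { }    # clobbers the outer index
}

# Separate lexical variables: every file is visited.
my @visited_lexical;
for my $file (@files) {
    push @visited_lexical, $file;
    for my $type (@types) { }
}

print "shared index:  @visited_shared\n";
print "lexical loops: @visited_lexical\n";
```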
Re: Reading through a file and checking for a specific string
by BillKSmith (Monsignor) on Aug 20, 2013 at 12:41 UTC
    This is one of the rare cases where I would recommend slurping the entire file rather than reading line-by-line. I feel that the simplification justifies the use of much more memory.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Slurp;
    use Data::Dumper;

    my $strings = join '|', qw( AB AC AD AE FG );
    my $counts;
    for my $file (glob ('File/*')) {
        $counts->{$_}++ foreach (Slurp($file) =~ m/($strings)/gms);
    }
    print Dumper($counts);
    Bill
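If the Slurp module is not installed, the same idea can be sketched with core Perl only by leaving the input record separator undefined while reading. Like the code above, this counts the codes wherever they appear, not just at offset 41. The file is a temporary one created for the demonstration and its contents are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Invented sample data written to a throwaway file.
my ( $out, $file ) = tempfile( UNLINK => 1 );
print {$out} "xxABxx\nFGxxAB\nnothing here\n";
close $out or die "Can't close $file: $!";

my $strings = join '|', qw( AB AC AD AE FG );
my %counts;

open my $fh, '<', $file or die "Can't open $file: $!";
my $content = do { local $/; <$fh> };    # undef $/ reads the whole file at once
close $fh or die "Can't close $file: $!";

# In list context, m//g with one capture group returns every match.
$counts{$_}++ for $content =~ m/($strings)/g;

print "$_ => $counts{$_}\n" for sort keys %counts;
```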
Re: Reading through a file and checking for a specific string
by protist (Monk) on Aug 20, 2013 at 08:51 UTC

    I am not sure why you are using so much indexing.

    Here is an example that demonstrates finding the two letter combinations at any point in the string.

    I didn't want to have to test all the substring stuff, but this should give you an idea of how to avoid all the needless indexing.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my @strings = ("AB", "AC", "AD", "AE", "FG");
    my $counts;

    for my $file (glob ('File/*')) {
        # "or" rather than "||": "||" would bind to $file, so die would never run
        open my $fh, "<", $file or die "Could not open $!!";
        for my $line (<$fh>) {
            for my $string (@strings) {
                # count every occurrence of the string anywhere in the line
                $counts->{$string}++ for $line =~ m/$string/g;
            }
        }
        close $fh;
    }
    print Dumper($counts);

Node Type: perlquestion [id://1050075]
Approved by Corion