Re: Reading through a file and checking for a specific string
by roboticus (Chancellor) on Aug 19, 2013 at 19:49 UTC
|
vihar:
I'd suggest putting in a print statement to show the values as you process them. For example, just after extracting your small string, you could print the line number, the bit you extracted and the complete line:
$var = substr($str, 41, 2);
print "$.: $var: $str\n";
Then make a small test file, and run your code against the small test file to see if you get what you're looking for. That ought to help you find the problem in short order.
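A minimal sketch of that approach, assuming made-up sample lines (the real file layout was not posted) and using an in-memory filehandle in place of a small test file:

```perl
use strict;
use warnings;

# Hypothetical test data: 41 filler characters, then the two-letter code.
my $data = join '', map { ( 'x' x 41 ) . "$_ rest of line\n" } qw(AB FG AB);

# Open the string as a file so the loop looks like the real one.
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @seen;
while ( my $str = <$fh> ) {
    chomp $str;
    my $var = substr $str, 41, 2;    # substr offsets are 0-based
    push @seen, $var;
    print "$.: $var: $str\n";        # line number, extracted bit, whole line
}
close $fh;
```

Keep in mind that substr offsets start at 0, so offset 41 is the 42nd character of the line. If the code really starts at the 41st character, the offset would need to be 40.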
...roboticus
When your only tool is a hammer, all problems look like your thumb.
|
Thanks for your reply. That was the first thing I did to figure out what's going on, but I couldn't get anywhere with it. It printed out a bunch of values, and they were not correct.
Re: Reading through a file and checking for a specific string
by toolic (Bishop) on Aug 19, 2013 at 20:02 UTC
|
|
Sorry about that. There were a couple lines I added manually while posting this question. That's why.
Re: Reading through a file and checking for a specific string
by 2teez (Vicar) on Aug 19, 2013 at 20:16 UTC
|
Scanning through your code, I see several things that are wrong. Please add
use warnings;
use strict;
to start with; then you will discover that several of your statements don't end with a ;. You said you are using the glob function, but there is none in your code, so you only have the string "/export/home/date_file*.txt" in your array variable @files.
Update:
I have array1 - @types that has these 2 letter strings(which I am supposed to check for in each file in each line, This variable is found in each line at 41st character.) and array2 - @counts where I am trying to store count if that match is found.
Instead of using two arrays, why don't you use a hash, with the strings you are looking for as the keys, all initialized to zero,
like so:
my %type;
@type{qw(AB AC AD AE FG)} = (0) x 5;
##then you have something like this:
$VAR1 = {
'AC' => 0,
'AE' => 0,
'FG' => 0,
'AB' => 0,
'AD' => 0
};
then later you can do...
* while(...){
...
$type{$_}++; ## increment the count each time a wanted string is seen
...
}
*NOTE:
I don't expect the pseudo-code to work as is, since the OP didn't show any dataset.
If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me
|
Hi,
I just updated it in my code. There were a few lines I added manually when I was writing this question. I forgot to put in the semicolons and the glob function, but I have done that now. Thanks
|
@files = glob ('/export/home/date_file*);
               ^                       ^<-- not there
Also, instead of $counts[$i] ... $files[$i] ... $counts[$i]++ you are writing @counts[$i] ... @files[$i] ... @counts[$i]++, which is not correct in this case.
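To illustrate the difference, here is a small sketch with a made-up @counts array. $counts[$i] names a single element, while @counts[...] is an array slice, which returns a list. Under use warnings, a one-element slice such as @counts[$i] draws a "Scalar value @counts[$i] better written as $counts[$i]" warning.

```perl
use strict;
use warnings;

my @counts = ( 0, 0, 0 );

$counts[1]++;                  # one element: scalar sigil $
my @pair = @counts[ 0, 1 ];    # several elements at once: slice, sigil @

print "element: $counts[1]\n";
print "slice:   @pair\n";
```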
If I may give you a heads-up (it might not be the best fit for you, but it might point you in the right direction): I have a few text files in a directory, and I am reading each file to see how many times certain words are used, like this:
use warnings;
use strict;
use Data::Dumper;
my %type;
@type{qw(is a an at)} = (0) x 4;
for my $file ( glob "*.txt" ) {
open my $fh, '<', $file or die $!;
while (<$fh>) {
for (split) {
$type{$_}++ if exists $type{$_};
}
}
}
print Dumper \%type;
my $total = 0;
$total += $type{$_} for keys %type;
print $total, $/;
Re: Reading through a file and checking for a specific string
by jwkrahn (Abbot) on Aug 19, 2013 at 23:03 UTC
|
use Data::Dumper;
use List::Util qw(sum);
# Grab text files from archive directory with glob function
@files = glob ('/export/home/date_file*);
$arrCount = scalar(@files);
my @types = ("AB", "AC", "AD", "AE", "FG");
my @counts = ($AB, $AC, $AD, $AE, $FG);
for($i = 0; $i < (@counts) ; $i++ ) {
@counts[$i] = 0;
}
for($i = 0; $i < $arrCount; $i++) {
$file = @files[$i];
open(FILE, $file) or die "Can't open `$file': $!";
@lines = <FILE>;
close FILE;
foreach $line (@lines) {
$str = $line;
$var = substr($str, 41, 2);
for( $i=0; $i<(@types); $i++ ) {
if ( $var eq "@types[$i]" ){
@counts[$i]++;
}
}
}
}
my $sum = 0;
for ( @counts) {
$sum += $_;
}
for( $j=0; $j<(@types); $j++ ) {
print "@types[$j]\t: @sums[$j] \n";
$j = $j + 1;
}
print "Total \t: $sum";
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(sum);
# Grab text files from archive directory with glob function
@ARGV = glob '/export/home/date_file*';
my %counts = (
AB => 0,
AC => 0,
AD => 0,
AE => 0,
FG => 0,
);
while ( my $line = <> ) {
my $var = substr $line, 41, 2;
$counts{ $var }++ if exists $counts{ $var };
}
my $sum = sum values %counts;
for my $type ( keys %counts ) {
print "$type\t: $counts{$type}\n";
}
print "Total \t: $sum";
|
Nice rewrite. Although the connection between
@ARGV = glob '/export/home/date_file*';
and
while ( my $line = <> ) {
is not exactly obvious and could lead to some serious head-scratching. Especially if the program grows a bit further and actually uses command line arguments.
Same as above, with explicit file open / close.
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(sum);
# Grab text files from archive directory with glob function
my @files = glob '/export/home/date_file*';
my %counts = (
AB => 0,
AC => 0,
AD => 0,
AE => 0,
FG => 0,
);
for my $file (@files) {
open my $fh, '<', $file
or die "$file: $!";
while ( my $line = <$fh> ) {
my $var = substr $line, 41, 2;
$counts{ $var }++ if exists $counts{ $var };
}
close $fh
or die "$file: $!";
}
my $sum = sum values %counts;
for my $type ( keys %counts ) {
print "$type\t: $counts{$type}\n";
}
print "Total \t: $sum\n";
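To make the connection between @ARGV and <> mentioned earlier more concrete, here is a small self-contained sketch. The temporary files and their contents are made up for the illustration; the point is only that <> opens each file named in @ARGV in turn and removes the names as it goes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two throwaway files so the sketch is self-contained.
my @names;
for my $text ( "one\ntwo\n", "three\n" ) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close $fh or die "$name: $!";
    push @names, $name;
}

# <> reads the files listed in @ARGV, one after another.
local @ARGV = @names;
my $lines = 0;
$lines++ while <>;

print "read $lines lines, \@ARGV now holds ", scalar(@ARGV), " names\n";
```

This is why a program that also takes real command line arguments has to deal with them before the first <> is evaluated.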
|
Thanks! This works flawlessly.
I just had one more question. How do I avoid printing out values that are not found? Currently it prints out all 5 even if one of the matches is not found.
AB : 135172
FG : 248782
AD : 64
AE : 0
AC : 0
I would like to avoid AE and AC in this case since they are not found.
Also, two more requirements just came up. There is a one-word description attached to each string I am supposed to find, and the strings are supposed to be printed alphabetically.
Would I need to make a separate array to print out description next to each string I am printing?
This is what the result should look like:
AB(ABYUSID) : 135172
FG(FGIUIO) : 248782
AD(ADHGUT) : 64
AE(AERUTOT) : 0
AC(ACVHGTI) : 0
Would I need to store these one-word descriptions in a separate array and print from there? I tried doing it that way, but then it becomes an infinite loop and keeps printing these values every time they are found.
Thanks for all the help!
|
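One way to cover these requirements, as a sketch rather than a definitive answer: keep the descriptions in a second hash keyed by the same two-letter strings, sort the keys, and skip the zero counts. The %desc hash is an assumption built from the desired output shown above, and the counts are copied from the sample output instead of being read from files.

```perl
use strict;
use warnings;

# Counts as in the sample output; descriptions taken from the desired output.
my %counts = ( AB => 135172, FG => 248782, AD => 64, AE => 0, AC => 0 );
my %desc   = (
    AB => 'ABYUSID',
    AC => 'ACVHGTI',
    AD => 'ADHGUT',
    AE => 'AERUTOT',
    FG => 'FGIUIO',
);

my @report;
for my $type ( sort keys %counts ) {    # alphabetical order
    next unless $counts{$type};         # skip types that were never found
    push @report, "$type($desc{$type})\t: $counts{$type}";
}
print "$_\n" for @report;
```

A hash lookup replaces the inner loop over a second array, so there is nothing left that could loop forever.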
Do you know if there is a way to print out the two letter string I am looking for if it doesn't exist in the file?
So for example, if I have another string in file "FD" that doesn't match anything, could I still print it out just so I know which ones I am missing. Thanks!
|
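A possible approach, sketched under assumptions: count every code that is not a key of %counts in a second hash and print that hash at the end. The %unknown hash and the sample lines are hypothetical, since no real data was posted.

```perl
use strict;
use warnings;

my %counts = map { $_ => 0 } qw(AB AC AD AE FG);
my %unknown;    # hypothetical second hash for codes not in %counts

# Made-up sample lines: 41 filler characters, then the two-letter code.
my @lines = map { ( 'x' x 41 ) . "$_\n" } qw(AB FD AB ZZ FD);

for my $line (@lines) {
    next if length $line < 43;    # too short to hold a code at offset 41
    my $var = substr $line, 41, 2;
    if   ( exists $counts{$var} ) { $counts{$var}++ }
    else                          { $unknown{$var}++ }
}

print "Unmatched $_\t: $unknown{$_}\n" for sort keys %unknown;
```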
Thanks for your suggestions, guys!
Re: Reading through a file and checking for a specific string
by poj (Abbot) on Aug 19, 2013 at 20:34 UTC
|
Your counts are in @counts not @sums
for( $j=0; $j<@types; $j++ ) {
print "$types[$j]\t: $counts[$j] \n";
}
print "Total\t: $sum";
Re: Reading through a file and checking for a specific string
by BillKSmith (Monsignor) on Aug 20, 2013 at 12:41 UTC
|
This is one of the rare cases where I would recommend slurping the entire file rather than reading line-by-line. I feel that the simplification justifies the use of much more memory.
#!/usr/bin/perl
use warnings;
use strict;
use Slurp;
use Data::Dumper;
my $strings = join '|', qw( AB AC AD AE FG );
my $counts;
for my $file (glob ('File/*')) {
$counts->{$_}++ foreach (Slurp($file) =~ m/($strings)/gms);
}
print Dumper($counts);
Re: Reading through a file and checking for a specific string
by protist (Monk) on Aug 20, 2013 at 08:51 UTC
|
I am not sure why you are using so much indexing.
Here is an example that demonstrates finding the two letter combinations at any point in the string.
I didn't want to have to test all the substring stuff, but this should give you an idea of how to avoid all the needless indexing.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my @strings = ("AB", "AC", "AD", "AE", "FG");
my $counts;
for my $file (glob ('File/*')) {
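# "or" is needed here: "||" binds to $file, so the die could never run
open my $fh, "<", $file or die "Could not open $file: $!";
while ( my $line = <$fh> ) {    # read one line at a time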
for my $string (@strings) {
$counts->{$string}++ for $line =~ m/$string/g;
}
}
close $fh;
}
print Dumper($counts);
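Note that this counts the strings wherever they occur in a line, which is not the same as checking only the two characters at offset 41. A small sketch with a made-up line shows the difference:

```perl
use strict;
use warnings;

# Made-up line: "AB" appears at the start and near the end, "FG" sits at offset 41.
my $line = 'AB' . ( '-' x 39 ) . 'FG trailing AB text';

my $at_offset = substr $line, 41, 2;     # fixed position only
my $anywhere  = () = $line =~ m/AB/g;    # number of matches anywhere in the line

print "at offset 41: $at_offset\n";
print "AB anywhere:  $anywhere\n";
```

If the two-letter strings can also turn up elsewhere in the data, the counts from the two approaches will differ.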