Koda1234 has asked for the wisdom of the Perl Monks concerning the following question:
Hello, I'm very new to Perl and I'm having a hard time organizing my script so that it does what I want. Right now, I have a space-delimited data file with 12 columns and 84,000 rows. The only column that I care about is the 9th. I am trying to organize the information in that column so that I can count the number of values that satisfy a condition (i.e. is the value greater than 2.0, 3.0, and so on).
My issue is this. When creating a hash, I know I am supposed to assign a "key" to a "value". How do I specify a value that is >= 2, for example, assuming I'm not going to manually calculate the values greater than 2? And how do I get the hash to pull the information from the column in my file?
Re: Creating a Hash using only one column in an imported data file
by choroba (Cardinal) on Feb 13, 2017 at 17:41 UTC
I'm not sure I understand you. Does this do what you want? It uses split to extract the column, and counts how often its value is greater than 1, 2, and so on up to 10. This is done with a common technique: incrementing the hash value associated with each number we're comparing against.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
*ARGV = *DATA{IO} unless @ARGV;
my %greater_than;
while (<>) {
my $col9 = (split)[8];
$col9 > $_ and ++$greater_than{$_} for 1 .. 10;
}
for my $num (sort { $a <=> $b } keys %greater_than) {
say "$greater_than{$num} values in column 9 were greater than $num";
}
__DATA__
0 1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10 11 12
5 5 5 5 5 5 5 5 5 5 5 5
Update: Explanation expanded.
Re: Creating a Hash using only one column in an imported data file
by 1nickt (Canon) on Feb 13, 2017 at 18:30 UTC
Hi Koda1234, welcome to the monastery and to Perl, the One True Religion.
In Perl, to filter a list of values down to a smaller list containing only the elements that match a certain condition, use grep.
use strict; use warnings; use feature 'say';
my @col9 = map { (split)[8] } <DATA>;
foreach my $test ( 1, 9, 42, 666 ) {
my $count = scalar grep { $_ >= $test } @col9;
say sprintf "%d values were >= %d", $count, $test;
}
__DATA__
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 42 10 11 12
1 2 3 4 5 6 7 8 42 10 11 12
1 2 3 4 5 6 7 8 42 10 11 12
1 2 3 4 5 6 7 8 1 10 11 12
Output:
$ perl 1181904.pl
7 values were >= 1
6 values were >= 9
3 values were >= 42
0 values were >= 666
See also:
- split for splitting text strings
- map for transforming one list into another
- scalar for counting how many elements are in a list
- sprintf for creating strings containing changing values
- __DATA__ for including a data "file" inside your program code
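One detail about split that is easy to miss: called with no arguments (or with the string ' '), it splits on runs of whitespace and ignores leading whitespace, while the pattern / / splits on every single space, so two spaces in a row produce an empty field. The following is only a small sketch of that difference; the helper names and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# 9th field using whitespace-run splitting, the same mode as a bare (split)[8].
sub col9_whitespace {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return $fields[8];
}

# 9th field using one literal space as the separator.
sub col9_single_space {
    my ($line) = @_;
    my @fields = split / /, $line;
    return $fields[8];
}

# Made-up sample line with TWO spaces before the 9th value.
my $line = "1 2 3 4 5 6 7 8  42 10 11 12";

print "whitespace mode:   '", col9_whitespace($line),   "'\n";   # '42'
print "single-space mode: '", col9_single_space($line), "'\n";   # '' (empty field)
```

If the data file is strictly single-space delimited, both forms return the same field; the whitespace mode is simply more forgiving of stray extra spaces or tabs.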
Hope this helps!
The way forward always starts with a minimal test.
|
Thank you very much!! This was very helpful information. However, my data file is way too large to include inside the code. I tried running it in the form below, and I'm getting "0 values were >= x" for every threshold. Is there something wrong with the way I'm opening the file?
open (IN, "<$ARGV[0]") || die ("Cannot open $ARGV[0]: $!");
@MyData = <IN>;
use strict; use warnings; use feature 'say';
my @col9 = map {(split)[8]} <IN>;
foreach my $test (2,3,4,5,6,7,8,9) {
my $count =scalar grep {$_ >= $test} @col9;
say sprintf "%d values were >= %d", $count, $test;
}
|
use strict; use warnings; use feature 'say';
my $filename = $ARGV[0] or die "You must supply a filename";
-f $filename or die "You must supply the name of a file that exists!";
open my $IN, '<', $filename or die "Can't open < $filename: $!";
my @col9;
while ( my $line = <$IN> ) {
chomp $line;
push @col9, (split / /, $line)[8];
}
close $IN or die "Can't close $filename: $!";
foreach my $test ( 1, 9, 42, 666 ) {
my $count = scalar grep { $_ >= $test } @col9;
say sprintf "%d values were >= %d", $count, $test;
}
__END__
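A likely reason the earlier attempt reported 0 for every threshold: the line `@MyData = <IN>;` reads the filehandle all the way to end-of-file, so the later `<IN>` inside map has nothing left to return. The sketch below reproduces that effect with an in-memory filehandle; the sample data and variable names are assumptions made for the demonstration, not the real file.

```perl
use strict;
use warnings;

# In-memory "file" standing in for the real data file (made-up sample rows).
my $data = "1 2 3 4 5 6 7 8 9 10 11 12\n" x 3;

open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my @all_lines = <$in>;   # first read in list context consumes every line
my @second    = <$in>;   # handle is now at end-of-file: empty list

printf "first read: %d lines, second read: %d lines\n",
    scalar @all_lines, scalar @second;

# Working from the lines already in memory avoids the problem.
my @col9 = map { (split ' ', $_)[8] } @all_lines;
print "column 9: @col9\n";

close $in or die "Cannot close in-memory file: $!";
```

Reading the handle only once, either into an array or line by line as in the loop above, sidesteps the issue.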
Hope this helps!
Re: Creating a Hash using only one column in an imported data file
by CountZero (Bishop) on Feb 14, 2017 at 14:54 UTC
I would treat the data file as a database and use SQL and its aggregate functions to condense that 9th field into a list of unique numbers with a count of each. Then it becomes trivially easy to answer your question.
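The same aggregation idea can be sketched without a database. This is plain Perl rather than SQL: a hash plays the role of a GROUP BY with COUNT(*), and a second step sums the counts for values at or above a threshold. The helper names are hypothetical, and the sample values simply mirror the 9th column of the small data set shown earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: condense a list into { value => how often it occurs },
# roughly what "SELECT col9, COUNT(*) ... GROUP BY col9" would produce.
sub count_by_value {
    my @values = @_;
    my %count;
    $count{$_}++ for @values;
    return \%count;
}

# Hypothetical helper: total occurrences of all distinct values >= $threshold.
sub count_at_least {
    my ($count, $threshold) = @_;
    my $total = 0;
    for my $value (keys %$count) {
        $total += $count->{$value} if $value >= $threshold;
    }
    return $total;
}

# Sample 9th-column values: three 9s, three 42s, one 1.
my $count = count_by_value(9, 9, 9, 42, 42, 42, 1);

for my $threshold (1, 9, 42, 666) {
    printf "%d values were >= %d\n", count_at_least($count, $threshold), $threshold;
}
```

With many rows but few distinct values, each threshold query then only touches the distinct values instead of rescanning every row.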
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James