http://www.perlmonks.org?node_id=1098462

john.tm has asked for the wisdom of the Perl Monks concerning the following question:

I have to run a report from a csv file, i am stuck on the part where i want to keep only unique elements from column 1,if column 5 matches a certain string, displaying the count of how many duplicates there were.

I can get the count of all the names in the list with how may times thay appear but am lost on how to do a (while column1 and if column 5 = "????") bit

input col1 col5 output james yes james 2 dave yes dave 2 mike yes mike 3 ken no ken 1 jon yes jon 4 jon no ken yes jon yes mike no james yes dave yes mike no ken no jon yes jon yes
this is a snippet of what i have so far
my %seen; while (<INPUT>) { chomp; my $line = $_; my @elements = split (",", $line); my $col_name = $elements[1]; my $col_type = $elements[5]; $seen{$col_name}++; } while ( my ( $col_name, $times_seen ) = each %seen ) { my $loc_total = $times_seen ; print " $col_name\t\t: $loc_total\n"; }

Replies are listed 'Best First'.
Re: keep only unique elements in an array displaying number of duplicates
by Athanasius (Archbishop) on Aug 24, 2014 at 16:20 UTC

    Hello john.tm,

    If I understand you correctly, you just need to change this line:

    $seen{$col_name}++;

    to this (replacing the ... with the appropriate “certain string”):

    $seen{$col_name}++ if $col_type eq ...;

    Then replace the second while loop with:

    print " $_\t\t: $seen{$_}\n" for sort keys %seen;

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: keep only unique elements in an array displaying number of duplicates.
by CountZero (Bishop) on Aug 24, 2014 at 18:17 UTC
    Using DBI and DBD::CSV:
    use Modern::Perl qw/2014/; use DBI; my $dbh = DBI->connect("dbi:CSV:"); my $query = "SELECT col1, count(col1) AS counter FROM test.csv WHERE col5 = 'yes' + GROUP by col1 ORDER BY col1"; my $sth = $dbh->prepare($query); $sth->execute(); while ( my $row = $sth->fetchrow_hashref ) { say $row->{col1}, "\t", $row->{'counter'}; }
    Output:
    dave 2 james 2 jon 4 ken 1 mike 1

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: keep only unique elements in an array displaying number of duplicates.
by Laurent_R (Canon) on Aug 24, 2014 at 20:23 UTC
    Or a simple Perl one-liner:
    $ echo ' james yes dave yes mike yes ken no jon yes jon no ken yes jon yes mike no james yes dave yes mike no ken no jon yes jon yes ' | perl -nE 'my ($name, $val) = split; $hash{$name} ++ if $val eq "ye +s"; END{ say "$_\t$hash{$_}" for keys %hash}' james 2 mike 1 dave 2 jon 4 ken 1
    I realize that you spoke about col5, but I had to use the data as you supplied it. If the flag is really in col5 (i.e. presumably field subscript 4), change the one-liner to:
    perl -nE 'my ($name, $val) = (split)[0,4]; $hash{$name} ++ if $val eq +"yes"; END{ say "$_\t$hash{$_}" for keys %hash}'