Count the Duplicate Entries and make them uniq

slayedbylucifer has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am working on CSV and need to manipulate it as follows. Please note that these are test files I created for explanation on this thread. However the essence of the problem is the same.

I have below test.csv:

OS,RELEASE,LIKE
Ubuntu,Warty,No
Ubuntu,Hoary,No
Ubuntu,Breezy,Yes
Ubuntu,Breazy,Yes
Fedora,Yarrow,Yes
Fedora,Stentz,No
Fedora,Yarrow,Yes
Fedora,Yarrow,Yes
Windows,XP PRO,Yes
Windows,XP PRO,Yes
Windows,XP Home,No
Windows,XP PRO,Yes
[download]

Please note that there are rows which has duplicate entries.

I want to create report.csv as follows by referring above table:

OS,RELEASE,Yes,No
Ubuntu,Warty,0,1
Ubuntu,Hoary,0,1
Ubuntu,Breezy,2,0
Fedora,Yarrow,3,0
Fedora,Stentz,0,1
Windows,XP PRO,3,0
Windows,XP Home,0,1
[download]

Meaning, e.g. I need to create entries for OS and RELEASE and count the times It was liked as "YES" or "NO".

So this is the flow I am following:

1. Create hash of array with "OS name" as hash value and "Array reference" to the array containing RELASE versions for that OS as the values.

2. Run a "SELECT" query by looping over this Hash and print the corresponding the RELEASE and LIKE values

3. unique/count the output received in step 2 and push it to the report.csv

I am stuck at step no. 3. I am not able to unique and count the entries which were received from step 2. I would not say that my code is not working. i would rather say I am not able to think of a way to make it work. Basically, I am not able to think of the logic how to get this done.

Sorry for the long and confusing explanation. I am pasting my code below but I am pretty sure it would add to more confusion. Also you may find lots of bugs in this code

#!/usr/bin/perl -w

use strict;
use Data::Dumper;
use DBI;

my @os;
my %duplicate;
my %hash1;

my $dbh = DBI->connect ("dbi:CSV:") or die "Cannot connect to the CSV 
+file: $DBI::errstr()";
$dbh->{RaiseError} = 1;
$dbh->{TraceLevel} = 0;

my $table = "test.csv";

my $query = "select OS from $table";
my $sth = $dbh->prepare ($query);
$sth->execute();

while ( my $row_ref = $sth->fetch )
{
    (my $os = $row_ref->[0]);
    push (@os, $os) unless $duplicate{$os}++ ;
}

foreach my $operating_system (@os) 
{
    my @array;
    my $query = "select RELEASE from $table where OS = ?";
    my $sth = $dbh->prepare ($query);
    $sth->execute($operating_system);
    while (my $ref = $sth->fetch )
    {
        my $var = $ref->[0];
        push (@array, $var) unless $duplicate{$var}++ ;  ### removing 
+the duplicate release names
    }
    $hash1{$operating_system} = \@array;   
}

my @new;

foreach my $operating_system (@os) 
{ 
     foreach my $arr_ref ($hash1{$operating_system})
     {
         my $arr_index = @{$hash1{$operating_system}}-1; 
         for (0..$arr_index)
         {
            my $release = $hash1{$operating_system}->[$_];
            my $query = "select LIKE from $table where OS = ? and RELE
+ASE = ?";
            my $sth = $dbh->prepare ($query);
            $sth->execute ($operating_system, $release);
            while ( my $row_ref = $sth->fetch )
            {
               push (@new, $release.",".$hash{$row_ref->[0]}); 
            }
            $sth->finish ();
         }
     }
}

#############  >>>> from here below I need to figure out the unique en
+tries and count the times it were repeated and push it to report.csv

foreach my $var (@new)
{
  my $count = 0;
  if ( ! $duplicate{$var}++ )
  {
      print $var.$count."\n";
  }
}
$sth->finish();
$dbh->disconnect()
[download]

Thanks.

Back to Seekers of Perl Wisdom