http://www.perlmonks.org?node_id=990445

slayedbylucifer has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am working on CSV and need to manipulate it as follows. Please note that these are test files I created for explanation on this thread. However the essence of the problem is the same.

I have below test.csv:

OS,RELEASE,LIKE Ubuntu,Warty,No Ubuntu,Hoary,No Ubuntu,Breezy,Yes Ubuntu,Breazy,Yes Fedora,Yarrow,Yes Fedora,Stentz,No Fedora,Yarrow,Yes Fedora,Yarrow,Yes Windows,XP PRO,Yes Windows,XP PRO,Yes Windows,XP Home,No Windows,XP PRO,Yes

Please note that there are rows which has duplicate entries.

I want to create report.csv as follows by referring above table:

OS,RELEASE,Yes,No Ubuntu,Warty,0,1 Ubuntu,Hoary,0,1 Ubuntu,Breezy,2,0 Fedora,Yarrow,3,0 Fedora,Stentz,0,1 Windows,XP PRO,3,0 Windows,XP Home,0,1

Meaning, e.g. I need to create entries for OS and RELEASE and count the times It was liked as "YES" or "NO".

So this is the flow I am following:

1. Create hash of array with "OS name" as hash value and "Array reference" to the array containing RELASE versions for that OS as the values.

2. Run a "SELECT" query by looping over this Hash and print the corresponding the RELEASE and LIKE values

3. unique/count the output received in step 2 and push it to the report.csv

I am stuck at step no. 3. I am not able to unique and count the entries which were received from step 2. I would not say that my code is not working. i would rather say I am not able to think of a way to make it work. Basically, I am not able to think of the logic how to get this done.

Sorry for the long and confusing explanation. I am pasting my code below but I am pretty sure it would add to more confusion. Also you may find lots of bugs in this code

#!/usr/bin/perl -w use strict; use Data::Dumper; use DBI; my @os; my %duplicate; my %hash1; my $dbh = DBI->connect ("dbi:CSV:") or die "Cannot connect to the CSV +file: $DBI::errstr()"; $dbh->{RaiseError} = 1; $dbh->{TraceLevel} = 0; my $table = "test.csv"; my $query = "select OS from $table"; my $sth = $dbh->prepare ($query); $sth->execute(); while ( my $row_ref = $sth->fetch ) { (my $os = $row_ref->[0]); push (@os, $os) unless $duplicate{$os}++ ; } foreach my $operating_system (@os) { my @array; my $query = "select RELEASE from $table where OS = ?"; my $sth = $dbh->prepare ($query); $sth->execute($operating_system); while (my $ref = $sth->fetch ) { my $var = $ref->[0]; push (@array, $var) unless $duplicate{$var}++ ; ### removing +the duplicate release names } $hash1{$operating_system} = \@array; } my @new; foreach my $operating_system (@os) { foreach my $arr_ref ($hash1{$operating_system}) { my $arr_index = @{$hash1{$operating_system}}-1; for (0..$arr_index) { my $release = $hash1{$operating_system}->[$_]; my $query = "select LIKE from $table where OS = ? and RELE +ASE = ?"; my $sth = $dbh->prepare ($query); $sth->execute ($operating_system, $release); while ( my $row_ref = $sth->fetch ) { push (@new, $release.",".$hash{$row_ref->[0]}); } $sth->finish (); } } } ############# >>>> from here below I need to figure out the unique en +tries and count the times it were repeated and push it to report.csv foreach my $var (@new) { my $count = 0; if ( ! $duplicate{$var}++ ) { print $var.$count."\n"; } } $sth->finish(); $dbh->disconnect()

Thanks.