gowthamvels has asked for the wisdom of the Perl Monks concerning the following question:
I need to write a Perl script to group and sort a CSV file containing the sample data below.
The sample input file, INPUTFILE.csv, looks like this:
3211111,100,3.2
3211112,101,3.2
3211111,100,1.2
3211112,100,2.2
3211113,100,5.2
3211112,100,0.3
I need to group by the first two columns and sum the third column, to obtain the following output file, outputfile.csv:
3211111,100,4.4
3211112,100,2.5
3211112,101,3.2
3211113,100,5.2
Please help me out.
2017-07-27 Athanasius added code tags
Re: perl group by and sort from a csv input file
by Corion (Patriarch) on Jul 27, 2017 at 07:34 UTC
The easiest approaches are either to use DBD::CSV or to load your CSV data into a database, and then to use SQL to run your queries against it. See also DBI.
The harder approach would be to implement the aggregation yourself.
As you don't show any code, it's hard to give you more specific advice, because I can't tell where you are having problems.
The synopsis section of DBD::CSV shows all there is to using a CSV file as a DBI table.
Re: perl group by and sort from a csv input file -- oneliner and lazy learning
by Discipulus (Canon) on Jul 27, 2017 at 11:05 UTC
Hello gowthamvels and welcome to the monastery and to the wonderful world of Perl!
Next time please show the code you tried: monks generally prefer (with reason) to see some effort from the wisdom seeker.
You already got wise answers, and smart ones too. Mine is a one-liner (beware of the Windows double quotes: use single quotes on Linux).
perl -F"," -lanE "$h{join ',',@F[0,1]}+=$F[2]}{map{say $_.','.$h{$_}}sort keys %h" sample.csv
3211111,100,4.4
3211112,100,2.5
3211112,101,3.2
3211113,100,5.2
See perlrun to learn how many useful switches and parameters you can feed to Perl! Basically speaking:
-a autosplits each incoming line (on whitespace), filling the special @F array (see perlvar for it)
-F specifies an alternative pattern for the autosplit
-l provides smart line handling: it chomps each input line and sets $\ so that every print ends with a newline
-n wraps your program in a while loop without printing its input (-p also prints it)
-E executes the following code and enables some features (like the say I used); normally you can use -e
}{ is a trick: see eskimo greeting
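For comparison, here is a minimal sketch of the same logic written as an ordinary script, with comments mapping each line to the switch that provides it. It uses an END block, the usual alternative to the }{ trick: with -n, Perl wraps your code in a while loop, so }{ closes that loop early and opens a bare block that runs once after all input has been read. The sample data is embedded through an in-memory filehandle (an assumption, to keep the sketch self-contained); the one-liner reads sample.csv instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Sample rows from the question, read through an in-memory filehandle
# (stands in for sample.csv, which the one-liner reads via <>).
my $data = "3211111,100,3.2\n3211112,101,3.2\n3211111,100,1.2\n"
         . "3211112,100,2.2\n3211113,100,5.2\n3211112,100,0.3\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my %h;
while (defined(my $line = <$in>)) {   # -n: loop over every input line
    chomp $line;                      # -l: chomp each input line
    my @F = split /,/, $line;         # -a with -F",": split the line into @F
    $h{ join ',', @F[0, 1] } += $F[2];
}
# END runs once, after the main program has finished: the same job
# the block opened by }{ does in the one-liner.
END {
    print "$_,$h{$_}\n" for sort keys %h;   # the one-liner uses say instead
}
```

On Linux the END form fits in a one-liner too, and some find it easier to read than the eskimo greeting.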
If the one-liner seems overwhelming, use -MO=Deparse to have it expanded:
perl -MO=Deparse -F"," -lanE "$h{join ',',@F[0,1]}+=$F[2]}{map{say $_.','.$h{$_}}sort keys %h" sample.csv
BEGIN { $/ = "\n"; $\ = "\n"; }
BEGIN {
$^H{'feature_unicode'} = q(1);
$^H{'feature_say'} = q(1);
$^H{'feature_state'} = q(1);
$^H{'feature_switch'} = q(1);
}
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
our(@F) = split(/,/, $_, 0);
$h{join ',', @F[0, 1]} += $F[2];
}
{
map {say $_ . ',' . $h{$_};} sort(keys %h);
}
-e syntax OK
If you are really lazy, you can learn what the switches do with -MO=Deparse: add them one at a time and see what happens when executing a no-program (that is what perl -e 1 is; it shows up as '???' in the deparsed output):
perl -MO=Deparse -e 1
'???';
-e syntax OK
perl -MO=Deparse -n -e 1
LINE: while (defined($_ = <ARGV>)) {
'???';
}
-e syntax OK
perl -MO=Deparse -n -a -e 1
LINE: while (defined($_ = <ARGV>)) {
our(@F) = split(' ', $_, 0);
'???';
}
-e syntax OK
perl -MO=Deparse -n -a -F"," -e 1
LINE: while (defined($_ = <ARGV>)) {
our(@F) = split(/,/, $_, 0);
'???';
}
-e syntax OK
perl -MO=Deparse -n -a -F"," -l -e 1
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
our(@F) = split(/,/, $_, 0);
'???';
}
-e syntax OK
perl -MO=Deparse -n -a -F"," -l -E 1
BEGIN { $/ = "\n"; $\ = "\n"; }
BEGIN {
$^H{'feature_unicode'} = q(1);
$^H{'feature_say'} = q(1);
$^H{'feature_state'} = q(1);
$^H{'feature_switch'} = q(1);
}
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
our(@F) = split(/,/, $_, 0);
'???';
}
-e syntax OK
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; maybe one day you will reinvent one of THE WHEELS.
Re: perl group by and sort from a csv input file
by tybalt89 (Monsignor) on Jul 27, 2017 at 10:54 UTC
#!/usr/bin/perl
# http://perlmonks.org/?node_id=1196143
use strict;
use warnings;
$_ = join '', sort <DATA>;
1 while s/^(\d+,\d+,)\K (\S+)\n\1(\S+)/ $2 + $3 /emx;
print;
__DATA__
3211111,100,3.2
3211112,101,3.2
3211111,100,1.2
3211112,100,2.2
3211113,100,5.2
3211112,100,0.3
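tybalt89's answer is deliberately terse, so here is a minimal sketch of the same technique with each part of the substitution commented. The data is embedded in an array rather than read from __DATA__ (an assumption, for self-containment). Sorting the lines as strings always places rows with the same "id,num," prefix next to each other; the groups come out in numeric order here only because the values in each key column all have the same width.

```perl
use strict;
use warnings;
# Sample rows from the question, one newline-terminated string per line.
my @rows = (
    "3211111,100,3.2\n", "3211112,101,3.2\n", "3211111,100,1.2\n",
    "3211112,100,2.2\n", "3211113,100,5.2\n", "3211112,100,0.3\n",
);
# 1. Sort the lines as strings so rows sharing a key become adjacent.
my $text = join '', sort @rows;
# 2. Repeatedly merge the first adjacent pair of lines with the same key:
#    ^(\d+,\d+,)      the "id,num," prefix at the start of a line (/m)
#    \K               keep the prefix; only what follows is replaced
#    (\S+)\n\1(\S+)   this line's value, newline, same prefix, next value
#    /e               evaluate the replacement as Perl code: $2 + $3
#    /x               allows the spaces in the pattern for readability
1 while $text =~ s/^(\d+,\d+,)\K (\S+)\n\1(\S+)/ $2 + $3 /emx;
print $text;
```

Each pass of the 1 while loop merges one adjacent pair into its sum, so a group of n rows needs n - 1 passes.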
Re: perl group by and sort from a csv input file
by Laurent_R (Canon) on Jul 27, 2017 at 08:28 UTC
Hi gowthamvels,
One typical way is to use a hash as an accumulator. The hash keys should be the first two values of your CSV input, and the hash values should accumulate the third value.
At the end, you can just print out the hash in your desired format.
For example, something like this:
use strict;
use warnings;
use feature 'say';
my %hash;
while (<DATA>) {
chomp;
my ($id, $num, $val) = split /,/, $_;
$hash{"$id,$num"} += $val;
}
for my $key (sort keys %hash) {
say "$key,$hash{$key}";
}
__DATA__
3211111,100,3.2
3211112,101,3.2
3211111,100,1.2
3211112,100,2.2
3211113,100,5.2
3211112,100,0.3
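The question reads INPUTFILE.csv and writes outputfile.csv, while the code above uses __DATA__ and STDOUT. Below is a minimal sketch of the same hash accumulator wrapped in a sub that takes an input and an output filehandle; the sub name group_and_sum is hypothetical, and in-memory strings stand in for the two files so the sketch runs as-is. It also swaps the plain sort keys for a numeric sort on both fields, which only matters if ids of different widths appear (a string sort puts 1000 before 999).

```perl
use strict;
use warnings;
# Hypothetical helper: Laurent_R's accumulator, reading from $in and
# writing the grouped, summed rows to $out.
sub group_and_sum {
    my ($in, $out) = @_;
    my %sum;
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;                 # skip blank lines
        my ($id, $num, $val) = split /,/, $line;
        $sum{"$id,$num"} += $val;
    }
    # Sort numerically on both key fields, not as strings.
    for my $key (sort {
            my @x = split /,/, $a;
            my @y = split /,/, $b;
            $x[0] <=> $y[0] or $x[1] <=> $y[1]
        } keys %sum) {
        print {$out} "$key,$sum{$key}\n";
    }
}
# In-memory stand-ins for INPUTFILE.csv and outputfile.csv.
my $input = "3211111,100,3.2\n3211112,101,3.2\n3211111,100,1.2\n"
          . "3211112,100,2.2\n3211113,100,5.2\n3211112,100,0.3\n";
open my $in,  '<', \$input  or die "Cannot read input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot write output: $!";
group_and_sum($in, $out);
close $out;
print $output;
```

With the real files you would open INPUTFILE.csv for reading and outputfile.csv for writing, check both open calls with "or die", and pass those handles to group_and_sum.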
|
Thanks a lot for your great help and response.
Apologies: as I am new to PerlMonks, I didn't know the rules, which is why I didn't paste the code I tried. From now on I will follow them.
I have used the code from Laurent_R's reply above (the hash accumulator, unchanged).
Thanks a ton to all of you for this great help.
Re: perl group by and sort from a csv input file
by Tux (Canon) on Jul 27, 2017 at 10:09 UTC
Using Laurent_R's hash solution, combined with proper CSV parsing, I'd propose:
use Text::CSV_XS qw( csv );
my %acc;
my $aoa = csv (in => "test.csv", on_in => sub {
$acc{pack "L>L>", $_[1][0], $_[1][1]} += $_[1][2]; });
csv (in => [ map { [(unpack "L>L>", $_), $acc{$_}] } sort keys %acc ]);
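Why pack "L>L>"? It packs the two fields as unsigned 32-bit big-endian integers into one 8-byte key, so comparing keys byte by byte, which is what the plain sort keys %acc does, orders them numerically by the first field and then the second; unpack recovers the numbers for output. A small demonstration, assuming every value fits in an unsigned 32-bit integer (as the ids in the question do) and a Perl that supports the > modifier in pack templates (5.10 or later):

```perl
use strict;
use warnings;
# Three (id, num) pairs whose string order differs from their numeric order.
my @pairs = ([1000, 5], [999, 7], [999, 10]);
# Plain text keys: sorted as strings, "1000,5" comes first and "999,10"
# sorts before "999,7".
my @text_keys = map { join ',', @$_ } @pairs;
my @as_text   = sort @text_keys;
# Packed keys: "L>" is an unsigned 32-bit big-endian integer, so the
# default string sort of the 8-byte keys is a numeric sort on (id, num).
my @packed    = map { pack 'L>L>', @$_ } @pairs;
my @as_packed = map { join ',', unpack 'L>L>', $_ } sort @packed;
print "text:   @as_text\n";     # 1000,5 999,10 999,7
print "packed: @as_packed\n";   # 999,7 999,10 1000,5
```

The trade-off is that the keys are no longer human-readable, which is why the output step unpacks them again.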
Enjoy, Have FUN! H.Merijn
Re: perl group by and sort from a csv input file
by Your Mother (Archbishop) on Jul 28, 2017 at 17:10 UTC
As you can see, you have an embarrassment of riches in the replies. Some of them are purposefully terse and idiomatic because it's something of a sport here when a SoPW (Seekers of Perl Wisdom) post doesn't include any code attempts. :P If you work with one of them and have follow-up questions, don't hesitate to ask, but post whatever code you tried.
You got an interesting but terminally slothful, half-right dose of self-congratulation regarding the use of a database. A database is not necessary, of course: several of the replies neatly address your actual question and need, and are trivial to adapt to many other requirements if you can follow the code.
That said, some people find SQL a more natural way of working with data, so it is an interesting and potentially useful thing to do; there is no try®. Like so many things, it is semi-trivial in Perl if you know how.
Building on previous answers, here's how:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS "csv";
use DBI;
my $csv_file = shift || die "Give a CSV file with sample data\n";
# DB is ":memory:"; RaiseError makes any DBI failure die loudly
my $dbh = DBI->connect("dbi:SQLite::memory:", "", "", { RaiseError => 1 });
$dbh->do(<<"SQL");
CREATE TABLE sampleData( sample, input, amount )
SQL
my $insert_h = $dbh->prepare(<<"SQL");
INSERT INTO sampleData VALUES( ?, ?, ? )
SQL
# Read the CSV file named on the command line; skip rows without 3 fields
csv( in    => $csv_file,
     on_in => sub {
         my @values = @{ $_[1] };
         $insert_h->execute(@values) if @values == 3;
     });
my $tallies = $dbh->selectall_arrayref(<<"SQL");
SELECT sample
      ,input
      ,SUM(amount)
  FROM sampleData
 GROUP BY sample, input
 ORDER BY sample, input
SQL
csv( in  => $tallies,
     out => "outputfile.csv" );
You will need fairly recent versions of Text::CSV_XS and DBD::SQLite for this to run.
An excellent overview of DBI: DBI recipes. And a footnote for working with the data outside of Perl:
$dbh->sqlite_backup_to_file("newDBname.sqlite");
# ^^^ To go from ":memory:" to a file. Then you also have access to
# the DB via the command line with the sqlite executable.
# moo@cow[2574]~>sqlite3 "newDBname.sqlite"
# sqlite> select * from sampleData;
# 3211111|100|3.2
# 3211112|101|3.2
# ...et cetera...
Update: s/CVS/CSV/g for @all_the_nodes;#!!!
|
$ cat test.csv
3211111,100,3.2
3211112,101,3.2
3211111,100,1.2
3211112,100,2.2
3211113,100,5.2
3211112,100,0.3
$ cat test.pl
use 5.18.2;
use warnings;
use DBI;
use Text::CSV_XS qw(csv);
my $dbh = DBI->connect ("dbi:CSV:");
$dbh->{csv_tables}{sampleData} = {
file => "test.csv",
col_names => [qw( sample input amount )],
};
csv (in => $dbh->selectall_arrayref ("
SELECT sample, input, SUM (amount)
FROM sampleData
GROUP BY sample, input
ORDER BY sample, input"));
$ perl test.pl
3211111,100,4.4
3211112,100,2.5
3211112,101,3.2
3211113,100,5.2
Enjoy, Have FUN! H.Merijn
|
Oh, I don't think a DB was necessary at all. But if you're going to use that idiom, it's better, I argue, to put the data into a real DB so you can use the DB tools. As I think I've said to you before, all ++s to you for your CSV-related work (update: DERPy fingers first typed CVS). :P
|
#!/bin/sh
echo "OPTION 1: DATA FULLY IMPORTED in temp table"
time (
< INPUTFILE.csv psql -c "
drop table if exists t;
create temporary table t(i1 int, i2 int, n3 numeric);
copy t from stdin with (format csv, delimiter ',');
copy (
select i1,i2,sum(n3) from t group by i1,i2 order by i1,i2
) to stdout with(format csv, delimiter ',');
"
)
echo "-- OPTION 2: DATA read via Foreign Data Wrapper (file_fdw)"
echo " drop foreign table if exists inputfile cascade;
drop server if exists pgcsv cascade;
create extension if not exists file_fdw; " | psql -qX
time (
echo "
create server pgcsv foreign data wrapper file_fdw;
create foreign table inputfile (
i1 int,
i2 int,
n3 numeric
) server pgcsv
options ( filename '/tmp/INPUTFILE.csv', format 'csv' );
copy (
select i1,i2,sum(n3) from inputfile group by i1,i2 order by i1,i2 )
to stdout with(format csv, delimiter ',')
" | psql
)
I thought it was interesting to see the different timings:
-- OPTION 1: DATA FULLY IMPORTED in temp table
real 0m0.048s
user 0m0.002s
sys 0m0.006s
-- OPTION 2: DATA read via file_fdw (foreign data wrapper)
real 0m0.038s
user 0m0.002s
sys 0m0.006s
-- Your program
real 0m0.051s
user 0m0.048s
sys 0m0.003s
Foreign tables, via the foreign data wrapper file_fdw, are handy in the gray area between file and database. A further step towards a full database is a materialized view over a foreign table, which offers more real database features (indexing, for one).
Re: perl group by and sort from a csv input file
by Tux (Canon) on Jul 27, 2017 at 07:43 UTC
Please use <code> tags, so that I can copy your data in case I'm tempted to give it a try.
Please show us the code you tried so far, so we can point you to where it went wrong.
Enjoy, Have FUN! H.Merijn