PerlMonks  

perl group by and sort from a csv input file

by gowthamvels (Novice)
on Jul 27, 2017 at 07:27 UTC (#1196143=perlquestion)

gowthamvels has asked for the wisdom of the Perl Monks concerning the following question:

I need to write a Perl script to group and sort a CSV file.
The sample data from the input file looks like this:

Sample input (INPUTFILE.csv):

    3211111,100,3.2
    3211112,101,3.2
    3211111,100,1.2
    3211112,100,2.2
    3211113,100,5.2
    3211112,100,0.3

I need to group by the first two columns and sum the third column to obtain the output file below:

Output (outputfile.csv):

    3211111,100,4.4
    3211112,100,2.5
    3211112,101,3.2
    3211113,100,5.2

Please help me out.

2017-07-27 Athanasius added code tags

Re: perl group by and sort from a csv input file
by Corion (Pope) on Jul 27, 2017 at 07:34 UTC

    The easiest approach is either to use DBD::CSV, or to load your CSV data into a database and run SQL queries against it. See also DBI.

    The harder approach would be to implement the aggregation yourself.

    Since you don't show any code, it's hard to give more specific advice; I can't really tell where you are having problems.

    The synopsis section of DBD::CSV shows all there is to using a CSV file as a DBI table.

Re: perl group by and sort from a csv input file -- oneliner and lazy learning
by Discipulus (Abbot) on Jul 27, 2017 at 11:05 UTC
    Hello gowthamvels and welcome to the monastery and to the wonderful world of Perl!

    Next time, please show the code you tried: monks generally (and with reason) prefer to see some effort from the seeker of wisdom.

    You already got wise answers, and smart ones too. Mine is a one-liner (note the Windows-style double quotes; on Linux use single quotes instead).

        perl -F"," -lanE "$h{join ',',@F[0,1]}+=$F[2]}{map{say $_.','.$h{$_}}sort keys %h" sample.csv
        3211111,100,4.4
        3211112,100,2.5
        3211112,101,3.2
        3211113,100,5.2

    See perlrun for the many useful switches and parameters you can feed to Perl!

    In short: -a autosplits each incoming line (on whitespace by default), filling the special @F array (see perlvar).

    -F specifies an alternative pattern for the autosplit.

    -l enables automatic line-ending handling: it chomps each input line and appends a newline to each print.

    -n wraps your program in a while loop over the input lines without printing them (-p also prints each line).

    -E executes the following code like -e, but also enables optional features (such as the say I used). Normally you can use -e.

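    }{ is a trick known as the "eskimo greeting": it closes the implicit while loop early, so the code after it runs once, after all the input has been read.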

    If the one-liner seems overwhelming, use -MO=Deparse to have it expanded:

        perl -MO=Deparse -F"," -lanE "$h{join ',',@F[0,1]}+=$F[2]}{map{say $_.','.$h{$_}}sort keys %h" sample.csv
        BEGIN { $/ = "\n"; $\ = "\n"; }
        BEGIN {
            $^H{'feature_unicode'} = q(1);
            $^H{'feature_say'} = q(1);
            $^H{'feature_state'} = q(1);
            $^H{'feature_switch'} = q(1);
        }
        LINE: while (defined($_ = <ARGV>)) {
            chomp $_;
            our(@F) = split(/,/, $_, 0);
            $h{join ',', @F[0, 1]} += $F[2];
        }
        {
            map {say $_ . ',' . $h{$_};} sort(keys %h);
        }
        -e syntax OK

    If you are really lazy, you can learn what the switches do with -MO=Deparse: add them one at a time and watch what happens to a no-op program (that is what perl -e 1 is; its body shows up as '???' in the deparsed output):

        perl -MO=Deparse -e 1
        '???';
        -e syntax OK

        perl -MO=Deparse -n -e 1
        LINE: while (defined($_ = <ARGV>)) {
            '???';
        }
        -e syntax OK

        perl -MO=Deparse -n -a -e 1
        LINE: while (defined($_ = <ARGV>)) {
            our(@F) = split(' ', $_, 0);
            '???';
        }
        -e syntax OK

        perl -MO=Deparse -n -a -F"," -e 1
        LINE: while (defined($_ = <ARGV>)) {
            our(@F) = split(/,/, $_, 0);
            '???';
        }
        -e syntax OK

        perl -MO=Deparse -n -a -F"," -l -e 1
        BEGIN { $/ = "\n"; $\ = "\n"; }
        LINE: while (defined($_ = <ARGV>)) {
            chomp $_;
            our(@F) = split(/,/, $_, 0);
            '???';
        }
        -e syntax OK

        perl -MO=Deparse -n -a -F"," -l -E 1
        BEGIN { $/ = "\n"; $\ = "\n"; }
        BEGIN {
            $^H{'feature_unicode'} = q(1);
            $^H{'feature_say'} = q(1);
            $^H{'feature_state'} = q(1);
            $^H{'feature_switch'} = q(1);
        }
        LINE: while (defined($_ = <ARGV>)) {
            chomp $_;
            our(@F) = split(/,/, $_, 0);
            '???';
        }
        -e syntax OK
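    Putting the switches together, the one-liner corresponds roughly to the ordinary script below. This is a sketch, not Discipulus's code: the subroutine name group_sum_longhand and the in-memory sample data are illustrative assumptions, it reads from any filehandle rather than the implicit <ARGV>, and it uses a lexical @F instead of the package variable the switches create.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';    # -E turns on say (among other features)

# Hypothetical helper: long-hand version of
#   perl -F"," -lanE '$h{join ",",@F[0,1]}+=$F[2]}{map{say $_.",".$h{$_}}sort keys %h'
sub group_sum_longhand {
    my ($fh) = @_;
    my %h;
    while (my $line = <$fh>) {            # -n: loop over every input line
        chomp $line;                      # -l: chomp each input line
        my @F = split /,/, $line;         # -a with -F",": autosplit into @F
        $h{ join ',', @F[0, 1] } += $F[2];
    }
    # "}{": code after the loop runs once, after the input is exhausted
    return map { $_ . ',' . $h{$_} } sort keys %h;
}

# Illustrative data held in memory instead of sample.csv
my $sample = <<'CSV';
3211111,100,3.2
3211112,101,3.2
3211111,100,1.2
3211112,100,2.2
3211113,100,5.2
3211112,100,0.3
CSV

open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
say for group_sum_longhand($fh);
```

    Run as-is, it should print the same four lines as the one-liner.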

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: perl group by and sort from a csv input file
by tybalt89 (Parson) on Jul 27, 2017 at 10:54 UTC
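        #!/usr/bin/perl
        # http://perlmonks.org/?node_id=1196143
        use strict;
        use warnings;

        # Sort the lines so rows with the same first two fields become adjacent.
        # (Assumes every data line, including the last, ends with a newline.)
        $_ = join '', sort <DATA>;

        # Repeatedly merge a line into the one above it while both start with the
        # same "id,num," prefix: \K keeps the prefix, and /e replaces the first
        # value, the newline, the repeated prefix and the second value with the sum.
        1 while s/^(\d+,\d+,)\K (\S+)\n\1(\S+)/ $2 + $3 /emx;
        print;

        __DATA__
        3211111,100,3.2
        3211112,101,3.2
        3211111,100,1.2
        3211112,100,2.2
        3211113,100,5.2
        3211112,100,0.3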
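    This version sorts first, so that lines sharing the same first two fields become adjacent, and then merges neighbours with a repeated substitution. The same sort-then-merge idea (often called control-break processing) can be sketched without the regex. merge_sorted_runs is a hypothetical name, and the input is assumed to be plain "id,num,value" strings without quoted fields:

```perl
use strict;
use warnings;

# Hypothetical helper: sort, then sum each run of lines with the same key.
sub merge_sorted_runs {
    # chomp copies first, so a missing final newline cannot affect the sort
    my @lines = sort { $a cmp $b } map { my $l = $_; chomp $l; $l } @_;
    my (@out, $prev, $acc);
    for my $line (@lines) {
        my ($id, $num, $val) = split /,/, $line;
        my $key = "$id,$num";
        if (defined $prev && $key eq $prev) {
            $acc += $val;                            # same group: keep summing
            next;
        }
        push @out, "$prev,$acc" if defined $prev;    # key changed: emit group
        ($prev, $acc) = ($key, $val);
    }
    push @out, "$prev,$acc" if defined $prev;        # emit the last group
    return @out;
}

print "$_\n" for merge_sorted_runs(
    '3211111,100,3.2', '3211112,101,3.2', '3211111,100,1.2',
    '3211112,100,2.2', '3211113,100,5.2', '3211112,100,0.3',
);
```

    Because every line of a group shares the prefix "id,num,", those lines stay contiguous under a plain string sort, so a single pass is enough.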
Re: perl group by and sort from a csv input file
by Laurent_R (Canon) on Jul 27, 2017 at 08:28 UTC
    Hi gowthamvels,

    one typical way is to use a hash as an accumulator. The hash keys should be the first two values of your CSV input, and the hash values should accumulate the third value of your CSV.

    At the end, you can just print out the hash in your desired format.

    For example, something like this:

        use strict;
        use warnings;
        use feature 'say';

        my %hash;
        while (<DATA>) {
            chomp;
            my ($id, $num, $val) = split /,/, $_;
            $hash{"$id,$num"} += $val;
        }
        for my $key (sort keys %hash) {
            say "$key,$hash{$key}";
        }

        __DATA__
        3211111,100,3.2
        3211112,101,3.2
        3211111,100,1.2
        3211112,100,2.2
        3211113,100,5.2
        3211112,100,0.3
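    The string keys above sort correctly here because every ID has the same number of digits. If widths can vary (say 99 and 100), a string sort puts "100,..." before "99,...". One way around that, sketched under the assumption that the first two fields are always plain numbers, is to accumulate into a two-level hash and sort each level numerically. group_sum_numeric is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: $sum{$id}{$num} += $val, then numeric sort per level.
sub group_sum_numeric {
    my ($fh) = @_;
    my %sum;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;                 # skip blank lines
        my ($id, $num, $val) = split /,/, $line;
        $sum{$id}{$num} += $val;
    }
    my @rows;
    for my $id (sort { $a <=> $b } keys %sum) {
        for my $num (sort { $a <=> $b } keys %{ $sum{$id} }) {
            push @rows, "$id,$num,$sum{$id}{$num}";
        }
    }
    return @rows;
}

# Illustrative in-memory data with IDs of different widths
my $data = "100,1,1\n99,1,2\n99,1,0.5\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
print "$_\n" for group_sum_numeric($fh);    # 99,1,2.5 then 100,1,1
```

    The nested hash costs a little more memory than a single joined key, but it avoids having to choose a separator and keeps the numeric sort simple.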

      Thanks a lot for your great help and response.

      Apologies: as I am new to PerlMonks, I didn't know the rules, which is why I didn't paste the code I tried. From now on I will follow them.

      I used the code below:

          use strict;
          use warnings;
          use feature 'say';

          my %hash;
          while (<DATA>) {    # reads the __DATA__ section (not shown in this post)
              chomp;
              my ($id, $num, $val) = split /,/, $_;
              $hash{"$id,$num"} += $val;
          }
          for my $key (sort keys %hash) {
              say "$key,$hash{$key}";
          }

      Thanks a ton to all of you for this great help.
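      Since the original task reads INPUTFILE.csv and writes outputfile.csv, here is a minimal sketch of the same hash accumulator wired to real files instead of __DATA__. The script name and the subroutine group_csv_file are hypothetical, and it assumes simple CSV without quoted or embedded commas (for real-world CSV, see the Text::CSV_XS replies):

```perl
#!/usr/bin/perl
# Hypothetical usage: perl group_sum.pl INPUTFILE.csv outputfile.csv
use strict;
use warnings;

# Hypothetical helper: sum column 3 grouped by columns 1 and 2, reading
# $in_path and writing sorted "id,num,sum" lines to $out_path.
sub group_csv_file {
    my ($in_path, $out_path) = @_;

    open my $in, '<', $in_path or die "Cannot read $in_path: $!";
    my %sum;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;                    # strip LF or CRLF line ending
        next unless length $line;                # ignore blank lines
        my ($id, $num, $val) = split /,/, $line;
        $sum{"$id,$num"} += $val;
    }
    close $in;

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    print {$out} "$_,$sum{$_}\n" for sort keys %sum;
    close $out or die "Cannot close $out_path: $!";

    return scalar keys %sum;                     # number of groups written
}

group_csv_file(@ARGV) if @ARGV == 2;
```

      Taking the file names from the command line keeps the script reusable; it does nothing unless exactly two arguments are given.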

Re: perl group by and sort from a csv input file
by Tux (Abbot) on Jul 27, 2017 at 10:09 UTC

    Using Laurent_R's hash solution, combined with proper CSV parsing, I'd propose:

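        use strict;
        use warnings;
        use Text::CSV_XS qw( csv );

        my %acc;
        # on_in is called for each parsed row; $_[1] is the row as an array ref.
        # The two key fields are packed as big-endian unsigned 32-bit integers,
        # so a plain string sort of the keys gives numeric order.
        my $aoa = csv (in => "test.csv", on_in => sub {
            $acc{pack "L>L>", $_[1][0], $_[1][1]} += $_[1][2];
            });
        # Unpack each key back into its two fields, append the sum, write as CSV
        csv (in => [ map { [(unpack "L>L>", $_), $acc{$_}] } sort keys %acc ]);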
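    The pack "L>L>" key is what makes the plain sort keys %acc come out in numeric order: each number becomes 4 bytes, most significant byte first, so byte-wise string comparison agrees with numeric comparison. A small sketch of the trick on its own; sort_pairs_packed is a hypothetical name, and the trick only holds for integers from 0 to 4294967295 (unsigned 32-bit):

```perl
use strict;
use warnings;

# Hypothetical helper: sort [id, num] pairs via fixed-width packed keys.
sub sort_pairs_packed {
    my @pairs = @_;
    my @keys  = sort { $a cmp $b } map { pack('L>L>', @$_) } @pairs;
    return map { [ unpack('L>L>', $_) ] } @keys;    # back to [id, num]
}

my @pairs = ([100, 5], [99, 5], [99, 1]);

# Plain string keys: "100,5" sorts first because '1' lt '9'
print join(' ', sort { $a cmp $b } map { join(',', @$_) } @pairs), "\n";    # 100,5 99,1 99,5
# Packed keys give numeric order
print join(' ', map { join(',', @$_) } sort_pairs_packed(@pairs)), "\n";    # 99,1 99,5 100,5
```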

    Enjoy, Have FUN! H.Merijn
Re: perl group by and sort from a csv input file
by Your Mother (Bishop) on Jul 28, 2017 at 17:10 UTC

    As you can see, you have an embarrassment of riches in the replies. Some of them are purposefully terse and idiomatic because it's something of a sport here when a SoPW doesn't give code attempts. :P If you work with one of them and have follow-up questions, don't hesitate to ask, but post whatever code you tried to use.

    You got an interesting but terminally slothful, half-right dose of self-congratulation regarding the use of a database. It is not necessary, of course: several of the replies neatly address your actual question and need, and are trivial to adapt to many other requirements if you can follow the code.

    That said, some people find SQL a more natural way of working with data, so it is an interesting and potentially useful thing to do; there is no try®. Like so many things, it is semi-trivial in Perl if you know how. Building on previous answers, here's how:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Text::CSV_XS "csv";
        use DBI;

        my $csv_file = shift || die "Give a CSV file with sample data\n";
        my $dbh = DBI->connect("dbi:SQLite::memory:", "", "", { RaiseError => 1 }); # DB is ":memory:"

        $dbh->do(<<"");
        CREATE TABLE sampleData( sample, input, amount )

        my $insert_h = $dbh->prepare(<<"");
        INSERT INTO sampleData VALUES( ?, ?, ? )

        # Read the CSV file given on the command line; insert rows with 3 fields
        csv( in    => $csv_file,
             on_in => sub {
                 my @values = @{ $_[1] };
                 $insert_h->execute(@values) if @values == 3;
             });

        my $tallies = $dbh->selectall_arrayref(<<"");
        SELECT sample
              ,input
              ,SUM(amount)
        FROM sampleData
        GROUP BY sample, input
        ORDER BY sample, input

        csv( in => $tallies, out => "outputfile.csv" );

    You will need fairly recent versions of a couple of these modules for this to run: Text::CSV_XS and DBD::SQLite.

    An excellent overview of DBI recipes: DBI recipes. And a footnote for working with the data outside of Perl–

        $dbh->sqlite_backup_to_file("newDBname.sqlite");
        # ^^^ To go from ":memory:" to a file. Then you also have access to
        # the DB via the command line with the sqlite executable.
        # moo@cow[2574]~>sqlite3 "newDBname.sqlite"
        # sqlite> select * from sampleData;
        # 3211111|100|3.2
        # 3211112|101|3.2
        # ...et cetera...

    Update: s/CVS/CSV/g for @all_the_nodes;#!!!

      Then why not use DBD::CSV directly?

          $ cat test.csv
          3211111,100,3.2
          3211112,101,3.2
          3211111,100,1.2
          3211112,100,2.2
          3211113,100,5.2
          3211112,100,0.3
          $ cat test.pl
          use 5.18.2;
          use warnings;

          use DBI;
          use Text::CSV_XS qw(csv);

          my $dbh = DBI->connect ("dbi:CSV:");
          $dbh->{csv_tables}{sampleData} = {
              file      => "test.csv",
              col_names => [qw( sample input amount )],
              };

          csv (in => $dbh->selectall_arrayref ("
              SELECT   sample, input, SUM (amount)
              FROM     sampleData
              GROUP BY sample, input
              ORDER BY sample, input"));
          $ perl test.pl
          3211111,100,4.4
          3211112,100,2.5
          3211112,101,3.2
          3211113,100,5.2

      Enjoy, Have FUN! H.Merijn

        Oh, I don't think a DB was necessary at all. But if you're going to use that idiom, it's better, I argue, to put the data into a real DB so you can use the DB tools. As I think I've said to you before, all ++s to you for your CSV-related work (update: I first typed CVS; DERPy fingers). :P

      It would be absurd to install Postgres just for this, but if it's available, it offers at least two attractive alternatives. I'll just put my test file here:

          #!/bin/sh

          echo "OPTION 1: DATA FULLY IMPORTED in temp table"
          time (
          < INPUTFILE.csv psql -c "
          drop table if exists t;
          create temporary table t(i1 int, i2 int, n3 numeric);
          copy t from stdin with (format csv, delimiter ',');
          copy (
            select i1,i2,sum(n3) from t group by i1,i2 order by i1,i2
          ) to stdout with(format csv, delimiter ',');
          "
          )

          echo "-- OPTION 2: DATA read via Foreign Data Wrapper (file_fdw)"
          echo "
          drop foreign table if exists inputfile cascade;
          drop server pgcsv cascade;
          " | psql -qX

          time (
          echo "
          create server pgcsv foreign data wrapper file_fdw;
          create foreign table inputfile (
            i1 int, i2 int, n3 numeric
          ) server pgcsv
          options (
            filename '/tmp/INPUTFILE.csv',
            format 'csv'
          );
          copy (
            select i1,i2,sum(n3) from inputfile group by i1,i2 order by i1,i2
          ) to stdout with(format csv, delimiter ',')
          " | psql
          )

      I thought it was interesting to see the different timings:

          -- OPTION 1: DATA FULLY IMPORTED in temp table
          real    0m0.048s
          user    0m0.002s
          sys     0m0.006s
          -- OPTION 2: DATA read via file_fdw (foreign data wrapper)
          real    0m0.038s
          user    0m0.002s
          sys     0m0.006s
          -- Your program
          real    0m0.051s
          user    0m0.048s
          sys     0m0.003s

      Foreign tables, via the foreign data wrapper file_fdw, are handy in the gray area between file and database. The next step towards a full database is a materialized view of a foreign table, which offers more real DB features (indexing, for one).

Re: perl group by and sort from a csv input file
by Tux (Abbot) on Jul 27, 2017 at 07:43 UTC

    Please use <code> tags, so that I can copy your data in case I am tempted to give it a try.

    Please show us the code you tried so far, so we can point you to where it went wrong.


    Enjoy, Have FUN! H.Merijn

Node Type: perlquestion [id://1196143]
Approved by marto