Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re: Help Sorting a CSV File

by erix (Parson)
on Oct 04, 2019 at 20:28 UTC ( #11107067=note: print w/replies, xml ) Need Help??


in reply to Help Sorting a CSV File

Admittedly just for my own fun I did it via a database (postgres 12.0). Don't do this at home --- although it isn't actually too bad, and only /just/ a bit more than a one-liner... I added the missing column names as I assume they are normally there in the real data.

#!/bin/bash echo "CAT_HEADER,SUPPLIER_CODE,CUSTOMER_CODE,F4,F5,F6,F7,F8 CAT_LINE,0001P,ABC12345,20190924,,1,Z,3.36 CAT_LINE,0002P,ABC12345,20190924,,1,Z,3.36 CAT_LINE,0001P,ABC23456,20190924,,1,Z,2.24 CAT_LINE,0002P,ABC23456,20190924,,1,Z,2.24 CAT_LINE,0001P,ABC34567,20190924,,1,Z,2.24 CAT_LINE,0002P,ABC34567,20190924,,1,Z,2.24" > data.txt head -n 1 data.txt | perl -ne 'chomp; print " drop table if exists pm11107044 ; create table pm11107044 (" . join(",", map {"\"$_\" text"} split(/,/, +$_)) . ");"; ' | psql -qX && < data.txt psql -qXc "copy pm11107044 from stdin with +(format csv, header true);" echo "-- unordered data.txt:" cat data.txt echo echo "-- ordered data:" echo "select * from pm11107044 order by 2, 3" | psql -qX --csv # | m +d5sum echo # older psql doesn't have --csv (introduced in postgres 13devel); i +n that case use: # echo "copy(select * from pm11107044 order by 2, 3) to stdout with (f +ormat csv, delimiter ',', header true)" | psql -qX # | md5sum

Output:

./pm.pl -- unordered data.txt: CAT_HEADER,SUPPLIER_CODE,CUSTOMER_CODE,F4,F5,F6,F7,F8 CAT_LINE,0001P,ABC12345,20190924,,1,Z,3.36 CAT_LINE,0002P,ABC12345,20190924,,1,Z,3.36 CAT_LINE,0001P,ABC23456,20190924,,1,Z,2.24 CAT_LINE,0002P,ABC23456,20190924,,1,Z,2.24 CAT_LINE,0001P,ABC34567,20190924,,1,Z,2.24 CAT_LINE,0002P,ABC34567,20190924,,1,Z,2.24 -- ordered data: CAT_HEADER,SUPPLIER_CODE,CUSTOMER_CODE,F4,F5,F6,F7,F8 CAT_LINE,0001P,ABC12345,20190924,,1,Z,3.36 CAT_LINE,0001P,ABC23456,20190924,,1,Z,2.24 CAT_LINE,0001P,ABC34567,20190924,,1,Z,2.24 CAT_LINE,0002P,ABC12345,20190924,,1,Z,3.36 CAT_LINE,0002P,ABC23456,20190924,,1,Z,2.24 CAT_LINE,0002P,ABC34567,20190924,,1,Z,2.24

( Another (actually more appropriate) database way would be to read the data file directly as text via file_fdw. Maybe I'll have a go at that tomorrow. )

Edit: I just realised that --csv is in postgresql 13 (i.e., the development branch. But I'll leave the above as the alternative for earlier versions is mentioned ( COPY (select ...) ... )

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11107067]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2019-12-12 10:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?