Here's csved, an old script of mine that I use to quickly munge CSV files.
Essentially, it works similar to perl -a or awk. After reading each line, it is parsed into fields and put into @F. Then it executes your expression, and after that, unless -n is given, it prints @F again in CSV format. You can mangle @F any way you like, changing, reordering or deleting entries or completely reassigning it. You can also use next to skip printing a particular line.
I recently thought it would be useful to be able to address columns by their name if the CSV file has those in its first line, so I added the -h option which, via an egregious abuse of tying that I'm very proud of, allows you to also address the fields using the hash %F and field names taken from the first line of the CSV file. Any access through %F actually affects @F directly except via a fieldname-to-index mapping. You can delete entries by assigning undef to them, or delete $F{fieldname} also works: it doesn't delete anything from the tied hash, but instead immediately deletes it from the underlying array @F representing the current line/row.
Also I added -b and -e options which work similarly to BEGIN and END, but I haven't really used them yet. Options to read and write CSV with different separator characters are also available.
This also called for a (still incomplete) test suite; you can look at the results here: https://github.com/gpvos/csved
I hope you will find it useful.
#!/usr/bin/perl -w
#
# csved - apply a Perl expression to all records of a CSV file
#
# Copyright (c) 2003-2022 Gerben Vos. All rights reserved.
# This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl 5.8.0.
#
# Run without arguments for usage information.
#
# Requires the Text::CSV_XS module by Jochen Wiedmann and Alan Citterm
+an,
# available from http://www.cpan.org/ .
use v5.10;
use strict 'vars', 'refs';
# This is a weird usage of tying because the hash is tied only once,
# but the underlying data changes for every processed line.
package Tie::FieldNames;
sub TIEHASH
{
my $self = shift;
my $i = 0;
my $tied = {
HEADERS => { map { ($_, $i++) } @_ },
};
die "Duplicate field names: " .
join(", ", map { "'$_'" } _dups(@_))
unless scalar keys %{$tied->{HEADERS}} == scalar @_;
return bless $tied, $self;
}
sub _dups
{
my %count;
++$count{$_} for @_;
return grep { $count{$_} > 1 } keys %count;
}
sub setcurrent
{
$_[0]->{CURRENT} = $_[1];
}
sub FETCH
{
my ($self, $key) = @_;
die "$key: No such field name" unless exists $self->{HEADERS}->{$k
+ey};
return $self->{CURRENT}->[$self->{HEADERS}->{$key}];
}
sub STORE
{
my ($self, $key, $value) = @_;
die "$key: No such field name" unless exists $self->{HEADERS}->{$k
+ey};
$self->{CURRENT}->[$self->{HEADERS}->{$key}] = $value;
}
sub DELETE
{
my ($self, $key) = @_;
die "$key: No such field name" unless exists $self->{HEADERS}->{$k
+ey};
delete $self->{CURRENT}->[$self->{HEADERS}->{$key}];
}
sub EXISTS
{
my ($self, $key) = @_;
die "$key: No such field name" unless exists $self->{HEADERS}->{$k
+ey};
return exists $self->{CURRENT}->[$self->{HEADERS}->{$key}];
}
sub FIRSTKEY
{
my ($self) = @_;
my $dummy = keys %{$self->{HEADERS}};
each %{$self->{HEADERS}};
}
sub NEXTKEY
{
return each %{$_[0]->{HEADERS}};
}
sub SCALAR
{
return scalar grep { defined } @{$_[0]->{CURRENT}};
}
package main;
use Text::CSV_XS;
use IO::Handle;
my $_silent = 0;
my $_use_headers = 0;
my $_begin = "";
my $_end = "";
my $_sep = ',';
my $_quote = '"';
my $_escape = '"';
my $_progname;
($_progname = $0) =~ s#.*/##;
my $_usage = <<USAGE;
Usage: $_progname [option...] expr [file...]
Apply a Perl expression to all records of a CSV file.
-n suppress printing
-h treat first line as field names and allow access through \%F
-b expr execute expr before processing any input
-e expr execute expr after processing all input
-F char set the character that separates fields (default: comma)
-Q char set the character for quoting fields that contain a separat
+or character
or end-of-line (default: double quote)
-E char set the character for escaping a quote character (default:
+double quote)
The input is processed line by line.
expr should be a Perl expression modifying the \@F array; indexes are
+0-based.
With -h, you can also modify the \%F hash with the same effect. Deleti
+ng an
array or hash entry, or setting it to undef, will cause it not to be o
+utput.
The value of \@F after executing expr is sent to the standard output.
With -h, field names are read from the first line just before executin
+g expr
on it, so if e.g. field 8 is named "comment", you can use "delete \$F{
+comment}"
and that would be equivalent to "delete \$F[8]". Duplicate field names
+ in the
input or a field name in expr that does not exist in the input will ca
+use a
fatal error. A field name for \%F refers to a fixed index in \@F, so i
+f you
rearrange the data in \@F, accesses through \%F may refer to incorrect
+ fields.
Works with binary files (for example, with embedded newlines).
You can use next (to suppress printing), last and redo in expr.
\$. works in expr, so you can e.g. use "if (\$. > 1) { ... }" to not a
+pply some
code to the first line.
Variable names starting with _ are reserved, don't use them in expr.
Uses the Text::CSV_XS module by Jochen Wiedmann and Alan Citterman, wi
+th thanks.
USAGE
while (@ARGV > 0 and $ARGV[0] =~ m{^-}) {
my $opt = shift;
$_silent = 1, next if $opt eq '-n';
$_use_headers = 1, next if $opt eq '-h';
$_begin = shift, next if $opt eq '-b';
$_end = shift, next if $opt eq '-e';
$_sep = shift, next if $opt eq '-F';
$_quote = shift, next if $opt eq '-Q';
$_escape = shift, next if $opt eq '-E';
die $_usage;
}
die $_usage unless @ARGV > 0;
my $_expr = shift;
my $_csv = Text::CSV_XS->new({ sep_char => $_sep, quote_char => $_quot
+e,
escape_char => $_escape, binary => 1,
eol => "\n" });
unshift(@ARGV, '-') unless @ARGV;
{
no strict;
eval $_begin;
die $@ if $@;
}
my %F;
my $_tied;
while ($ARGV = shift) {
open(_IN, $ARGV);
my $_fields;
while ($_fields = $_csv->getline(\*main::_IN) and @$_fields) {
my(@F) = @$_fields;
if ($_use_headers) {
if ($. == 1) {
$_tied = tie %F, 'Tie::FieldNames', @F;
}
$_tied->setcurrent(\@F);
}
# This means you can use next, last, and redo in expr
# without excessive noise. Also turn off strictness.
no warnings "exiting";
no strict;
eval $_expr;
die $@ if $@;
$_csv->print(STDOUT, [ grep { defined } @F ]) unless $_silent;
}
}
if ($_use_headers) {
undef $_tied;
untie %F;
}
{
no strict;
eval $_end;
die $@ if $@;
}