|
johnmck has asked for the
wisdom of the Perl Monks concerning the following question:
I would be greatly appreciative of any suggestions on how to search/replace within specific fields of a CSV. My dataset looks like this:
C,003187995,0H047013N,20141212,0H,2050,EX,,
C,013139784,0H0430000,20141110,0H,2014,EX,,
C,153188023,0H0020000,20150105,0H,2015,BA,,
C,173167740,MT007015G,20141202,0R,2015,BA,,
As an example, I'd like to replace the first field - where there is a 'C' replaced it with CONDENSED or an 'F' replace with FINAL. In the fifth field instead of 0H I'd like to replace that with OHIO, or 0R with OREGON.
Can anyone suggest an approach to start with? Thanks.
John
Re: Search/Replace within fields of CSV
by Tux (Abbot) on May 08, 2015 at 18:55 UTC
|
$ cat test.pl
#!/pro/bin/perl
use 5.20.0;
use warnings;
use Text::CSV_XS "csv";
my %state = qw(
0H Ohio
0R Oregon
C CONDENSED
F Final
);
csv (in => "test.csv", on_in => sub {
$_[1][0] = $state{$_[1][0]} || $_[1][0];
$_[1][4] = $state{$_[1][4]} || $_[1][4];
});
$ cat test.csv
C,003187995,0H047013N,20141212,0H,2050,EX,,
C,013139784,0H0430000,20141110,0H,2014,EX,,
C,153188023,0H0020000,20150105,0H,2015,BA,,
C,173167740,MT007015G,20141202,0R,2015,BA,,
$ perl test.pl
CONDENSED,003187995,0H047013N,20141212,Ohio,2050,EX,,
CONDENSED,013139784,0H0430000,20141110,Ohio,2014,EX,,
CONDENSED,153188023,0H0020000,20150105,Ohio,2015,BA,,
CONDENSED,173167740,MT007015G,20141202,Oregon,2015,BA,,
$
update: As of version 1.17, you can also use filter (if you prefer that):
csv (in => "test.csv", filter => {
1 => sub { $_ = $state{$_} || $_ },
5 => sub { $_ = $state{$_} || $_ },
});
Enjoy, Have FUN! H.Merijn
| [reply] [d/l] [select] |
Re: Search/Replace within fields of CSV
by aaron_baugher (Curate) on May 08, 2015 at 18:36 UTC
|
The safest way to deal with CSV is with a module, and they're easy to use, so there's no reason not to. Let the module cleanly divide up the fields, make the changes you need, and write the results back out to a new file.
#!/usr/bin/env perl
use 5.010; use strict; use warnings;
use Text::CSV;
my $ic = Text::CSV->new({sep_char => ','})
+
or die Text::CSV->error_diag();
my $oc = Text::CSV->new({sep_char => ',', eol => $/ })
or die Text::CSV->error_diag();
open my $if, '<', 'infile' or die $!;
open my $of, '>', 'outfile' or die $!;
my %states = ( '0H' => 'OHIO',
'0R' => 'OREGON',
);
while( my $r = $ic->getline($if)){
$r->[0] = 'CONDENSED' if $r->[0] eq 'C';
$r->[0] = 'FINAL' if $r->[0] eq 'F';
$r->[4] = $states{$r->[4]} if $states{$r->[4]};
$oc->print($of, $r);
}
close $if;
close $of;
Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.
| [reply] [d/l] |
|
|
Thanks very much, Aaron. I was thinking about Text::CSV but I wasn't sure if there was a more practical way to do it. Your code really helps. Thank you.
| [reply] |
Re: Search/Replace within fields of CSV
by CountZero (Bishop) on May 08, 2015 at 18:41 UTC
|
If you can edit the CSV file and add a first line with field names (I used "First,Second,Third,Fourth,Fifth,Sixth,Seventh,Eight"), then you can use DBI and SQL to do this job for you. use Modern::Perl;
use DBI;
my $dbh = DBI->connect(
"dbi:CSV:",
undef, undef,
{
f_ext => '.csv',
f_dir => 'D:/Perl/scripts',
RaiseError => 1,
}
) or die "Cannot connect: $DBI::errstr";
$dbh->do(q/UPDATE test SET First = 'CONDENSED' WHERE test.First = 'C'/
+);
$dbh->do(q/UPDATE test SET First = 'FINAL' WHERE test.First = 'F'/);
$dbh->do(q/UPDATE test SET Fifth = 'Ohio' WHERE test.Fifth = '0H'/);
$dbh->do(q/UPDATE test SET Fifth = 'Oregon' WHERE test.Fifth = '0R'/);
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
| [reply] [d/l] [select] |
|
|
Yes, although in my opinion that is “straining the metaphor.” Using a complicated software-stack to do a simple job. It is rather impressive that DBI can do this sort of thing at all, but it will never be my “tool of choice” for such an application. (And, I know that it may or may not be yours.)
If you really need to do things to your data, “such that SQL would be necessary,” then I would put the data into a real SQL file, and strive to keep it there. For instance, SQLite, which already knows how to import and export CSV files very efficiently, runs on everything, and supports full SQL.
Otherwise, “CSV is an encoding for a record-by-record sequential file.” A perfectly-satisfactory encoding for many purposes because it is widely – but, not universally – understood. In any case, and as with all such file-formats, a trusted module such as Text::CSV should be used to do the heavy-lifting.
| [reply] |
|
|
| [reply] |
Re: Search/Replace within fields of CSV
by vinoth.ree (Monsignor) on May 08, 2015 at 18:32 UTC
|
This is a CSV file. In each row there are fields separated with comma. So here is the idea.
The algorithm
The process should go like this:
1.Read in the file line by line.
2.For each line, extract the 1rd and 5th column and check whatever you want and print.
Update:
use strict;
use warnings;
my $file = 'csv_file.csv';
open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $line = <$fh>) {
chomp $line;
my @fields = split(/,/, $line);
if ($fields[0] eq 'C')
{
$fields[0] = 'CONDENSED';
}
elsif ($fields[0] eq 'F'){
$fields[0]='FINAL'
}
if ($fields[4] eq '0H')
{
$fields[4] = '0HIO'
}
elsif ($fields[4] eq '0R')
{
$fields[4] = 'OREGON'
}
print join(',',@fields);
print "\n"
}
Sorry delayed to update the code with module Text::CSV,
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ',' });
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n
+";
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
chomp $line;
if ($csv->parse($line)) {
my @fields = $csv->fields();
if ($fields[0] eq 'C')
{
$fields[0] = 'CONDENSED';
}
elsif ($fields[0] eq 'F'){
$fields[0]='FINAL'
}
if ($fields[4] eq '0H')
{
$fields[4] = '0HIO'
}
elsif ($fields[4] eq '0R')
{
$fields[4] = 'OREGON'
}
print join(',',@fields);
print "\n"
}
else {
warn "Line could not be parsed: $line\n";
}
}
All is well. I learn by answering your questions...
| [reply] [d/l] [select] |
Re: Search/Replace within fields of CSV
by edimusrex (Monk) on May 08, 2015 at 18:44 UTC
|
You could use a series of regex matches and use a new file to write the output. Something along the lines of
#!/usr/bin/perl
use warnings;
use strict;
my $csv = 'test.csv';
my $complete = 'output.csv';
open FILE, "<$csv";
chomp(my @file = <FILE>);
foreach(@file) {
my @cleaned;
my @array = split(",",$_);
foreach(@array) {
$_ =~ s/^C$/CONDENSED/g;
$_ =~ s/^0H$/OHIO/g;
#etc..for what you want to match
push @cleaned, $_;
}
open OUTPUT, ">>$complete";
print OUTPUT join(",",@cleaned)."\n";
close OUTPUT;
}
If you don't want to use any additional modules | [reply] [d/l] |
|
|