Re^4: edit a CSV and "in-place" replacement

in reply to Re^3: edit a CSV and "in-place" replacement
in thread edit a CSV and "in-place" replacement

Just when you start to think that, you discover one of your contacts is "Dinosaur, Jr." or you forget about all those lawyers who like to put "I. M. Alawyer, Esq." in their contact info, and you get a bunch of munged up data. Excel will correctly quote cells full of commas when you export to CSV. I haven't tried it with Google.
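To make the failure concrete, here is a minimal sketch comparing a naive split on commas with a real CSV parser. The sample line is invented for illustration; `parse` and `fields` are the regular Text::CSV_XS methods.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Invented sample: a correctly quoted name containing a comma, then a phone number
my $line = q{"Dinosaur, Jr.",0123456789};

# Naive approach: splits inside the quoted cell as well
my @naive = split /,/, $line;

# CSV-aware approach: honours the quotes
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
$csv->parse ($line) or die "Cannot parse line";
my @fields = $csv->fields;

printf "naive split: %d fields, parser: %d fields\n",
    scalar @naive, scalar @fields;
```

The naive split sees three pieces, so every column after the name shifts by one; the parser returns the two fields that were actually exported.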


Replies are listed 'Best First'.

Re^5: edit a CSV and "in-place" replacement
by Tux (Canon) on Jun 22, 2012 at 09:31 UTC

The original question required in-place editing. With the CSV modules Text::CSV_XS (used in DBD::CSV) and Text::CSV (which uses Text::CSV_XS if available, for speed) you cannot edit in place, but you can set up two instances: one for input and one for output. You have to use this approach instead of DBD::CSV if there are no headers or no unique keys, somewhat along these lines:

use Text::CSV_XS;

my $ci = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
my $co = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" });
open my $fi, "<:encoding(utf-8)", "google.csv"   or die "google.csv: $!";
open my $fo, ">:encoding(utf-8)", "contacts.csv" or die "contacts.csv: $!";
while (my $row = $ci->getline ($fi)) {    # read from the file handle, not the parser
    $row->[34] =~ s/^0/+91/;              # replace a leading 0 with +91
    $co->print ($fo, $row);
    }
close $fi;
close $fo or die "contacts.csv: $!";

This is safe for embedded commas as you describe (when correctly quoted). On huge files, you can speed this up even more by using bind_columns ().
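A rough sketch of the bind_columns () variant follows. The helper name `fix_column` and its arguments are made up for illustration, and it assumes every row has the same number of fields as the first one, since the bound variables have to be set up before the loop.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: copy CSV from $fi to $fo, replacing a leading 0
# in column $col with +91. Returns the number of rows written.
# Assumption: all rows have as many fields as the first row.
sub fix_column {
    my ($fi, $fo, $col) = @_;
    my $ci = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    my $co = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" });

    # Read the first row the ordinary way to learn the number of fields
    my $first = $ci->getline ($fi) or return 0;
    my @row = @$first;
    $row[$col] =~ s/^0/+91/;
    $co->print ($fo, \@row);
    my $n = 1;

    # From here on getline () stores each field directly in @row,
    # so no new array is allocated per record
    $ci->bind_columns (\(@row));
    while ($ci->getline ($fi)) {
        $row[$col] =~ s/^0/+91/;
        $co->print ($fo, \@row);
        $n++;
    }
    return $n;
}
```

Because the helper only takes file handles, the same code can be used as a stream filter, for example `fix_column (\*STDIN, \*STDOUT, 34);`.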

Another (huge) advantage is that this is MUCH faster than using DBD::CSV, has no size limits (other than disk storage) and can be used in streams, both in and out. DBD::CSV is likely to keep the complete file in memory, probably even twice.

Enjoy, Have FUN! H.Merijn


In Section Seekers of Perl Wisdom