Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

Re^2: edit a CSV and "in-place" replacement

by bitingduck (Chaplain)
on Jun 22, 2012 at 07:03 UTC ( #977789=note: print w/replies, xml ) Need Help??

in reply to Re: edit a CSV and "in-place" replacement
in thread edit a CSV and "in-place" replacement

If you can be sure the regex won't match any other field, you can try:

perl -p -i -e 's/\b0(\d{10})\b/\+91$1/g;' filename

If there are other fields that it could match and munge, then it could all go horribly wrong and you're better off with something that accurately replaces only that field, like Tux already posted.

Update: I just took a look over at your thread at SO. You're trying a dangerous thing by counting commas in a CSV. Any element of a CSV can have commas inside of quotes to protect them, so you could have any number of commas before the 35th element in a row. CSV is a deceptive format-- it seems all simple and benign, but can be complicated in really annoying ways. If the quick and dirty regex won't work, use a module like Tux shows that is designed to deal with the unexpected but legal cases in the CSV.

Replies are listed 'Best First'.
Re^3: edit a CSV and "in-place" replacement
by slayedbylucifer (Scribe) on Jun 22, 2012 at 07:48 UTC

    Wonderful. This is what I wanted. Your one liner is working perfectly. Thank you very much.

    Yes, I understand that counting the commas is not the right way, but the file in question is my google contacts file exported to CSV and hence I was sure that there are not commas in between.

    Thanks again.

      Just when you start to think that, you discover one of your contacts is "Dinosaur, Jr." or you forget about all those lawyers who like to put "I. M. Alawyer, Esq." in their contact info, and you get a bunch of munged up data. Excel will correctly quote cells full of commas when you export to CSV. I haven't tried it with Google.

        The original question was "in-place-editing" required. Using the CSV module(s) Text::CSV_XS (used in DBD::CSV) and/or Text::CSV (which uses Text::CSV_XS if available for speed) you cannot in-place edit, but you can set up two instances: one for input, and one for output. This has to be used instead of DBD::CSV if there are no headers or no unique keys, somewhat along these lines:

        my $ci = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 }); my $co = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" + }); open my $fi, "<:encoding(utf-8)", "google.csv"; open my $fo, ">:encoding(utf-8)", "contacts.csv"; while (my $row = $ci->getline ($ci)) { $row->[34] =~ s/^0/+91/; $co->print ($fo, $row); } close $fi; close $fo;

        Which is safe for embedded comma's as you describe (when correctly quoted). On huge files, you can speed this even using bind_columns ().

        Another (huge) advantage is that this is MUCH faster than using DBD::CSV, has no size limits (other than disk storage) and can be used in streams, both in and out. DBD::CSV is likely to keep the complete file in memory, probably even twice.

        Enjoy, Have FUN! H.Merijn

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://977789]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (7)
As of 2018-02-20 15:43 GMT
Find Nodes?
    Voting Booth?
    When it is dark outside I am happiest to see ...

    Results (271 votes). Check out past polls.