
Re^4: Modifying CSV File

by roho (Canon)
on Jun 18, 2015 at 09:55 UTC

in reply to Re^3: Modifying CSV File
in thread Modifying CSV File

Here is the code I am using (with thanks to AnomalousMonk for the regex):

    while (<$fh1>) {
        chomp;
        next if $_ eq '';
        s{ ("[^"]+") }{ (my $one = $1) =~ s{,}{-}xmsg; $one =~ s{"}{}g; $one; }xmsge;
        print $_, "\n";
    }

The test file you made should be sufficient because the only thing I am changing is the comma to a dash and removing the quotes from the one column in question.
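To make the effect of that substitution concrete, here is a minimal sketch that wraps it in a helper and applies it to one sample line. The helper name `dequote_field` and the sample record are made up for illustration; only the substitution itself comes from the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution from the loop to one line.
# Inside each quoted field, commas become dashes and the quotes are removed.
sub dequote_field {
    my ($line) = @_;
    $line =~ s{ ("[^"]+") }{ (my $one = $1) =~ s{,}{-}xmsg; $one =~ s{"}{}g; $one; }xmsge;
    return $line;
}

# Sample record (an assumption, not taken from the real data file).
print dequote_field('1001,"Smith, John",42'), "\n";   # prints: 1001,Smith- John,42
```

Unquoted fields pass through unchanged, so a line without any quoted field is printed as it was read.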

"Its not how hard you work, its how much you get done."

Re^5: Modifying CSV File
by Tux (Abbot) on Jun 18, 2015 at 12:41 UTC

    That ran in 4.194 seconds on my dataset; the time can be reduced by simplifying the regex even more.

        open my $io, "<", "test.csv";
        open my $oh, ">", "out.csv";
        while (<$io>) {
            s{ ("[^""]+") }{ (my $one = $1) =~ tr{,}{-}; $one =~ tr{""}{}d; $one; }xge;
            print $oh $_;
        }

    runs in 3.229 seconds. All regex-based scripts will fail if

    • the first field is quoted;
    • the second field has an embedded double-quote (or an escaped character with the default " as escape)
    • any record anywhere in the dataset has an embedded newline, and the data after the newline has a double-quote
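A small sketch of how the first two failure cases show up in practice. The helper name `munge` and all sample records are hypothetical; the substitution is the `tr`-based one from this post, written with single quotes in the character class. The embedded-newline case is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the tr-based substitution.
sub munge {
    my ($line) = @_;
    $line =~ s{ ("[^"]+") }{ (my $one = $1) =~ tr{,}{-}; $one =~ tr{"}{}d; $one; }xge;
    return $line;
}

# Well-formed record: only the second field is quoted.
print munge('1001,"Smith, John",42'), "\n";   # 1001,Smith- John,42

# First field quoted: it is rewritten too, although it was not the target.
print munge('"Acme, Inc",plain,42'), "\n";    # Acme- Inc,plain,42

# Doubled (escaped) quotes inside the field: stray quotes are left behind.
print munge('1001,"""x, y""",42'), "\n";      # 1001,""x- y"",42

# Backslash-escaped quote: the comma is kept and a quote is left dangling.
print munge('1001,"a \" b, c",42'), "\n";     # 1001,a \ b, c",42
```

The last two outputs are no longer valid CSV for the original values, which is the risk a real CSV parser avoids.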

    As long as you are absolutely certain that the CSV data is uniformly and consistently laid out as in these two lines, you are safe.

    I would personally never take that risk unless those two seconds are a problem. 5 seconds for 1.4 million records is pretty fast, knowing it is always safe.

    Enjoy, Have FUN! H.Merijn
      Thanks for the regex mods. I am certain the CSV file will always be in that format, because it comes from another part of the system, and if it were to change I would be the one asked to change it.

      "Its not how hard you work, its how much you get done."
