Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re: DBD::CSV and embedded CR-LFs

by jZed (Prior)
on Jun 11, 2004 at 21:25 UTC ( [id://363527]=note: print w/replies, xml ) Need Help??


in reply to DBD::CSV and embedded CR-LFs
in thread Finding CR-LFs within quoted CSV fields

If DBD::CSV can already handle that on its own, then I'm mystified as to why our regex would solve the problem
Me too. Since diotalevi seems to have had the same problem, it's possible there's a bug somewhere. I'd really appreciate help in tracking it down. Could you and diotalevi both please let me know which versions of SQL::Statement and DBD::File you're using, it's more likely to be in them.

Replies are listed 'Best First'.
DBD::CSV handles embedded CR-LFs fine!
by Wally Hartshorn (Hermit) on Jun 14, 2004 at 22:54 UTC

    Ah HA! (*smack* -> head) I've figured out the problem, and it has nothing to do with DBD::CSV.

    The CSV file that we're processing is exported in an unusual manner. Embedded quotes aren't escaped in any way. We have a routine that pre-processes the CSV to find those embedded quotes and escapes them (such that a " becomes a "" pair). The processed CSV file is then handed off to DBD::CSV for normal processing.

    Recently, the user added a field to the CSV file, which promptly broke our program. In the process of trying to figure out the problem (because, of course, they didn't tell us they had added a field, only that the program had stopped working), we discovered that CR-LFs were embedded in some fields. We then leaped to the (incorrect) conclusion that this was a recent occurence and the cause of our problems.

    We later learned about the addition of the new field. However, it turned out that the new field also triggered a bug in our pre-processing code, but we didn't know this at the time.

    In trying to fix the supposed problem with the embedded URLs, we had replaced the buggy pre-processing code. So, when we ran the program with the (correct) CR-LF replacement code, it worked. When we reverted to the (buggy) old pre-processing code, it stopped working. That led us to the incorrect conclusion that the CR-LF replacement code was needed.

    So, in summary, DBD::CSV handles embedded CR-LFs fine. Mystery solved! (Of course, if there's a nifty setting to handle unescaped quotes, I'd be glad to learn of it!)

    Wally Hartshorn

      I've figured out the problem,
      I'm glad for you! :-)
      and it has nothing to do with DBD::CSV.
      I'm glad for me! :-)
      Of course, if there's a nifty setting to handle unescaped quotes, I'd be glad to learn of it!)
      Are the fields already quote delimited? In other words do they resemble number 1 or number 2?
         1. "foo","bad " bad","bar"
         2. foo,bad " bad,bar   
      
      If you have records of type #1, I don't have much to suggest beyond the pre-processing you are already doing. If your records are of type #2, however, you ought to be able to handle that all within DBD::CSV. Set csv_delim_char to undef, in which case the stray quote will not be an embedded delimiter, it will just be a quote because you will have set the delimiter character to be undef rather than double-quote. If that won't work, show us some data and maybe someone will have a suggestion.

      Thanks for following up to let us know how it turned out.

        The data is like type #1, unfortunately. It would be something like this:

        "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"

        I'm using a series regexes to change that to:

        "Smith","John",12/31/1962,"Author of ""How to Break Programs"" and other books","Bugger"

        It would be nice if the user just gave us a valid CSV file to begin with, but....

        Thanks, and sorry for the confusion!

        Wally Hartshorn

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://363527]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2024-04-18 05:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found