If DBD::CSV can already handle that on its own, then I'm mystified as to why our regex would solve the problem
Me too. Since diotalevi seems to have had the same problem, it's possible there's a bug somewhere. I'd really appreciate help in tracking it down. Could you and diotalevi both please let me know which versions of SQL::Statement and DBD::File you're using, it's more likely to be in them.
| [reply] |
Ah HA! (*smack* -> head) I've figured out the problem, and it has nothing to do with DBD::CSV.
The CSV file that we're processing is exported in an unusual manner. Embedded quotes aren't escaped in any way. We have a routine that pre-processes the CSV to find those embedded quotes and escapes them (such that a " becomes a "" pair). The processed CSV file is then handed off to DBD::CSV for normal processing.
Recently, the user added a field to the CSV file, which promptly broke our program. In the process of trying to figure out the problem (because, of course, they didn't tell us they had added a field, only that the program had stopped working), we discovered that CR-LFs were embedded in some fields. We then leaped to the (incorrect) conclusion that this was a recent occurence and the cause of our problems.
We later learned about the addition of the new field. However, it turned out that the new field also triggered a bug in our pre-processing code, but we didn't know this at the time.
In trying to fix the supposed problem with the embedded URLs, we had replaced the buggy pre-processing code. So, when we ran the program with the (correct) CR-LF replacement code, it worked. When we reverted to the (buggy) old pre-processing code, it stopped working. That led us to the incorrect conclusion that the CR-LF replacement code was needed.
So, in summary, DBD::CSV handles embedded CR-LFs fine. Mystery solved! (Of course, if there's a nifty setting to handle unescaped quotes, I'd be glad to learn of it!)
| [reply] |
I've figured out the problem,
I'm glad for you! :-)
and it has nothing to do with DBD::CSV.
I'm glad for me! :-)
Of course, if there's a nifty setting to handle unescaped quotes, I'd be glad to learn of it!)
Are the fields already quote delimited? In other words do they resemble number 1 or number 2?
1. "foo","bad " bad","bar"
2. foo,bad " bad,bar
If you have records of type #1, I don't have much to suggest beyond the pre-processing you are already doing. If your records are of type #2, however, you ought to be able to handle that all within DBD::CSV. Set csv_delim_char to undef, in which case the stray quote will not be an embedded delimiter, it will just be a quote because you will have set the delimiter character to be undef rather than double-quote. If that won't work, show us some data and maybe someone will have a suggestion.
Thanks for following up to let us know how it turned out.
| [reply] |