PerlMonks  

Re: Parsing MS SQL CSV export with Text::CSV_XS

by Limbic~Region (Chancellor)
on Oct 22, 2008 at 19:19 UTC (#718842)


in reply to Parsing MS SQL CSV export with Text::CSV_XS

andyford,
I have seen this very problem and I am willing to bet money you are wrong: there are characters at the beginning of the file that aren't displayed by whatever you are using to look at it. Try this to see for sure:
#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', 'file.csv') or die $!;
{
    local $/ = \1;    # read one byte per <$fh>
    for (1 .. 10) {   # first 10 bytes of the file
        my $byte = <$fh>;
        print join("\t", ord($byte), $byte), "\n";
    }
}

Cheers - L~R

Re^2: Parsing MS SQL CSV export with Text::CSV_XS
by andyford (Curate) on Oct 22, 2008 at 19:33 UTC
      andyford,
      If I remember correctly, there was a way to tell the database not to output those two bytes, but I can't remember how. I vaguely recall it had something to do with not telling it you were doing CSV but rather text, or perhaps it was just changing the extension from .csv to .txt. Unfortunately, the data came from a customer who could never be bothered to do it consistently, so I ended up writing something that tested the first two bytes and stripped them only if both had ord() > 127.
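A minimal sketch of that two-byte test follows. It is not L~R's original code: the helper name `strip_high_lead_bytes` and the sample data are made up for illustration, and it assumes the file contents have already been read as raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop the first two bytes
# only when both are > 127, as is the case for the UTF-16 byte order
# marks FF FE and FE FF.
sub strip_high_lead_bytes {
    my ($data) = @_;
    return $data if length($data) < 2;

    # 'C2' unpacks the first two bytes as unsigned integers.
    my ($first, $second) = unpack 'C2', $data;
    return ($first > 127 && $second > 127) ? substr($data, 2) : $data;
}

# Made-up sample inputs: one with a lead FF FE, one plain ASCII.
my $with_bom    = "\xFF\xFE" . "D\0A\0";
my $without_bom = "DARK,1\n";

printf "%d -> %d bytes\n", length($with_bom),
    length strip_high_lead_bytes($with_bom);
printf "%d -> %d bytes\n", length($without_bom),
    length strip_high_lead_bytes($without_bom);
```

This only removes the marker. The remaining bytes are still in whatever encoding the export used.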

      Cheers - L~R

        Perfect, that's the answer. Well, part 1 anyway. I also needed to remove a NUL (^@) from in between every character to get Text::CSV_XS to parse it.

        I noticed a surprising thing: vim doesn't show the extra NULs in the original file with the "funny" lead two bytes. Remove them, and vim shows the NULs like this:

        D^@A^@R^@K^@0^@1^@D^@G^@B^@B^@H^@1^@D^@,^@1^@5^@.^@5^@2^@.^@1^@3^@6^@.^@2^@3^@7^@,^@2^@0^@0^@8^@-^@1^@0^@-^@2^@0^@ ^@1^@9^@:^@0^@0^@:^@0^@8^@.^@0^@0^@0^@,^@1^@,^@1^@.^@6^@.^@6^@0^@0^@0^@,^@8^@1^@.^@2^@.^@0^@.^@2^@5^@,^@-^@W^@o^@r^@k^@s^@t^@a^@t^@i^@o^@n^@P^@a^@r^@e^@n^@t^@s^@^M^@
        I wonder if vim recognizes it as a special file format.
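The `^@` characters vim displays are NUL bytes (0x00). Removing them as described above could look roughly like the sketch below. The sample string is made up, and the approach assumes the data is pure ASCII stored as two bytes per character; any non-ASCII character would be corrupted, which is why decoding the file as UTF-16 is the more general fix.

```perl
use strict;
use warnings;

# Made-up sample: "DARK\r\n" with a NUL after every character, as the
# data looks once the two lead bytes have been removed.
my $raw = "D\0A\0R\0K\0\r\0\n\0";

# Copy the string, then delete every NUL byte from the copy.
(my $clean = $raw) =~ tr/\0//d;

printf "%d bytes before, %d after\n", length($raw), length($clean);
```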

      You have an encoding problem

      open my $fh, "<:encoding(utf16)", "file.csv";

      Here is the proof:

      $ od -t x1 xx.csv
      0000000 ff fe 44 00 41 00 52 00 4b 00 0a 00
      $ perl -we'open $a, "<:encoding(utf16)", "xx.csv" and print <$a>' | od -t x1
      0000000 44 41 52 4b 0a
      $
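The same check can be reproduced without a file on disk. The sketch below rebuilds the twelve bytes from the od output above and decodes them with the core Encode module; it stands in for the `:encoding(utf16)` layer rather than using it, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The twelve bytes from the od output: the lead bytes ff fe, then
# "DARK\n" stored as two bytes per character, low byte first.
my $bytes = pack 'H*', 'fffe4400410052004b000a00';

# 'UTF-16' uses the byte order mark to pick the byte order and leaves
# it out of the decoded string.
my $text = decode('UTF-16', $bytes);

# Dump the decoded characters as hex, like od -t x1.
print join(' ', map { sprintf('%02x', ord($_)) } split(//, $text)), "\n";
# prints: 44 41 52 4b 0a
```

Both the two lead bytes and the interleaved NULs disappear in one step, so the decoded lines should be in a form Text::CSV_XS can parse.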
