http://www.perlmonks.org?node_id=718849


in reply to Re: Parsing MS SQL CSV export with Text::CSV_XS
in thread Parsing MS SQL CSV export with Text::CSV_XS

Quite so, there are other characters. I should have said, "the DATA only has ASCII in it." Nice, I get

255 ÿ 254 þ 68 D 0 65 A 0 82 R 0 75 K 0
Maybe I'll try just removing the first two characters.

Update: Added the second sentence for clarification.

Re^3: Parsing MS SQL CSV export with Text::CSV_XS
by Limbic~Region (Chancellor) on Oct 22, 2008 at 20:09 UTC
    andyford,
    If I remember correctly, there was a way to tell the database not to output those two bytes, but I can't remember how. I vaguely recall it had something to do with telling it you were exporting text rather than CSV, or perhaps it was just changing the extension from .csv to .txt. Unfortunately, the data came from a customer who could never be bothered to export it consistently, so I ended up writing something that tested the first two bytes and stripped them only if their ord() values were both > 127.

    Cheers - L~R
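
A rough sketch of that kind of test, for illustration only (this is not L~R's actual code). The helper name strip_high_lead_bytes is made up, and the sample bytes are the ones reported at the top of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first two bytes only when both are > 127,
# which is the case for the FF FE lead bytes seen in the export.
sub strip_high_lead_bytes {
    my ($bytes) = @_;
    return $bytes if length($bytes) < 2;
    my ($first, $second) = unpack 'C2', $bytes;
    return ($first > 127 && $second > 127) ? substr($bytes, 2) : $bytes;
}

# Sample data: "DARK" with and without the two lead bytes.
my $with_lead    = pack 'C*', 255, 254, 68, 0, 65, 0, 82, 0, 75, 0;
my $without_lead = pack 'C*', 68, 0, 65, 0, 82, 0, 75, 0;

# Both samples come out as the same byte sequence.
for my $sample ($with_lead, $without_lead) {
    my $clean = strip_high_lead_bytes($sample);
    print join(' ', unpack('C*', $clean)), "\n";
}
```

Because the check looks at the byte values rather than assuming the lead bytes are always there, files exported without them pass through unchanged.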

      Perfect, that's the answer. Well, part 1 anyway. I also needed to remove the NUL byte (^@) that sits between every character to get Text::CSV_XS to parse it.
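
A minimal sketch of that second step, assuming the two lead bytes are already gone and the data really is pure ASCII (as stated earlier in the thread). The ^@ bytes are NUL (0x00); deleting them wholesale would mangle any non-ASCII character, so treat this as a workaround rather than a real decode.

```perl
use strict;
use warnings;

# Sample bytes: "DARK" with a NUL after each letter, lead bytes already removed.
my $raw = pack 'C*', 68, 0, 65, 0, 82, 0, 75, 0;

# Delete every NUL byte. Only safe under the pure-ASCII assumption.
(my $ascii = $raw) =~ tr/\0//d;

print "$ascii\n";
```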

      I noticed a surprising thing: vim doesn't show the extra NUL bytes (^@) in the original file with the "funny" two lead bytes. Remove those two bytes, and vim shows the NULs like this:

      D^@A^@R^@K^@0^@1^@D^@G^@B^@B^@H^@1^@D^@,^@1^@5^@.^@5^@2^@.^@1^@3^@6^@.^@2^@3^@7^@,^@2^@0^@0^@8^@-^@1^@0^@-^@2^@0^@ ^@1^@9^@:^@0^@0^@:^@0^@8^@.^@0^@0^@0^@,^@1^@,^@1^@.^@6^@.^@6^@0^@0^@0^@,^@8^@1^@.^@2^@.^@0^@.^@2^@5^@,^@-^@W^@o^@r^@k^@s^@t^@a^@t^@i^@o^@n^@P^@a^@r^@e^@n^@t^@s^@^M^@
      I wonder if vim recognizes it as a special file format.

        Your file appears to be in Unicode (UTF-16) format. The leading FF FE bytes are the byte order mark, U+FEFF stored in little-endian order.

        You can probably save the file from SQL Server in a plain text format. If I remember correctly, output format ASCII txt will do this for some applications.

        Alternatively, you can have Perl read and translate the Unicode.
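
One way to do that translation, sketched with the core Encode module. The sample bytes are made up to match what the thread reports (FF FE followed by little-endian UTF-16); in real use they would come from the file, read in binary mode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample bytes: BOM FF FE, then "DARK\n" as little-endian UTF-16.
my $raw = pack 'C*',
    0xFF, 0xFE, 0x44, 0x00, 0x41, 0x00, 0x52, 0x00, 0x4B, 0x00, 0x0A, 0x00;

# Plain "UTF-16" (no LE/BE suffix) uses the BOM to pick the byte order
# and removes it from the decoded text.
my $text = decode('UTF-16', $raw);

print $text;
```

Unlike stripping bytes by hand, this also copes with non-ASCII characters, and the decoded text can then be handed to the CSV parser.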

Re^3: Parsing MS SQL CSV export with Text::CSV_XS
by procura (Beadle) on Oct 27, 2008 at 11:03 UTC

    You have an encoding problem. Tell Perl the file is UTF-16 when you open it:

    open my $fh, "<:encoding(utf16)", "file.csv" or die "file.csv: $!";

    Proof:

    $ od -t x1 xx.csv
    0000000 ff fe 44 00 41 00 52 00 4b 00 0a 00
    $ perl -we'open $a, "<:encoding(utf16)", "xx.csv" and print <$a>' | od -t x1
    0000000 44 41 52 4b 0a
    $