PerlMonks  

dump text file in ASCII and hex

by kevind0718 (Scribe)
on Jun 26, 2008 at 18:26 UTC ( #694239=perlquestion )
kevind0718 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks:

I come to you because I know you are kind and wise.
I have written a number of parsers for CSV files and they have all been straightforward: I just used Text::CSV_PP to parse the lines into an array and away you go.
However, the latest CSV file I have been tasked with parsing has caused me much grief. There is something different about it. If I attempt to parse the file as I received it, the following code returns undefined:
$status = $csv->parse($row);
$col = $csv->fields();

$col is undefined. CSV_PP seems to be getting confused. Which means this loop will also fail:
while (defined($col = $csv->getline($fh) )) {

If I open the same CSV file in MS-Excel and save it out under a different name, as a CSV file, the parsing works just fine.

To get a handle on what is going on here I wrote this bit of code.

while (defined($line = <> )) {
    print $line . "~~\n";
    for ( $i=0; $i <length($line); $i++) {
        $char = substr($line, $i,1);
        $hex = sprintf("%1x", $char);
        print $char . "\t". $hex . "\n";
    } #for
    print "--new line--\n";
} #while
What I am trying to do is read the text file in and then print out the line. Then print each character, every character including non-printables, and the corresponding hex value. This line is failing:
$hex = sprintf("%1x", $char);

I just want to get the hex value of the character, but my Perl is not strong enough.

Your kind assistance is requested.

kd

Re: dump text file in ASCII and hex
by zentara (Archbishop) on Jun 26, 2008 at 18:31 UTC
    This is what I use to dump ASCII and hex. It gives a vertical hex value under each ASCII letter, making it easy to read. Just feed it a file as an argument. I still get confused trying to figure out the extra Perl options...... but it works. :-)
    #!/usr/bin/perl -wnl012
    # Prints the contents of a file a line at a time
    # followed by the ASCII value of each character in vertical columns.
    # Useful for debugging.
    # If no filename is specified then input is read from the keyboard.
    # Version 1.00 Ian Howlett ian@ian-howlett.com 6 July 2001
    # Version 1.10 James Yolkowski ajy@sentex.net 8 July 2001

    print;  # Print the line we've just read
    @hexvals = map {sprintf "%02X", ord $_} split //;  # Get hex value of each char
    for $a (0, 1) {print map {substr $_, $a, 1} @hexvals}  # Print the hex values.
    print "\n";

    I'm not really a human, but I play one on earth CandyGram for Mongo
Re: dump text file in ASCII and hex
by psini (Deacon) on Jun 26, 2008 at 18:34 UTC

    Should it not be:

    $hex = sprintf("%2x", $char);

    with 2 instead of 1?

    Update: Yes, of course, %02x

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re: dump text file in ASCII and hex
by tachyon-II (Chaplain) on Jun 26, 2008 at 18:35 UTC

    You need 2 hex chars to encode 256 chars. You also want the char number given by ord so this should work:

    $hex = sprintf "%02x", ord($char);
    # but why not just do it all in one line
    printf "%s %02x\n", $char, ord($char);
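To put the `ord` fix into the shape of the original loop, here is a minimal sketch using only core Perl. The sub name `char_hex_rows` and the sample string are made up for illustration; non-printable characters are shown as `.` so that a stray space, CR or LF stands out in the hex column.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one "char<TAB>hex" row per character of $line.
# ord() turns the character into its numeric code, which is what %02x expects.
sub char_hex_rows {
    my ($line) = @_;
    my @rows;
    for my $char (split //, $line) {
        # Show non-printable characters as '.' so the columns stay readable
        my $shown = $char =~ /[\x20-\x7e]/ ? $char : '.';
        push @rows, sprintf("%s\t%02x", $shown, ord $char);
    }
    return @rows;
}

# Hypothetical sample: a quoted field followed by a stray space and a CRLF
print "$_\n" for char_hex_rows(qq{"R" \r\n});
```

Feeding each line of the real file through the same sub, instead of the sample string, gives the dump the original loop was aiming for.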
      Thank you monks, your suggestions helped me learn more about the details of the CSV file.

      I thought that there might be a strange end-of-line code or something like that, but that does not seem to be the case. The best I can determine is that Text::CSV_PP is having an issue with the double quotes ("). Please take a look at the following test data:
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N","",1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","G1150G111","","2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N","",1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","004930202","","2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ","",1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00817Y108","","2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N","",1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","00846U101","","2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N","",1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","008916108","","2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N","",1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00971T101","","2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ","",1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","01741R102","","2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N","",1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","018804104","","2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N","",1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","019589308","","2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N","",1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      test lines, without double double quotes
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N",,1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","G1150G111",,"2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N",,1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","004930202",,"2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ",,1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00817Y108",,"2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N",,1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","00846U101",,"2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N",,1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","008916108",,"2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N",,1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","00971T101",,"2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ",,1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      "fred1234","bedrock quary","L","t","01741R102",,"2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N",,1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","018804104",,"2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N",,1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      "fred1234","bedrock quary","S","t","019589308",,"2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N",,1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      test line, without any double quotes
      fred1234,bedrock quary,S,t,88579Y101,4851,2595708,US88579Y1010,MMM,3M CO,USD,USD,SB7,1,19610718,19610718,225212,MMM UN,MMM.N,,1,C,710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,G1150G111,,2763958,BMG1150G1116,ACN,ACCENTURE LTD-CL A,USD,USD,SB7,19610718,19610718,225212,ACN UN,ACN.N,,1,C,-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,004930202,,2575818,US0049302021,ATVI,ACTIVISION INC,USD,USD,SB7,19610718,19610718,225212,ATVI UW,ATVI.OQ,,1,C,819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,00817Y108,,2695921,US00817Y1082,AET,AETNA INC,USD,USD,SB7,19610718,19610718,225212,AET UN,AET.N,,1,C,2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,00846U101,,2520153,US00846U1016,A,AGILENT TECHNOLOGIES INC,USD,19610718,19610718,225212,A UN,A.N,,1,C,-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,008916108,,2015530,CA0089161081,AGU,AGRIUM INC,USD,USD,SB7,19610718,19610718,225212,AGU UN,AGU.N,,1,C,4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,00971T101,,2507457,US00971T1016,AKAM,AKAMAI TECHNOLOGIES,USD,USD,SB7,19610718,19610718,225212,AKAM UW,AKAM.OQ,,1,C,2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,R
      fred1234,bedrock quary,L,t,01741R102,,2526117,US01741R1023,ATI,ALLEGHENY TECHNOLOGIES INC,USD,USD,SB7,19610718,19610718,225212,ATI UN,ATI.N,,1,C,-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,018804104,,2017677,US0188041042,ATK,ALLIANT TECHSYSTEMS INC,USD,USD,SB7,19610718,19610718,225212,ATK UN,ATK.N,,1,C,314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,R
      fred1234,bedrock quary,S,t,019589308,,2039831,US0195893088,AW,ALLIED WASTE INDUSTRIES INC,USD,USD,SB7,19610718,19610718,225212,AW UN,AW.N,,1,C,538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,R
      If I run the above data through the following code, using the command: perl -w dumpascii2hex.pl testpos_b2.csv > testpos_b.2out.txt
      use Text::CSV_PP;
      use Data::Dumper;

      $csv = Text::CSV_PP->new();  # create a new CSV parser object

      while (defined($line = <> )) {
          print $line . "~~\n";
          #**for ( $i=0; $i <length($line); $i++) {
          #**    $char = substr($line, $i,1);
          #**    $hex = sprintf("%02x", ord($char));
          #**    print $char . "\t". $hex . "\n";
          #**} #for
          $status = $csv->parse($line);
          @col = $csv->fields();
          print Dumper @col;
          print "--new line--\n";
      } #while

      I get the following
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N","",1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","G1150G111","","2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N","",1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","004930202","","2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ","",1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00817Y108","","2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N","",1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","00846U101","","2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N","",1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","008916108","","2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N","",1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00971T101","","2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ","",1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","01741R102","","2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N","",1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","018804104","","2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N","",1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","019589308","","2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N","",1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      test lines, without double double quotes
      ~~
      $VAR1 = 'test lines';
      $VAR2 = ' without double double quotes';
      --new line--
      "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N",,1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","G1150G111",,"2763958","BMG1150G1116","ACN","ACCENTURE LTD-CL A","USD","USD","SB7",19610718,19610718,225212,"ACN UN","ACN.N",,1,"C",-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","004930202",,"2575818","US0049302021","ATVI","ACTIVISION INC","USD","USD","SB7",19610718,19610718,225212,"ATVI UW","ATVI.OQ",,1,"C",819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00817Y108",,"2695921","US00817Y1082","AET","AETNA INC","USD","USD","SB7",19610718,19610718,225212,"AET UN","AET.N",,1,"C",2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","00846U101",,"2520153","US00846U1016","A","AGILENT TECHNOLOGIES INC","USD",19610718,19610718,225212,"A UN","A.N",,1,"C",-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","008916108",,"2015530","CA0089161081","AGU","AGRIUM INC","USD","USD","SB7",19610718,19610718,225212,"AGU UN","AGU.N",,1,"C",4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","00971T101",,"2507457","US00971T1016","AKAM","AKAMAI TECHNOLOGIES","USD","USD","SB7",19610718,19610718,225212,"AKAM UW","AKAM.OQ",,1,"C",2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","L","t","01741R102",,"2526117","US01741R1023","ATI","ALLEGHENY TECHNOLOGIES INC","USD","USD","SB7",19610718,19610718,225212,"ATI UN","ATI.N",,1,"C",-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","018804104",,"2017677","US0188041042","ATK","ALLIANT TECHSYSTEMS INC","USD","USD","SB7",19610718,19610718,225212,"ATK UN","ATK.N",,1,"C",314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      "fred1234","bedrock quary","S","t","019589308",,"2039831","US0195893088","AW","ALLIED WASTE INDUSTRIES INC","USD","USD","SB7",19610718,19610718,225212,"AW UN","AW.N",,1,"C",538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,"R"
      ~~
      $VAR1 = undef;
      --new line--
      test line, without any double quotes
      ~~
      $VAR1 = 'test line';
      $VAR2 = ' without any double quotes';
      --new line--
      fred1234,bedrock quary,S,t,88579Y101,4851,2595708,US88579Y1010,MMM,3M CO,USD,USD,SB7,1,19610718,19610718,225212,MMM UN,MMM.N,,1,C,710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '88579Y101';
      $VAR6 = '4851';
      $VAR7 = '2595708';
      $VAR8 = 'US88579Y1010';
      $VAR9 = 'MMM';
      $VAR10 = '3M CO';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '1';
      $VAR15 = '19610718';
      $VAR16 = '19610718';
      $VAR17 = '225212';
      $VAR18 = 'MMM UN';
      $VAR19 = 'MMM.N';
      $VAR20 = '';
      $VAR21 = '1';
      $VAR22 = 'C';
      $VAR23 = '710.964086349999';
      $VAR24 = '710.964086349999';
      $VAR25 = '710.96408635';
      $VAR26 = '';
      $VAR27 = '';
      $VAR28 = '2.45938';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = '';
      $VAR32 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,G1150G111,,2763958,BMG1150G1116,ACN,ACCENTURE LTD-CL A,USD,USD,SB7,19610718,19610718,225212,ACN UN,ACN.N,,1,C,-699.4375011625,-699.4375011625,-699.4375011625,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = 'G1150G111';
      $VAR6 = '';
      $VAR7 = '2763958';
      $VAR8 = 'BMG1150G1116';
      $VAR9 = 'ACN';
      $VAR10 = 'ACCENTURE LTD-CL A';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ACN UN';
      $VAR18 = 'ACN.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '-699.4375011625';
      $VAR23 = '-699.4375011625';
      $VAR24 = '-699.4375011625';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,004930202,,2575818,US0049302021,ATVI,ACTIVISION INC,USD,USD,SB7,19610718,19610718,225212,ATVI UW,ATVI.OQ,,1,C,819.153462549999,819.153462549999,819.15346255,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '004930202';
      $VAR6 = '';
      $VAR7 = '2575818';
      $VAR8 = 'US0049302021';
      $VAR9 = 'ATVI';
      $VAR10 = 'ACTIVISION INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ATVI UW';
      $VAR18 = 'ATVI.OQ';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '819.153462549999';
      $VAR23 = '819.153462549999';
      $VAR24 = '819.15346255';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,00817Y108,,2695921,US00817Y1082,AET,AETNA INC,USD,USD,SB7,19610718,19610718,225212,AET UN,AET.N,,1,C,2831.9813292375,2831.9813292375,2831.9813292375,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '00817Y108';
      $VAR6 = '';
      $VAR7 = '2695921';
      $VAR8 = 'US00817Y1082';
      $VAR9 = 'AET';
      $VAR10 = 'AETNA INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AET UN';
      $VAR18 = 'AET.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '2831.9813292375';
      $VAR23 = '2831.9813292375';
      $VAR24 = '2831.9813292375';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,00846U101,,2520153,US00846U1016,A,AGILENT TECHNOLOGIES INC,USD,19610718,19610718,225212,A UN,A.N,,1,C,-45.9117876750024,-45.9117876750024,-45.911787675,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '00846U101';
      $VAR6 = '';
      $VAR7 = '2520153';
      $VAR8 = 'US00846U1016';
      $VAR9 = 'A';
      $VAR10 = 'AGILENT TECHNOLOGIES INC';
      $VAR11 = 'USD';
      $VAR12 = '19610718';
      $VAR13 = '19610718';
      $VAR14 = '225212';
      $VAR15 = 'A UN';
      $VAR16 = 'A.N';
      $VAR17 = '';
      $VAR18 = '1';
      $VAR19 = 'C';
      $VAR20 = '-45.9117876750024';
      $VAR21 = '-45.9117876750024';
      $VAR22 = '-45.911787675';
      $VAR23 = '';
      $VAR24 = '';
      $VAR25 = '2.45938';
      $VAR26 = '';
      $VAR27 = '';
      $VAR28 = '';
      $VAR29 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,008916108,,2015530,CA0089161081,AGU,AGRIUM INC,USD,USD,SB7,19610718,19610718,225212,AGU UN,AGU.N,,1,C,4754.379720375,4754.379720375,4754.379720375,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '008916108';
      $VAR6 = '';
      $VAR7 = '2015530';
      $VAR8 = 'CA0089161081';
      $VAR9 = 'AGU';
      $VAR10 = 'AGRIUM INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AGU UN';
      $VAR18 = 'AGU.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '4754.379720375';
      $VAR23 = '4754.379720375';
      $VAR24 = '4754.379720375';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,00971T101,,2507457,US00971T1016,AKAM,AKAMAI TECHNOLOGIES,USD,USD,SB7,19610718,19610718,225212,AKAM UW,AKAM.OQ,,1,C,2580.1367137875,2580.1367137875,2580.1367137875,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '00971T101';
      $VAR6 = '';
      $VAR7 = '2507457';
      $VAR8 = 'US00971T1016';
      $VAR9 = 'AKAM';
      $VAR10 = 'AKAMAI TECHNOLOGIES';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AKAM UW';
      $VAR18 = 'AKAM.OQ';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '2580.1367137875';
      $VAR23 = '2580.1367137875';
      $VAR24 = '2580.1367137875';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,L,t,01741R102,,2526117,US01741R1023,ATI,ALLEGHENY TECHNOLOGIES INC,USD,USD,SB7,19610718,19610718,225212,ATI UN,ATI.N,,1,C,-655.71107175,-655.71107175,-655.71107175,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'L';
      $VAR4 = 't';
      $VAR5 = '01741R102';
      $VAR6 = '';
      $VAR7 = '2526117';
      $VAR8 = 'US01741R1023';
      $VAR9 = 'ATI';
      $VAR10 = 'ALLEGHENY TECHNOLOGIES INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ATI UN';
      $VAR18 = 'ATI.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '-655.71107175';
      $VAR23 = '-655.71107175';
      $VAR24 = '-655.71107175';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,018804104,,2017677,US0188041042,ATK,ALLIANT TECHSYSTEMS INC,USD,USD,SB7,19610718,19610718,225212,ATK UN,ATK.N,,1,C,314.388352562499,314.388352562499,314.3883525625,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '018804104';
      $VAR6 = '';
      $VAR7 = '2017677';
      $VAR8 = 'US0188041042';
      $VAR9 = 'ATK';
      $VAR10 = 'ALLIANT TECHSYSTEMS INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'ATK UN';
      $VAR18 = 'ATK.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '314.388352562499';
      $VAR23 = '314.388352562499';
      $VAR24 = '314.3883525625';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--
      fred1234,bedrock quary,S,t,019589308,,2039831,US0195893088,AW,ALLIED WASTE INDUSTRIES INC,USD,USD,SB7,19610718,19610718,225212,AW UN,AW.N,,1,C,538.672694612502,538.672694612502,538.6726946125,,,2.45938,,,,R
      ~~
      $VAR1 = 'fred1234';
      $VAR2 = 'bedrock quary';
      $VAR3 = 'S';
      $VAR4 = 't';
      $VAR5 = '019589308';
      $VAR6 = '';
      $VAR7 = '2039831';
      $VAR8 = 'US0195893088';
      $VAR9 = 'AW';
      $VAR10 = 'ALLIED WASTE INDUSTRIES INC';
      $VAR11 = 'USD';
      $VAR12 = 'USD';
      $VAR13 = 'SB7';
      $VAR14 = '19610718';
      $VAR15 = '19610718';
      $VAR16 = '225212';
      $VAR17 = 'AW UN';
      $VAR18 = 'AW.N';
      $VAR19 = '';
      $VAR20 = '1';
      $VAR21 = 'C';
      $VAR22 = '538.672694612502';
      $VAR23 = '538.672694612502';
      $VAR24 = '538.6726946125';
      $VAR25 = '';
      $VAR26 = '';
      $VAR27 = '2.45938';
      $VAR28 = '';
      $VAR29 = '';
      $VAR30 = '';
      $VAR31 = 'R ';
      --new line--


      The only time the parsing works is when there are no double quotes in the text. This is confusing me, because I parse six other files from this source with the same CSV encoding (i.e. strings are contained in double quotes) and there are no issues.

      I need the help of someone with more advanced Perl skills to tell me where my mistake is.

      Many thanks

      kd
Re: dump text file in ASCII and hex
by ganeshk (Monk) on Jun 26, 2008 at 21:08 UTC

    I just remembered the hexdump utility in Unix for doing this sort of thing. You can probably try Data::HexDump or Data::Hexdumper if you want to do it through Perl.


    Thanks,
    Ganesh
Re: dump text file in ASCII and hex
by oko1 (Deacon) on Jun 27, 2008 at 04:16 UTC

    As the editor at the Linux Gazette, I get a lot of articles, code, etc. that's been written in every possible variety of editor out there and sometimes contains weird and invisible characters. My solution was to code up a script that I called "weirdchar" that will display and highlight the characters and their ASCII values along with the line (and line number) where they occur. It's solved and prevented a huge variety of problems for me over the years.

    Note: this is *nix-specific (works in Linux and Solaris), since it uses an external prog.

    #!/usr/bin/perl -w
    # Created by Ben Okopnik on Tue Feb 15 18:48:24 EST 2005
    # Weird character highlighter

    my $a=`/usr/bin/tput -T $ENV{TERM} smso`;    # Start 'standout' mode
    my $b=`/usr/bin/tput -T $ENV{TERM} rmso`;    # End 'standout' mode
    my $re = qr/([^\011\012\015\040-\176])/;     # "Inverted" list of valid chars

    while (<>){
        print "Line $.: $_" if s/$re/"$a\\" . sprintf( "%03o", ord $1 ) . $b/eg;
    }
    
    -- 
    Human history becomes more and more a race between education and catastrophe. -- HG Wells
    
      To look for nonprinting chars you can also run the file through cat -vt and diff it with the original.
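Where neither `tput` nor `cat` is available, the same check can be sketched in core Perl alone. This is only an illustration: the sub name `mark_weird_chars` and the sample string are invented here, and the character class is the one from the script above, with hex escapes substituted for the terminal highlighting.

```perl
use strict;
use warnings;

# Replace every character outside TAB, LF, CR and printable ASCII
# with a visible \xNN escape; returns the new string and the hit count.
sub mark_weird_chars {
    my ($text) = @_;
    my $count = ($text =~ s/([^\011\012\015\040-\176])/sprintf("\\x%02X", ord $1)/eg);
    return ($text, $count || 0);
}

# Hypothetical sample: a non-breaking space (0xA0) hiding between two words
my ($marked, $hits) = mark_weird_chars("AKAMAI\x{A0}TECHNOLOGIES\n");
print "$hits suspicious character(s): $marked";
```

Because the escapes are plain text, the output can be redirected to a file and diffed against the original.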
Re: dump text file in ASCII and hex
by DrHyde (Prior) on Jun 27, 2008 at 10:38 UTC

    I'll wager a small amount that this will fix it for you:

    my $csv = Text::CSV_PP->new({binary => 1});

    Also it doesn't look like you're checking the $status returned from the parse() method. If that returns false, then the error_input() and error_diag() methods may be useful.
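A minimal sketch of that status check follows. The helper name `parse_or_report` is hypothetical; `parse()`, `fields()`, `error_diag()` and `error_input()` are the Text::CSV_PP methods mentioned in this thread. The demo at the bottom only runs if the module is installed, and its sample line is made up.

```perl
use strict;
use warnings;

# Parse one line; return the fields on success, or an empty list after
# printing the parser's diagnostics. $csv is any object offering
# parse(), fields(), error_diag() and error_input(), as Text::CSV_PP does.
sub parse_or_report {
    my ($csv, $line) = @_;
    if ($csv->parse($line)) {
        return $csv->fields();
    }
    my ($code, $message) = $csv->error_diag();
    my $input = $csv->error_input();
    $input = '' unless defined $input;
    warn "parse failed ($code: $message) on input: $input\n";
    return;
}

# Only run the demo when the module is installed
if (eval { require Text::CSV_PP; 1 }) {
    my $csv = Text::CSV_PP->new({ binary => 1 });
    my @fields = parse_or_report($csv, qq{"fred1234","bedrock quary",1});
    print scalar(@fields), " fields\n";
}
```

Reporting the diagnostic at the point of failure avoids the silent `undef` from `fields()` that made the original problem hard to see.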

Re: dump text file in ASCII and hex
by salva (Abbot) on Jun 27, 2008 at 12:59 UTC
    I use this:
    sub hexdump {
        no warnings qw(uninitialized);
        my $data = shift;
        while ($data =~ /(.{1,32})/smg) {
            my $line=$1;
            my @c= (( map { sprintf "%02x",$_ } unpack('C*', $line)),
                    ((" ") x 32))[0..31];
            $line=~s/(.)/ my $c=$1; unpack("c",$c)>=32 ? $c : '.' /egms;
            print join(" ", @c, '|', $line), "\n";
        }
    }
Re: dump text file in ASCII and hex
by Rudif (Hermit) on Jun 28, 2008 at 22:07 UTC
    kevind0718

    The Text::CSV_PP manual is your friend. It mentions a method that can be helpful:

    $csv->error_diag()
    When I added a call and a print like this
    $status = $csv->parse($line);
    printf "status=%d\n", $status;
    unless ($status) {
        printf "error=%d %s\n", ( $csv->error_diag());
    }
    @col = $csv->fields();
    print Dumper \@col;
    I obtained this printout for your first test line :
    "fred1234","bedrock quary","S","t","88579Y101","4851","2595708","US88579Y1010","MMM","3M CO","USD","USD","SB7",1,19610718,19610718,225212,"MMM UN","MMM.N","",1,"C",710.964086349999,710.964086349999,710.96408635,,,2.45938,,,,"R"
    ~~
    status=0
    error=2027 EIQ - Quoted field not terminated
    $VAR1 = [
              undef
            ];
    --new line--
    You will notice that there is a space after the last doublequote character, and this is what the parser does not like.

    Below, I added a kludge that makes the problem go away.

    #! perl -w
    use strict;
    use Text::CSV_PP;
    use Data::Dumper;

    my $csv = Text::CSV_PP->new();  # create a new CSV parser object

    while (defined(my $line = <> )) {
        chomp $line;
        $line =~ s/\" $/\"/;  ### kludge to remove space after the last doublequote, if any
        print $line . "~~\n";
        my $status = $csv->parse($line);
        printf "status=%d\n", $status;
        unless ($status) {
            printf "error=%d %s\n", ( $csv->error_diag ());
        }
        my @col = $csv->fields();
        print Dumper \@col;
        print "--new line--\n";
    } #while
    HTH

    Rudif
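The kludge above removes exactly one space after a closing quote. A slightly more general sketch, in core Perl only, strips the line ending together with any trailing spaces or tabs before the line goes to the parser. The sub name `clean_line` and the sample line are hypothetical. One caveat is that an unquoted last field that legitimately ends in whitespace would be altered too.

```perl
use strict;
use warnings;

# Remove the line ending and any spaces or tabs that follow the last field,
# so a line ending in `"R" ` becomes one ending in `"R"`.
sub clean_line {
    my ($line) = @_;
    $line =~ s/[ \t\r\n]+\z//;
    return $line;
}

# The trailing '|' makes it visible where the cleaned line ends
print clean_line(qq{"MMM.N","",1,"C","R" \r\n}), "|\n";
```

In the script above this would stand in for the `chomp` and the single-space substitution.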

Node Type: perlquestion [id://694239]
Approved by almut
Front-paged by DrHyde