Hope I can get some advice on solving this problem.
I have a file from a Windows machine. It is encoded as UTF-16LE with a BOM of <FFFE> followed by the text data. ALL data is in the format <4200> (two bytes per character, low byte first), and the ends of lines are <0D00> <0A00>. Each line is CSV. I need to read each line of the file, check some specific fields, and write some of the data, with modifications, to a new file.
My problem is that I cannot parse on the CR/LF. Below is a test script I have written (I am not an experienced Perl programmer) which shows the different approaches I have tried. All of them can read the file and all print the @array just fine, but none of them recognize the end of file. I have a small test file, but I am not sure how to post it.
#!/usr/local/bin/perl
#
#
use strict;
use warnings;
use charnames qw( :full );
my @segment_array;
#use File::BOM(); #this tells script to use the Byte Order Mark in reading the files, but it is not on the system I am using
my $file_segment_name = "TestFile1.svd";
# examining the file in hex, it is UTF-16LE encoded, with a Byte Order Mark of FFFE
#read the files
# open (FH_SEGMENT_FILE, "< $file_segment_name") || ERROR('open', 'segment file');
# open (FH_SEGMENT_FILE, '<:encoding(UTF16-LE)', $file_segment_name) || ERROR('open', 'segment file');
# open (FH_SEGMENT_FILE, '<:raw:perlio:encoding(UTF16-LE):crlf', $file_segment_name) || ERROR('open', 'segment file');
open (FH_SEGMENT_FILE, "< $file_segment_name") || ERROR('open', 'segment file');
# binmode (FH_SEGMENT_FILE, '<:crlf: encoding(UTF16-LE) ' );
# open (FH_SEGMENT_FILE, '<:raw:crlf: encoding(UTF16-LE) ', $file_segment_name );
# open (FH_SEGMENT_FILE, '< :crlf :encoding(UTF16)', $file_segment_name);
@segment_array=<FH_SEGMENT_FILE>;
close(FH_SEGMENT_FILE);
#print the file - it prints correctly
print "@segment_array";
print "\n\n"; #put some spaces in
for (my $i = 1; $i <=20 ; $i++){
my $segment_array= shift(@segment_array);;
print "$segment_array[$i]";
}
exit;
#subs below this point
#************************
#-------------------------
sub ERROR {
    # $_[0] is the action that failed, $_[1] is what it was attempted on
    print "Server can't $_[0] the $_[1] \n";
}
#----------------------------
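For comparison, a minimal sketch of one way a file like this might be read line by line. It assumes the file really does start with a BOM, so the layer is spelled `UTF-16` (no LE/BE suffix), which lets Encode take the byte order from the BOM and drop it. The CR/LF is stripped by hand after decoding rather than with a `:crlf` layer. The helper name `read_utf16_lines` and the stand-in temp file are hypothetical, used only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a BOM-prefixed UTF-16 file and return its
# lines with the trailing CR/LF (or bare LF) removed.
sub read_utf16_lines {
    my ($path) = @_;
    # :raw removes any platform default layers first; the encoding layer
    # then decodes the bytes, so readline splits on a decoded "\n".
    open(my $in, '<:raw:encoding(UTF-16)', $path)
        or die "Can't open $path: $!";
    my @lines;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;    # strip the line terminator after decoding
        push @lines, $line;
    }
    close($in);
    return @lines;
}

# Build a tiny stand-in file (assumed layout: BOM, UTF-16LE text, CR/LF).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
open(my $out, '>:raw:encoding(UTF-16LE)', $tmp_path)
    or die "Can't write $tmp_path: $!";
print {$out} "\x{FEFF}\@AirMagnet Survey Data\r\n#Type: passive\r\n";
close($out);

my @lines = read_utf16_lines($tmp_path);
print scalar(@lines), " lines read\n";
print "[$_]\n" for @lines;
```

Each element of `@lines` is then an ordinary Perl string that could be split on commas for the field checks.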
I don't know how to post the file and keep the encoding. So below is some of the file displayed using vi in the hex mode.
0000000: fffe 4000 4100 6900 7200 4d00 6100 6700 ..@.A.i.r.M.a.g.
0000010: 6e00 6500 7400 2000 5300 7500 7200 7600 n.e.t. .S.u.r.v.
0000020: 6500 7900 2000 4400 6100 7400 6100 0d00 e.y. .D.a.t.a...
0000030: 0a00 2300 5400 7900 7000 6500 3a00 2000 ..#.T.y.p.e.:. .
0000040: 7000 6100 7300 7300 6900 7600 6500 0d00 p.a.s.s.i.v.e...
0000050: 0a00 2300 4100 7000 7000 2000 5600 6500 ..#.A.p.p. .V.e.
0000060: 7200 7300 6900 6f00 6e00 3a00 2000 3800 r.s.i.o.n.:. .8.
0000070: 2e00 3200 2000 0900 2000 4200 7500 6900 ..2. ... .B.u.i.
0000080: 6c00 6400 3a00 2000 3200 3500 3400 3600 l.d.:. .2.5.4.6.
0000090: 3000 0d00 0a00 2300 4300 7200 6500 6100 0.....#.C.r.e.a.
00000a0: 7400 6500 6400 2000 6f00 6e00 3a00 2000 t.e.d. .o.n.:. .
00000b0: 3000 3900 3a00 3100 3300 3a00 3400 3700 0.9.:.1.3.:.4.7.
00000c0: 2000 3000 3400 2f00 3100 3000 2f00 3200 .0.4./.1.0./.2.
00000d0: 3000 3100 3200 0d00 0a00 2300 4300 6100 0.1.2.....#.C.a.
00000e0: 7200 6400 2000 4e00 6100 6d00 6500 2a00 r.d. .N.a.m.e.*.
00000f0: 3a00 2000 5500 6200 6900 7100 7500 6900 :. .U.b.i.q.u.i.
0000100: 7400 6900 2000 4e00 6500 7400 7700 6f00 t.i. .N.e.t.w.o.
0000110: 7200 6b00 7300 2000 5300 5200 2d00 3700 r.k.s. .S.R.-.7.
0000120: 3100 2d00 5500 5300 4200 2000 5700 6900 1.-.U.S.B. .W.i.
0000130: 7200 6500 6c00 6500 7300 7300 2000 4100 r.e.l.e.s.s. .A.
0000140: 6400 6100 7000 7400 6500 7200 2000 3000 d.a.p.t.e.r. .0.
0000150: 3000 3a00 3100 3500 3a00 3600 4400 3a00 0.:.1.5.:.6.D.:.
0000160: 3800 3400 3a00 4500 3100 3a00 4600 4100 8.4.:.E.1.:.F.A.
0000170: 0900 2000 4f00 5300 5600 6500 7200 7300 .. .O.S.V.e.r.s.
0000180: 6900 6f00 6e00 3a00 2000 3600 2e00 3100 i.o.n.:. .6...1.
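As a check on how the dump above should be read, here is a small sketch that decodes a few of its bytes with the core Encode module. The hex string is shortened by hand for illustration: the BOM, the first four characters, then the first CR/LF pair from the dump. It is not the full first line.

```perl
use strict;
use warnings;
use Encode qw(decode);

# BOM + "@Air" + CR LF, copied from the hex dump (shortened by hand).
my $hex   = 'fffe' . '4000410069007200' . '0d000a00';
my $bytes = pack('H*', $hex);

# 'UTF-16' (no LE/BE suffix) uses the BOM to pick the byte order and drops it.
my $text = decode('UTF-16', $bytes);

# Show every decoded character as a code point.
my @codes = map { sprintf('U+%04X', ord($_)) } split //, $text;
print "@codes\n";
```

After decoding, the line terminator is the ordinary two-character sequence CR (U+000D) LF (U+000A), so any line splitting has to happen on the decoded text rather than on the raw bytes.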
When I run the program, the print of the @array looks like this:
@AirMagnet Survey Data
#Type: passive
#App Version: 8.2 Build: 25460
#Created on: 09:13:47 04/10/2012
#Card Name*: Ubiquiti Networks SR-71-USB Wireless Adapter 00:15:6D:84:E1:FA OSVersion: 6.100002 1
#Antenna Angle: 0.000000, Antenna Type:
#dim_X, dim_Y, GPS Map
&,6351.008789,3142.447021, 1
#Time,Xpos,Ypos,Channel,SSID,AP,SignalDBM,Signal,NoiseDBM,Noise,MediaType,NodeName,Speed,ByteCount(throughput),PacketCount,PacketLost,LostRate,RetryCount,RetryRate,Longitude,Latitude,Click,APFlags,MCSRx-Tx,IPerfSpeed,Heading, AntennaDirection, iPerf_Throughput_Up, iPerf_Throughput_Down
1334063627,4144.148438,1767.801514,11,'xfinitywifi','C4:0A:CB:68:B9:81',-80,20,-94,1,'802.11gn','X1G025_W004','0','-1','-1','-1','-1','-1','-1',-7311.503300, 4051.325100,*,131,3855,0,0.000000, 0.000000
But the second section (the output of the for loop) ALWAYS looks like this. Alternate lines are missing:
#App Version: 8.2 Build: 25460
#Card Name*: Ubiquiti Networks SR-71-USB Wireless Adapter 00:15:6D:84:E1:FA OSVersion: 6.100002 1
#dim_X, dim_Y, GPS Map
#Time,Xpos,Ypos,Channel,SSID,AP,SignalDBM,Signal,NoiseDBM,Noise,MediaType,NodeName,Speed,ByteCount(throughput),PacketCount,PacketLost,LostRate,RetryCount,RetryRate,Longitude,Latitude,Click,APFlags,MCSRx-Tx,IPerfSpeed,Heading, AntennaDirection, iPerf_Throughput_Up, iPerf_Throughput_Down
1334063627,4144.148438,1767.801514,11,'optimumwifi','C4:0A:CB:68:B9:80',-80,20,-94,1,'802.11gn','X1G025_W004','0','-1','-1','-1','-1','-1','-1',-7311.503300, 4051.325100,*,131,3855,0,0.000000, 0.000000
1334063627,4144.148438,1767.801514,6,'Smithtown','0C:D5:02:68:50:3F',-87,12,-94,1,'802.11g','0C:D5:02:68:50:3F','0','-1','-1','-1','-1','-1','-1',-7311.503300, 4051.325100,*,1,0,0,0.000000, 0.000000
1334063627,4144.148438,1767.801514,11,'Unknown','98:FC:11:90:FA:D0',-89,9,-94,1,'802.11gn','98:FC:11:90:FA:D0','0','-1','-1','-1','-1','-1','-1',-7311.503300, 4051.325100,*,131,3855,0,0.000000, 0.000000
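The pattern of missing lines can be reproduced without any file at all, which suggests it may come from the loop rather than from the encoding. Below is a minimal sketch using a made-up array of ten labelled strings: it calls `shift` and then indexes with a counter that starts at 1, the same way the loop in the script does.

```perl
use strict;
use warnings;

# Made-up stand-in for the lines of the file.
my @lines = map { "line$_" } 0 .. 9;

my @seen;
for (my $i = 1; $i <= 4; $i++) {
    my $first = shift(@lines);   # removes element 0; the rest move down by one
    push @seen, $lines[$i];      # so index $i now lands two lines further on each pass
}
print "@seen\n";   # prints: line2 line4 line6 line8

# Visiting every element once needs neither shift nor an index:
my @all = map { "line$_" } 0 .. 9;
my $count = 0;
for my $line (@all) {
    $count++;
}
print "$count lines visited\n";   # prints: 10 lines visited
```

With the sample data, the first loop skips line0 and line1 and then returns every second line. That matches the listing above, which starts at the third line of the file (#App Version) and then alternates.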