Re: Re: Re: Re: Need a better way to count input lines

Re: Re: Re: Re: Re: Need a better way to count input lines by BrowserUk (Patriarch) on May 07, 2004 at 22:22 UTC
Are you using Linux or Cygwin? As you were processing Word data, I (stupidly) assumed that you were using Win32 (native). Try `perl -nle' exit if $. == 15; print unpack "H*", $_' testdata.txt`

Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail
Re: Re: Re: Re: Re: Re: Need a better way to count input lines by Theo (Priest) on May 07, 2004 at 22:36 UTC
In the original post, I included the OS along with the perl version (sun4-solaris-64int-ld). Sorry you missed it. As I'm sure you suspected, each line ends with a CR (0d). Were you looking for anything else?

    %perl -nle' exit if $. == 15; print unpack "H*", $_' testdata.txt
    416c616e6f6e2c20426172740d
    353539302f454c0d
    2d2d2d2d0d
    4f274c657769732c204a6f686e2e0d
    2d2d2d2d2f2d2d2d202d2d0d
    6a6f686e0d
    4c65204d7563682c426f204a6f0d
    333430362f313635204e530d
    656440612e6e6c0d
    4162652d4a656e2c204d61722d4a6f0d
    333432312f31363444204e530d
    63626573740d

I was assuming that the "`== 15`" was a line count, but it only printed 12. That's all the data that's in the test file. Do you want more data?

-Theo- (so many nodes and so little time ... )
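To make the diagnosis concrete, here is a minimal sketch that classifies the ending of a raw line. The helper name `eol_kind` is hypothetical, and the sample assumes the file separated its lines with LF, so that the `-l` switch removed the LF and left the 0d seen in the dump above.

```perl
use strict;
use warnings;

# Hypothetical helper: report which line ending a raw line carries.
sub eol_kind {
    my ($raw) = @_;
    return 'CRLF' if $raw =~ /\r\n\z/;
    return 'LF'   if $raw =~ /\n\z/;
    return 'CR'   if $raw =~ /\r\z/;
    return 'none';
}

# First line of the dump above, with the "\n" that -l stripped put back
# (assumption: the file on disk used LF as its line separator).
my $raw = pack('H*', '416c616e6f6e2c20426172740d') . "\n";
print eol_kind($raw), "\n";    # prints "CRLF"

# chomp with the default $/ removes only the "\n"; the "\r" stays behind.
chomp(my $line = $raw);
print eol_kind($line), "\n";   # prints "CR"
```

A pattern anchored with `$` or a string comparison against such a line will then see the stray CR as part of the data.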
Re+: Need a better way to count input lines by BrowserUk (Patriarch) on May 08, 2004 at 00:30 UTC
I apologise for having missed the OS info in your original post.

By unpacking the output you posted into a file and reading from that, I can reproduce your results. However, if I open the resultant file and cut and paste the lines (which also fixes up the line endings) back into the __DATA__ section of my original program, it works as demonstrated.

My (tentative) conclusion is that the line endings in the original file are screwed up. My evidence is that the -l switch on the one-liner above should have stripped any native line endings, but as you point out, the 0d's are still there. That suggests to me that when the file was moved from Win32 to unix, either the line endings weren't converted, or they were converted incorrectly.

    P:\test>type test.pl
    #! perl -slw
    use strict;

    while( <DATA> ) {
        chomp;
        print pack 'H*', $_;
    }
    __DATA__
    416c616e6f6e2c20426172740d
    353539302f454c0d
    2d2d2d2d0d
    4f274c657769732c204a6f686e2e0d
    2d2d2d2d2f2d2d2d202d2d0d
    6a6f686e0d
    4c65204d7563682c426f204a6f0d
    333430362f313635204e530d
    656440612e6e6c0d
    4162652d4a656e2c204d61722d4a6f0d
    333432312f31363444204e530d
    63626573740d

    P:\test>test > test.txt

    P:\test>type test.txt
    Alanon, Bart
    5590/EL
    ----
    O'Lewis, John.
    ----/--- --
    john
    Le Much,Bo Jo
    3406/165 NS
    ed@a.nl
    Abe-Jen, Mar-Jo
    3421/164D NS
    cbest

    P:\test>u:od -t x1 test.dat
    0000000 41 6c 61 6e 6f 6e 2c 20 42 61 72 74 0d 0d 0a 35
    0000020 35 39 30 2f 45 4c 0d 0d 0a 2d 2d 2d 2d 0d 0d 0a
    0000040 4f 27 4c 65 77 69 73 2c 20 4a 6f 68 6e 2e 0d 0d
    0000060 0a 2d 2d 2d 2d 2f 2d 2d 2d 20 2d 2d 0d 0d 0a 6a
    0000100 6f 68 6e 0d 0d 0a 4c 65 20 4d 75 63 68 2c 42 6f
    0000120 20 4a 6f 0d 0d 0a 33 34 30 36 2f 31 36 35 20 4e
    0000140 53 0d 0d 0a 65 64 40 61 2e 6e 6c 0d 0d 0a 41 62
    0000160 65 2d 4a 65 6e 2c 20 4d 61 72 2d 4a 6f 0d 0d 0a
    0000200 33 34 32 31 2f 31 36 34 44 20 4e 53 0d 0d 0a 63
    0000220 62 65 73 74 0d 0d 0a
    0000227

As you can see from the `od` output, after reversing the packing process, the file ends up with line endings of x'0d0d0a' (the 0d carried in your data, followed by the 0d0a that perl on Win32 writes for each newline). I think this indicates that there are DOS-style line endings in the original file at your end, which perl-on-unix doesn't handle, and that this is what is causing the regexes to fail.

The answer (I think) to your problem would be to use the appropriate utility (or one of the perl one-liners kicking around this site) to convert the line endings on the file prior to running the script. You could do it integrally to the script, but that would kind of negate the benefit of what I was originally trying to demonstrate, i.e. that if you want to process 3 lines at a time, reading 3 lines on each iteration of the loop is much easier than having 3 cases within the loop.

Anyway, I hope this helps in some way, and hasn't completely wasted your time.
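For anyone who would rather handle the stray CRs inside the script, here is a minimal sketch of the two ideas above combined: strip any trailing CR/LF from each line, and read three lines per loop iteration. The helper name `read_record` and the in-memory sample are assumptions for illustration, not the code from the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $n lines from $fh, stripping any
# trailing CR and/or LF characters. Returns an empty list at EOF.
sub read_record {
    my ($fh, $n) = @_;
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;
        last unless defined $line;
        $line =~ s/[\r\n]+\z//;    # copes with LF, CRLF and a bare trailing CR
        push @lines, $line;
    }
    return @lines;
}

# Demo on an in-memory file handle holding DOS-style line endings.
my $data = "Alanon, Bart\r\n5590/EL\r\n----\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
while ( my @rec = read_record($fh, 3) ) {
    print join(' | ', @rec), "\n";
}
close $fh;
```

If a one-off conversion of the file is preferred, something along the lines of `perl -pi.bak -e 's/\r$//' testdata.txt` should remove a CR that sits directly before each newline, keeping a backup copy of the original.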

