http://www.perlmonks.org?node_id=375797


in reply to Strange character beginning text files

Well, if chomp is not stripping it then it's not a return character. You could try chop, if it's always at the end of the line.

Alternatively you could convert it to bits like this:

    map { print unpack "B*", chr } qw/0001/

Or to just characters like this:

    map { print chr } qw/0001/

... and then check the result against an ASCII table on the net and see what it converts to.

Interestingly, I ran this on the character and got a NUL, that is, 00000000, which may be why chomp isn't picking it up. What you are seeing could be how a NUL appears in your flat file, which would also explain why it's probably not showing in Notepad. Additionally, when I convert it to a character using chr, nothing appears on my console, which is also consistent with a NUL character.
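The byte-level check described above can be sketched as a small helper that prints each character's decimal, hex and binary value, so invisible characters show up. This is only a sketch: describe_chars is a made-up name and the sample string is invented data, not the original poster's file.

```perl
use strict;
use warnings;

# Hypothetical helper: describe every character of a string as
# "decimal hex binary" so that invisible characters become visible.
sub describe_chars {
    my ($text) = @_;
    return map {
        my $code = ord $_;
        sprintf "%3d 0x%02x %08b", $code, $code, $code;
    } split //, $text;
}

# Invented sample data: a NUL, an SOH, a letter, then CR LF.
print "$_\n" for describe_chars("\x00\x01A\r\n");
```

Reading the first line of the suspect file and passing it to the helper should show whether the odd character is 0, 1, or something else entirely.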

Furthermore, strings in memory are often terminated with a NUL character (as in C), which the program uses to mark the end of the string. If your flat file was written by another program, the characters could very well be NULs at the end of each string.

Again, all this is hypothesis. How to remove them depends on where they appear in the flat file. If you are the creator of the flat file, try amending the program that writes it to chop the last character from each string/line before writing it to the flat file. Or convert the whole file to bits, delete all NULs, and then convert back to characters (probably not required unless you're really desperate).
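If the stray characters do turn out to be NULs, a simpler route than going through bits might be to delete them directly with tr. A minimal sketch, assuming the lines are already in memory; strip_nuls is a hypothetical helper name and the sample lines are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string with every NUL (\0) removed.
sub strip_nuls {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\0//d;   # /d deletes the matched characters
    return $clean;
}

# Invented sample lines, each with a trailing NUL before the newline.
for my $line ("first\0\n", "second\0\n") {
    print strip_nuls($line);
}
```

On a Unix-style shell the same transliteration should also work as an in-place one-liner, e.g. perl -pi.bak -e 'tr/\0//d' data.txt, where data.txt is a placeholder filename and .bak keeps a backup copy.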

As a side note, the 1 on the end of this 0001 suggests to me it could also be the 00000001 character, which is the "Start of Heading" (SOH) character, unless your question is simply about the box (), which for me comes out as 00000000.


Dean
The Funkster of Mirth
Programming these days takes more than a lone avenger with a compiler. - sam
RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers

Re^2: Strange character beginning text files
by tachyon (Chancellor) on Jul 20, 2004 at 04:59 UTC

    Actually chomp typically eats \n only, which is the line feed char LF, not the carriage return char CR....

    printf "CR \\r \\%03o 0x%02x\n", ord("\r"), ord("\r");
    printf "LF \\n \\%03o 0x%02x\n\n", ord("\n"), ord("\n");
    my $str = "str\015\012";
    for( 1..2 ) {
        print "string '$str'\n";
        print "length ", length $str, "\n";
        chomp $str;
        print "string '$str'\n";
        print "length ", length $str, "\n\n";
    }

    Technically chomp removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module).

    cheers

    tachyon

      Well, to be exact, chomp removes whatever string happens to match the current value of "$/" (input record separator), which defaults to "\015\012" for windows text-mode, "\n" for unix. (update: see replies below for correct info)

      And it only does this when the string matching $/ happens to occur at the end of the scalar value being chomped.

      perl -e '$/ = "\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/'
      # prints "str15"
      perl -e '$/ = "\r\n"; $_ = "str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/'
      # prints "str"
      perl -e '$/="\r\n"; $_ = "foo\015\012str\015\012"; chomp; s/(\s)/sprintf("%o",ord($1))/eg; print $_,$/'
      # prints "foo1512str"
      Update: Honest, I really did (start to) post this before tachyon made it redundant. And I confess I was not speaking from personal experience (lucky me) about the default value of $/ on ms-win -- thanks to tachyon for the correction.

        Also, for your interest, your assertion that $/ is CRLF on Win32 is wrong, and chomp does not remove the \r either. As I understand it there is some internal magic that means that non-binmode file reads/writes get converted, but you can see that $/ is "\n", at least on my system. I have been bitten by \r not getting eaten by chomp on multiple occasions, usually related to Win32->*nix issues.

        C:\>type test.pl
        printf "CR \\r \\%03o 0x%02x\n", ord("\r"), ord("\r");
        printf "LF \\n \\%03o 0x%02x\n", ord("\n"), ord("\n");
        print $^O, $/;
        print "\$/ length ", length $/, " is ", (unpack "H*", $/), "\n\n";
        my $str = "str\015\012";
        for( 1..2 ) {
            print "string '$str'\n";
            print "length ", length $str, "\n";
            chomp $str;
            print "string '$str'\n";
            print "length ", length $str, "\n\n";
        }

        C:\>test.pl
        CR \r \015 0x0d
        LF \n \012 0x0a
        MSWin32
        $/ length 1 is 0a

        string 'str
        '
        length 5
        'tring 'str
        length 4

        'tring 'str
        length 4

        'tring 'str
        length 4

        C:\>
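Since chomp with the default $/ leaves the \r behind, as the run above shows, one common workaround is a substitution that removes either line ending. A minimal sketch; chomp_any is a hypothetical helper name, not something from the thread, and the sample strings are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing line ending, whether it is
# LF alone or CR LF. Text without a trailing newline is returned as is.
sub chomp_any {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# Invented sample data covering Unix, DOS and unterminated lines.
for my $raw ("unix line\n", "dos line\r\n", "no newline") {
    my $clean = chomp_any($raw);
    printf "%-14s length %d\n", "'$clean'", length $clean;
}
```

Unlike chomp, this does not depend on the current value of $/, which is the point when files move between Win32 and *nix.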

        cheers

        tachyon

        Wasn't that what I said at the end? (Or did you see that post in the 30 odd seconds or so before I posted that clarification? ;-)

        cheers

        tachyon