PerlMonks  

New line in Unicode again

by donno20 (Sexton)
on Apr 21, 2003 at 11:12 UTC (id://251974)

donno20 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I am working on a program that generates a Unicode file, which I then view in Notepad. All the conversions work properly, but I cannot produce a newline.
I have already tried "\r", "\n", "\r\n", "\r\0", "\n\0", "\0\r" and "\0\n"; none of them works.
Thanks for any hint on how to solve the problem. ^_^

Re: New line in Unicode again
by halley (Prior) on Apr 21, 2003 at 12:15 UTC

    Did you remember to use binmode() on your output file stream?

    If you need to specify specific bytes for specific characters on specific platforms, don't rely on "\r" and "\n", which are semantic concepts. Try the octal route to call a spade a spade: "\015\012".

    I thought Notepad relied on "\015\012" sequences for hard newlines, and used "\015\015\012" internally for word-wraps. This may have changed. Open up an existing file in a hex editor to glean the right sequence.

    --
    [ e d @ h a l l e y . c c ]
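A minimal Perl sketch of halley's point that "\n" is a logical newline whose bytes depend on the I/O layers, assuming Perl 5.8 or later (PerlIO and in-memory filehandles). It pushes `:crlf` explicitly to imitate a Win32 text-mode handle and `:raw` to imitate `binmode()`, writing into scalars so the bytes can be compared on any OS. The helper `bytes_written` is a made-up name for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through the given PerlIO layers into an
# in-memory buffer and return the bytes that were produced.
sub bytes_written {
    my ($layers, $text) = @_;
    my $buf = '';
    open my $fh, ">$layers", \$buf or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    return $buf;
}

my $text_mode = bytes_written(':crlf', "hello\n");         # like Win32 text mode
my $raw_mode  = bytes_written(':raw',  "hello\n");         # like binmode()
my $explicit  = bytes_written(':raw',  "hello\015\012");   # bytes named directly

printf "%-9s %s\n", $_->[0], unpack('H*', $_->[1])
    for ['crlf', $text_mode], ['raw', $raw_mode], ['explicit', $explicit];
```

The `crlf` and `explicit` rows should both end in `0d0a`, while `raw` ends in just `0a`. For a UTF-16 file these single-byte sequences are still not enough, which is what the rest of the thread works out.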

Re: New line in Unicode again
by aquarium (Curate) on Apr 21, 2003 at 13:29 UTC
    Unicode does not assign control characters in its standard, so whatever you do will not be portable. On Win32, I think the following will do what you want:
    binmode OUTFILE;
    print OUTFILE "\x0\xD\x0\xA";
    Chris
      The default encoding for Unicode on Windows is UTF-16LE, i.e. little endian, so it would actually be "\xD\0\xA\0". Strictly speaking, Unicode does assign control characters, but that still doesn't guarantee that the logical end-of-line will be the same between systems.
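To see both byte orders side by side, this sketch (assuming the Encode module that ships with Perl 5.8) encodes CR LF as UTF-16BE and as UTF-16LE. The big-endian result is the "\x0\xD\x0\xA" from the previous reply; the little-endian one is the "\xD\0\xA\0" described here.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $crlf = "\r\n";                    # U+000D U+000A as characters
my $be = encode('UTF-16BE', $crlf);   # high byte of each unit first
my $le = encode('UTF-16LE', $crlf);   # low byte of each unit first

print 'UTF-16BE: ', unpack('H*', $be), "\n";   # 000d000a
print 'UTF-16LE: ', unpack('H*', $le), "\n";   # 0d000a00

# Compare with the hand-written byte strings from the thread.
print "BE matches aquarium's bytes\n"   if $be eq "\x0\xD\x0\xA";
print "LE matches the corrected bytes\n" if $le eq "\xD\0\xA\0";
```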

      You should start your file with a byte order mark (BOM), which is the same character as a zero-width no-break space. It is U+FEFF, which in UTF-16LE is "\xFF\xFE".
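A quick way to confirm the BOM bytes, assuming the Encode module is available: encode U+FEFF itself with each explicit byte order and look at the result in hex.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+FEFF serves as the byte order mark (it is also ZERO WIDTH NO-BREAK SPACE).
my $bom_char = "\x{FEFF}";

# With an explicit byte order, the encoded BOM is the two bytes that
# start a UTF-16 file of that order.
my $bom_le = encode('UTF-16LE', $bom_char);
my $bom_be = encode('UTF-16BE', $bom_char);

print 'UTF-16LE BOM: ', unpack('H*', $bom_le), "\n";   # fffe
print 'UTF-16BE BOM: ', unpack('H*', $bom_be), "\n";   # feff
```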

      By the way, what you could have done is create a Unicode file in Notepad, then use Perl to look at the file and see what it has in it. Also note that Perl 5.8 has support for Unicode. See perlunicode and Encode::Unicode.
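Along those lines, here is a minimal sketch of looking at a file's raw bytes from Perl. The helper `hex_head` is hypothetical, and the temporary file only stands in for a real file saved from Notepad; its contents (BOM, "A", CR LF) are an assumed sample.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first $count bytes of $path as
# space-separated hex pairs. binmode() stops any newline translation.
sub hex_head {
    my ($path, $count) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    defined read($fh, my $buf, $count) or die "Cannot read $path: $!";
    close $fh;
    return join ' ', unpack('(H2)*', $buf);
}

# Stand-in for a Notepad "Unicode" file (assumed contents: BOM, "A", CR LF).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xFF\xFE", "A\0", "\xD\0\xA\0";
close $tmp;

print hex_head($path, 16), "\n";   # ff fe 41 00 0d 00 0a 00
```

Pointing `hex_head` at a file that Notepad itself saved should show its real leading bytes, BOM included.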

        >it would actually be "\xD\0\xA\0".
        Correct! I was wondering why the pairs were "a\0b\0c\0" instead of "\0a\0b\0c". But I hadn't used "\x".

        >You should start your file with a byte order mark (BOM), which is the same character as a zero-width no-break space. It is U+FEFF, which in UTF-16LE is "\xFF\xFE".
        I didn't know what the actual code was, but I did a little trick: I read the first two bytes of an existing Unicode file into a buffer. It worked!

        >By the way, what you could have done is create a Unicode file in Notepad, then use Perl to look at the file and see what it has in it
        The first two bytes don't show up when printed with print(), and Notepad hides them. How do I know they are "\xFF\xFE"?

        Cheers, ^_^
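On the question of how to tell: printing the bytes as hex (for example with `unpack`) makes the invisible BOM visible. This sketch, with a hypothetical `bom_order` helper and an assumed sample byte string, checks the first two bytes and then lets Encode's plain `UTF-16` decoder read the BOM itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the byte order announced by a UTF-16 BOM.
sub bom_order {
    my ($bytes) = @_;
    my $head = substr($bytes, 0, 2);
    return 'LE' if $head eq "\xFF\xFE";
    return 'BE' if $head eq "\xFE\xFF";
    return;    # no UTF-16 BOM found
}

# Assumed sample: BOM, "hi", CR LF, all in UTF-16LE.
my $bytes = "\xFF\xFE" . "h\0i\0" . "\xD\0\xA\0";

printf "first two bytes: %s, byte order: %s\n",
    unpack('H4', $bytes), (bom_order($bytes) || 'none');

# Plain 'UTF-16' makes Encode read the BOM to pick the byte order (it dies
# if there is none); the BOM is consumed, so only the text remains.
my $text = decode('UTF-16', $bytes);
printf "decoded %d characters\n", length $text;   # 4: h, i, CR, LF
```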

Re: New line in Unicode again
by donno20 (Sexton) on Apr 22, 2003 at 05:12 UTC
    YES!! I made it. Thanks, monks. I have given ++ votes to all of you.
    Here is a summary of my code:
    binmode FILE;
    $line = $unicode . "\xD\0\xA\0";
    Cheers ^_^
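A fuller version of the summarized approach, as a sketch assuming Perl 5.8's Encode module: each line is converted to UTF-16LE bytes (the `$unicode` of the summary) and followed by CR LF written as "\xD\0\xA\0", with a "\xFF\xFE" BOM at the start and the handle in `binmode`. The sample lines and the temporary file are placeholders for your own data and path.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Sample lines (placeholders); "caf\x{E9}" checks that non-ASCII text survives.
my @lines = ('first line', "caf\x{E9}");

# A temporary file stands in for your real output path.
my ($out, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
binmode $out;    # no CRLF translation: every byte is written as given

print {$out} "\xFF\xFE";                        # UTF-16LE BOM (U+FEFF)
for my $line (@lines) {
    my $unicode = encode('UTF-16LE', $line);    # the line as UTF-16LE bytes
    print {$out} $unicode . "\xD\0\xA\0";       # CR LF, also in UTF-16LE
}
close $out or die "Cannot close $path: $!";

# Read the file back in binary mode to check what was written.
open my $in, '<', $path or die "Cannot open $path: $!";
binmode $in;
my $written = do { local $/; <$in> };
close $in;
print 'wrote ', length($written), " bytes to $path\n";   # 38 for these lines
```

With PerlIO layers, a commonly used alternative is `open my $out, '>:raw:encoding(UTF-16LE):crlf', $path`, printing `"\x{FEFF}"` once and then ordinary strings ending in `"\n"`; the layers should then handle both the encoding and the CR LF.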

Node Type: perlquestion [id://251974]
Approved by dorko
Front-paged by Aristotle