CR-LF Newlines as 2 distinct characters

by blogical (Pilgrim)
on May 18, 2006 at 19:06 UTC ( [id://550320]=perlquestion )

blogical has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to reproduce output in a particular format that appears to be very similar to two-byte packed UTF-16. It seems every even byte is null, and I haven't seen any character exceed one byte in size. It also lacks the two-byte byte-order header (like the one Encode will produce). I've tested many different ways to get this output, and have succeeded in all but one aspect.

My problem is with newlines. A proper newline (Windows CR-LF) in this format is \x{0D}\x{00}\x{0A}\x{00}. I have failed to get Perl to treat the \n as two distinct characters, and keep ending up with \x{0D}\x{0A}\x{00}. I have tried various things with split, s///, $/, Encode ("UTF-16"), binmode, pack and unpack, but have not hit upon the solution.

BTW, the file format is from a Samsung YH-820 mp3 player .plp file ( I'm autogenerating my playlists so that I don't need to install their "special" software.) If anyone knows of a filter already out there that I can use I'd be happy to check it out.

"One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?"
- Henry David Thoreau, Walden

Replies are listed 'Best First'.
Re: CR-LF Newlines as 2 distinct characters
by radiantmatrix (Parson) on May 18, 2006 at 19:33 UTC

    You mentioned playing with $/, but what about $\ (output sep)?

    If you're trying to write compatible files (did I understand that correctly?), then why use \n at all? Why not just explicitly write out the EOL string? Like this:

    sub plp_writeln {
        # plp_writeln( $HANDLE, @elem );
        # writes @elem joined as one line to the file $HANDLE for .plp
        my $HANDLE = shift;
        my @elem = @_;
        # no comma after the filehandle, or print treats it as data
        print {$HANDLE} @elem, qq[\x{0D}\x{00}\x{0A}\x{00}];
    }

    Or, similar idea, subbing all newlines for your custom newline:

    sub plp_write {
        # plp_write( $HANDLE, @elem );
        # writes @elem to the file $HANDLE for .plp, interpolating newlines
        my $HANDLE = shift;
        my @elem = @_;
        ( my $str = join('',@elem) ) =~ s[\n][\x{0D}\x{00}\x{0A}\x{00}]gs;
        print {$HANDLE} $str;
    }
    <radiant.matrix>
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
      My problem has been integrating "my" newline sequence with the rest of the string, which also needs to get mangled. If I stringify it all as is, the CR-LF gets treated as a single character instead of being split with a \x00. If I pre-treat the newlines (s/\n/\x{0D}\x{00}\x{0A}/g), it gets an extra \x00 in between when I mangle it.
      As you suggest, I'll try a more explicit approach.

      "One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?"
      - Henry David Thoreau, Walden

        Ah, I see. I don't know what you need to "mangle", exactly, but since you seem to have UTF16-like representations of purely 8-bit chars, you might take a risk and do s[(?<!\x00)\x00][]gs; on the way in [0] to make a "real ASCII" string. You can then mangle the 8-bit ASCII string comfortably.

        Then do something like this on the way out:

        # character byte first, then the null, to match \x{0D}\x{00}\x{0A}\x{00}
        $str = join( '', map{ "$_\x00" } split('',$str) );

        That should pad you appropriately. It's cheating, but it might work.

        [0]: The negative lookbehind is to make sure a "\x00\x00" doesn't get chopped away; it's untested, though.

        <radiant.matrix>
        A collection of thoughts and links from the minds of geeks
        The Code that can be seen is not the true Code
        I haven't found a problem yet that can't be solved by a well-placed trebuchet
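
A minimal sketch of the same strip-then-pad idea using the core Encode module instead of hand-written regexes, assuming the file really is plain UTF-16LE with no byte-order mark. The sample bytes in $raw and the s/Hi/Bye/ edit are made up for illustration and do not come from a real .plp file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: "Hi" followed by CR-LF, as two-byte little-endian units.
my $raw = "H\x00i\x00\x0D\x00\x0A\x00";

# On the way in: bytes become ordinary Perl characters.
my $text = decode('UTF-16LE', $raw);

# Mangle the text as a normal string.
$text =~ s/Hi/Bye/;

# On the way out: every character, CR and LF included, gets its own null byte.
my $out = encode('UTF-16LE', $text);

print unpack('H*', $out), "\n";   # 4200790065000d000a00
```

Because the conversion works on whole characters, the CR and the LF are padded separately, which is the behaviour the original question was after.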
Re: CR-LF Newlines as 2 distinct characters
by blogical (Pilgrim) on May 18, 2006 at 21:15 UTC
    Solution:
    # Add the header lines. Note the extra empty string for the extra newline
    unshift @playlist, ("PLP PLAYLIST", "VERSION 1.20", "");
    # Cheapo packing into two bytes
    s/(.)/$1\x{00}/g for @playlist;
    # Explicit newlines added
    $playlist = join "\x{0D}\x{00}\x{0A}\x{00}", @playlist;
    open PLAYLIST, ">", 'playlist.plp' or die "Couldn't open playlist.plp: $!";
    # Hey perl, no funny stuff
    binmode PLAYLIST;
    $\ = undef;
    print PLAYLIST $playlist;
    Bonus points if you can golf this down. I'm just glad it finally works :) Thanks again to those who pointed me in the right direction.

    "One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?"
    - Henry David Thoreau, Walden

      my $playlist = pack "S*", unpack "C*", join "\r\n", "PLP PLAYLIST", "VERSION 1.20";
      open PLAYLIST, ">", 'playlist.plp' or die "Couldn't open playlist.plp: $!\n";
      binmode PLAYLIST;
      print PLAYLIST $playlist;

      ?

      - tye        
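
A note on the pack templates, with a small sketch: "S" is a native-endian unsigned 16-bit value, while "v" is always little-endian, so "v*" should give the same bytes on any machine. The code below (variable names are illustrative) checks that, for pure ASCII text, this packing matches what Encode produces for UTF-16LE.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = join "\r\n", 'PLP PLAYLIST', 'VERSION 1.20';

# One byte per character in, one little-endian 16-bit value per character out.
my $packed  = pack 'v*', unpack 'C*', $text;
my $encoded = encode('UTF-16LE', $text);

print $packed eq $encoded ? "same bytes\n" : "different bytes\n";

# The CR-LF after "PLP PLAYLIST" (12 characters = 24 bytes) comes out as 0D 00 0A 00.
print unpack('H*', substr($packed, 24, 4)), "\n";   # 0d000a00
```

The equivalence only holds while every character fits in one byte; text outside that range would need the Encode route.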

Re: CR-LF Newlines as 2 distinct characters
by Thelonius (Priest) on May 19, 2006 at 13:27 UTC
    The encoding is "utf16le", that is, "UTF-16 little-endian", which is the standard representation on (little-endian) Windows x86 machines. Unicode includes a byte-order mark character just so you can distinguish utf16be from utf16le.

    I tried using binmode to see if I could get the :crlf layer to be applied before the encoding layer, but it doesn't work. This seems like something that should be fixed in perl, since Windows is the most commonly used OS in the world, after all.
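
A small sketch of the difference, using the core Encode module: the plain "UTF-16" encoding writes a byte-order mark followed by big-endian data, whereas "UTF-16LE" writes little-endian data and no mark. The sample strings are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $with_bom = encode('UTF-16',   "A");      # byte-order mark, then big-endian
my $le       = encode('UTF-16LE', "A\r\n");  # no mark, little-endian

print unpack('H*', $with_bom), "\n";   # feff0041
print unpack('H*', $le), "\n";         # 41000d000a00
```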

    Anyway, here's how I would do it:

    open PLAYLIST, ">", "playlist.plp" or die "Couldn't open playlist.plp: $!\n";
    binmode PLAYLIST, ":raw:encoding(utf16le)" or die "binmode: $!\n";
    print PLAYLIST "abcd\r\n";
    $\ = "\r\n";
    print PLAYLIST "efgh";

    You can actually combine the binmode options into the open statement like so:

    open PLAYLIST, ">:raw:encoding(utf16le)", "playlist.plp" or die "Couldn't open playlist.plp: $!\n";
    print PLAYLIST "abcd\r\n";
    $\ = "\r\n";
    print PLAYLIST "efgh";
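
One way to check the layered approach is to write the header lines and read the file back as raw bytes. This is only a sketch: the temporary file from File::Temp, the lexical filehandles, and the localized $\ are choices made here for illustration, not part of the reply above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file standing in for playlist.plp.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

open my $out, '>:raw:encoding(UTF-16LE)', $path
    or die "Couldn't open $path: $!\n";
{
    local $\ = "\r\n";    # appended to every print inside this block
    print {$out} $_ for 'PLP PLAYLIST', 'VERSION 1.20', '';
}
close $out or die "close: $!\n";

# Read the result back without any layers to inspect the actual bytes.
open my $in, '<:raw', $path or die "Couldn't reopen $path: $!\n";
my $bytes = do { local $/; <$in> };
close $in;

print length($bytes), " bytes\n";                    # 60 bytes
print unpack('H*', substr($bytes, 24, 4)), "\n";     # 0d000a00
```

The 30 characters written (two 12-character lines plus three CR-LF pairs) become 60 bytes, and the first line ending appears as 0D 00 0A 00.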
Re: CR-LF Newlines as 2 distinct characters
by samtregar (Abbot) on May 18, 2006 at 19:09 UTC
    Can we see the code? Are you setting binmode on your filehandle? Are you running this on Windows? Have you tried playing with $\?

    -sam

      I confused $/ with $\. Thank you for bringing that up. I am currently running under Windows. I have used binmode, but it seems to add a record separator after every print statement. Close approximation:

      # @playlist has lines with no newlines
      my $playlist = undef;
      $playlist .= "$_\n" for ( "PLP PLAYLIST\nVERSION 1.20\n\n", @playlist );
      open PLAYLIST, ">", 'playlist.plp' or die "Couldn't open playlist.plp: $!";
      print PLAYLIST "$_\x{00}" for (split //, $playlist);

      This produces what I want EXCEPT for the newline CR-LF being lumped together.

      "One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?"
      - Henry David Thoreau, Walden

        I'm pretty sure you need binmode() here, at the very least. If you don't, Windows is going to try to expand LF into CRLF on the way out to disk. That's going to play hell with any attempt on your part to write out your own line-separator.

        Another thing I'd try - stop using \n altogether. Instead, write out the characters you want explicitly: "\x0D\x0A" (or however you write that).

        -sam
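
To see the LF-to-CRLF expansion on any platform, the :crlf layer can be requested explicitly and compared with :raw. This is a sketch only; the helper name bytes_written and the use of a File::Temp scratch file are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write @chunks through the given layers,
# then return the resulting file contents as a hex string.
sub bytes_written {
    my ($layers, @chunks) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open my $out, ">$layers", $path or die "open: $!";
    print {$out} @chunks;
    close $out or die "close: $!";

    open my $in, '<:raw', $path or die "open: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    return unpack 'H*', $bytes;
}

# With :crlf the LF is expanded before the padding null, as in the question.
print bytes_written(':crlf', "\n", "\x00"), "\n";   # 0d0a00
# With :raw the bytes go out exactly as given.
print bytes_written(':raw',  "\n", "\x00"), "\n";   # 0a00
```

The first result is the \x{0D}\x{0A}\x{00} sequence from the original question, which is why the handle has to be in binary mode before custom line endings are written.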
