blogical has asked for the wisdom of the Perl Monks concerning the following question:
I am attempting to reproduce output in a particular format that appears to be very similar to two-byte packed UTF16. It seems every even byte is null, and I haven't seen any characters exceed one byte in size. It also doesn't have the two-byte endian-ness header (like Encode will produce.) I've tested many different ways to get this output, and have succeeded in all but one aspect.
My problem is with newlines. A proper newline in this format (a Windows CR-LF) is \x{0D}\x{00}\x{0A}\x{00}. I have failed to get Perl to treat the \n as two distinct characters, and keep ending up with \x{0D}\x{0A}\x{00}. I have tried various things with split, s///, $/, Encode ("UTF-16"), binmode, pack and unpack, but have not hit upon the solution.
BTW, the file format is from a Samsung YH-820 mp3 player .plp file ( I'm autogenerating my playlists so that I don't need to install their "special" software.) If anyone knows of a filter already out there that I can use I'd be happy to check it out.
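On the remark about the byte-order header: a minimal sketch, assuming the core Encode module, that compares the plain "UTF-16" encoding (which writes a byte-order mark and defaults to big-endian) with "UTF-16LE" (no mark, low byte first). The helper name hex_dump is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', $_ } unpack 'C*', shift }

# "UTF-16" prepends a BOM; "UTF-16LE" emits only the encoded characters.
my $with_bom    = hex_dump( encode('UTF-16',   "A\r\n") );
my $without_bom = hex_dump( encode('UTF-16LE', "A\r\n") );

print "UTF-16:   $with_bom\n";      # FE FF 00 41 00 0D 00 0A
print "UTF-16LE: $without_bom\n";   # 41 00 0D 00 0A 00
```

The second line has the layout described above: every second byte is null, and CR and LF each get their own null.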
"One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?" - Henry David Thoreau, Walden
Re: CR-LF Newlines as 2 distinct characters
by radiantmatrix (Parson) on May 18, 2006 at 19:33 UTC
|
You mentioned playing with $/, but what about $\ (output sep)?
If you're trying to write compatible files (did I understand that correctly?), then why use \n at all? Why not just explicitly write out the EOL string? Like this:
sub plp_writeln {
    # plp_writeln( $HANDLE, @elem );
    # writes @elem joined as one line to the file $HANDLE for .plp
    my $HANDLE = shift;
    my @elem = @_;
    # no comma after the handle, or print would treat it as data for STDOUT
    print {$HANDLE} @elem, qq[\x{0D}\x{00}\x{0A}\x{00}];
}
Or, similar idea, subbing all newlines for your custom newline:
sub plp_write {
    # plp_write( $HANDLE, @elem );
    # writes @elem to the file $HANDLE for .plp, interpolating newlines
    my $HANDLE = shift;
    my @elem = @_;
    ( my $str = join('',@elem) )=~s[\n][\x{0D}\x{00}\x{0A}\x{00}]gs;
    # no comma after the handle, or print would treat it as data for STDOUT
    print {$HANDLE} $str;
}
|
My problem has been integrating "my" newline sequence with the rest of the string, which also needs to get mangled. If I stringify it all as is, the CR-LF gets treated as a single character instead of being split with a \x00. If I pre-treat the newlines (s/\n/\x{0D}\x{00}\x{0A}/g), it gets an extra \x00 in between when I mangle it.
As you suggest, I'll try a more explicit approach.
"One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?" - Henry David Thoreau, Walden
|
Ah, I see. I don't know what you need to "mangle", exactly, but since you seem to have UTF16-like representations of purely 8-bit chars, you might take a risk and do s[(?<!\x00)\x00][]gs; on the way in [0] to make a "real ASCII" string. You can then mangle the 8-bit ASCII string comfortably.
Then do something like this on the way out:
$str = join( '', map{ "$_\x00" } split('',$str) );
That should pad you appropriately. It's cheating, but it might work.
[0]: The negative lookbehind is to make sure a "\x00\x00" doesn't get chopped away; it's untested, though.
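A less hacky route to the same round trip might be to let the core Encode module do the stripping and padding. This is only a sketch, assuming the data really is UTF-16LE without a byte-order mark; the sample bytes and the upper-casing step are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might appear in a .plp file: "Hi" followed by CR-LF.
my $raw = "H\x00i\x00\x0D\x00\x0A\x00";

# On the way in: UTF-16LE bytes become an ordinary Perl character string.
my $text = decode('UTF-16LE', $raw);

# Work on plain characters here (hypothetical edit: upper-case the text).
$text = uc $text;

# On the way out: back to two bytes per character, low byte first.
my $out = encode('UTF-16LE', $text);

printf "%d characters in, %d bytes out\n", length $text, length $out;
```

Because the decoded string holds CR and LF as two separate characters, each one gets its own null byte when it is encoded again.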
Re: CR-LF Newlines as 2 distinct characters
by blogical (Pilgrim) on May 18, 2006 at 21:15 UTC
|
#Add the header lines. Note extra empty string for extra newline
unshift @playlist, ("PLP PLAYLIST", "VERSION 1.20", "");
#Cheapo packing into two bytes
s/(.)/$1\x{00}/gs for @playlist;
#Explicit newlines added
$playlist = join "\x{0D}\x{00}\x{0A}\x{00}", @playlist;
open PLAYLIST, ">", 'playlist.plp' or die "Couldn't open playlist.plp: $!";
#Hey perl, no funny stuff
binmode PLAYLIST;
$\ = undef;
print PLAYLIST $playlist;
Bonus points if you can golf this down- I'm just glad it finally works :) Thanks again to those who pointed me in the right direction.
"One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?" - Henry David Thoreau, Walden
|
my $playlist= pack "S*", unpack "C*", join "\r\n", "PLP PLAYLIST", "VERSION 1.20";
open PLAYLIST, ">", 'playlist.plp' or die "Couldn't open playlist.plp: $!\n";
binmode PLAYLIST;
print PLAYLIST $playlist;
?
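One portability note on the pack approach: the "S" template uses the byte order of the machine running the script, while "v" is always little-endian. A minimal sketch, assuming every character in the playlist fits in one byte; the variable names are illustrative.

```perl
use strict;
use warnings;

# Header lines joined with explicit CR-LF; the trailing '' adds a final CR-LF.
my $line = join "\r\n", 'PLP PLAYLIST', 'VERSION 1.20', '';

# unpack "C*" yields one number per byte; pack "v*" writes each number
# as a 16-bit little-endian value, so the low byte always comes first.
my $packed = pack 'v*', unpack 'C*', $line;

printf "%d bytes in, %d bytes out\n", length $line, length $packed;
```

On a little-endian x86 machine "S*" and "v*" give the same bytes, so the difference only matters if the script is ever run on big-endian hardware.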
Re: CR-LF Newlines as 2 distinct characters
by Thelonius (Priest) on May 19, 2006 at 13:27 UTC
|
The encoding is "utf16le", that is "utf16 little endian", which is the standard representation on (little-endian) Windows x86 machines. Unicode includes a byte-order mark character precisely so that you can distinguish utf16be from utf16le.
I tried using binmode to see if I could get the :crlf layer to be applied before the encoding layer, but it doesn't work. This seems like something that should be fixed in perl, since Windows is the most commonly used OS in the world, after all.
Anyway, here's how I would do it:
open PLAYLIST, ">", "playlist.plp"
    or die "Couldn't open playlist.plp: $!\n";
binmode PLAYLIST, ":raw:encoding(utf16le)" or die "binmode: $!\n";
print PLAYLIST "abcd\r\n";
$\ = "\r\n";
print PLAYLIST "efgh";
You can actually combine the binmode options into the open statement like so:
open PLAYLIST, ">:raw:encoding(utf16le)", "playlist.plp"
    or die "Couldn't open playlist.plp: $!\n";
print PLAYLIST "abcd\r\n";
$\ = "\r\n";
print PLAYLIST "efgh";
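The layer approach can be checked without a player or a real file. A rough sketch, assuming an in-memory file handle behaves like a disk file for these layers; the lexical handle, the $buffer variable, and the localized $\ are choices made for this illustration, not part of the code above.

```perl
use strict;
use warnings;

# Write into a scalar instead of a file so the raw bytes can be inspected.
my $buffer = '';
open my $fh, '>:raw:encoding(UTF-16LE)', \$buffer
    or die "Couldn't open in-memory file: $!\n";

{
    # $\ is appended to every print; localize it so it cannot leak
    # into unrelated output elsewhere in the program.
    local $\ = "\r\n";
    print {$fh} 'PLP PLAYLIST';
    print {$fh} 'VERSION 1.20';
}
close $fh or die "close: $!\n";

# Show the result as hex pairs for comparison with a known-good .plp file.
my $hex = join ' ', map { sprintf '%02X', $_ } unpack 'C*', $buffer;
print "$hex\n";
```

Since the "\r\n" in $\ passes through the encoding layer as two characters, CR and LF should each come out followed by a null byte.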
Re: CR-LF Newlines as 2 distinct characters
by samtregar (Abbot) on May 18, 2006 at 19:09 UTC
|
Can we see the code? Are you setting binmode on your filehandle? Are you running this on Windows? Have you tried playing with $\?
-sam
|
I confused $/ with $\. Thank you for bringing that up.
I am currently running under Windows. I have used binmode, but it seems to add a record separator after every print statement.
Close approximation:
# @playlist has lines with no newlines
my $playlist = undef;
$playlist .= "$_\n" for ( "PLP PLAYLIST\nVERSION 1.20\n\n", @playlist );
open PLAYLIST, ">", 'playlist.plp' or die "Couldn't open playlist.plp: $!";
print PLAYLIST "$_\x{00}" for (split //, $playlist);
This produces what I want EXCEPT for the newline CR-LF being lumped together.
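The lumping is consistent with the handle being in Windows text mode, where every LF byte is rewritten as CR LF after the nulls have already been interleaved. A small sketch that mimics that translation with a substitution, so it runs the same on any platform; the variable names are illustrative.

```perl
use strict;
use warnings;

# Render a byte string as space-separated hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', $_ } unpack 'C*', shift }

# Interleave nulls the way the loop above does, one character at a time.
my $interleaved = join '', map { "$_\x00" } split //, "A\n";

# Mimic text-mode output: each LF byte becomes CR LF, with no null between.
( my $written = $interleaved ) =~ s/\x0A/\x0D\x0A/g;

# Putting CR into the string before interleaving (and writing through a
# binmode'd handle) gives CR and LF one null each.
my $fixed = join '', map { "$_\x00" } split //, "A\r\n";

print 'text mode: ', hex_dump($written), "\n";   # 41 00 0D 0A 00
print 'explicit:  ', hex_dump($fixed),   "\n";   # 41 00 0D 00 0A 00
```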
"One is enough. If you are acquainted with the principle, what do you care for the myriad instances and applications?" - Henry David Thoreau, Walden