Re: Macintosh PDF's on Windows by mr_mischief (Monsignor)
on Mar 01, 2007 at 21:13 UTC
For one, that's not a Mac to DOS line ending conversion. AFAICT, that's a DOS/Unix/Mac to Mac line ending conversion. (You're also doing a capture that you're not using.) Mac to DOS would be s/\r/\r\n/g, IIRC; without the /g only the first CR gets converted. However, most PDF files contain binary data, so that's probably not the best route.
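A minimal sketch of both directions in Perl, assuming the input is plain text already sitting in a scalar. The sub names are just for illustration. Don't point either one at a whole PDF: any 0x0D byte inside a binary stream would get rewritten too.

```perl
use strict;
use warnings;

# Classic Mac OS text ends lines with a bare CR ("\r");
# DOS/Windows text ends them with CR LF ("\r\n").
sub mac_to_dos {
    my ($text) = @_;
    # /g converts every CR, not just the first one.
    # Only safe on text known to be Mac-style: CR LF input would become CR LF LF.
    (my $out = $text) =~ s/\r/\r\n/g;
    return $out;
}

# Normalize any mix of CR LF, LF, or CR to a single CR (Mac) ending.
# \r\n is tried first so a CR LF pair becomes one CR, not two.
# No capture group, since the matched text is never used.
sub any_to_mac {
    my ($text) = @_;
    (my $out = $text) =~ s/\r\n|\n|\r/\r/g;
    return $out;
}

print length(mac_to_dos("one\rtwo\r")), "\n";    # prints 10
```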
A PDF file containing binary data must be transported and stored by means that preserve all bytes of the file faithfully; that is, as a binary file rather than a text file. Such a file is not portable to environments that impose reserved character codes, maximum line lengths, end-of-line conventions, or other restrictions.
A carriage return or a line feed, either alone or both together as CR LF, is an acceptable line ending according to the spec. Your software, or the libraries you use, would be wise to stick to the spec. From the 1.6 spec, page 26:
The carriage return (CR) and line feed (LF) characters, also called newline characters, are treated as end-of-line (EOL) markers. The combination of a carriage return followed immediately by a line feed is treated as one EOL marker. For the most part, EOL markers are treated the same as any other white-space characters. However, sometimes an EOL marker is required or recommended—that is, the following token must appear at the beginning of a line.
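Here's a rough Perl sketch of that EOL rule, for splitting the text (non-stream) parts of a PDF into lines. split_pdf_lines is a hypothetical helper, and it deliberately ignores stream data, which is binary and can contain CR and LF bytes that aren't line endings at all.

```perl
use strict;
use warnings;

# Split a chunk of PDF text into lines per the quoted EOL rule:
# CR, LF, or CR immediately followed by LF each count as ONE end-of-line marker.
# \r\n must come first in the alternation so CR LF isn't treated as two EOLs.
sub split_pdf_lines {
    my ($bytes) = @_;
    # A limit of -1 keeps trailing empty fields, so a final EOL yields a trailing ''.
    return split /\r\n|\r|\n/, $bytes, -1;
}

my @lines = split_pdf_lines("%PDF-1.4\r%\xE2\xE3\r\n1 0 obj\n");
print scalar(@lines), "\n";    # prints 4: three lines plus the empty tail
```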
The secret to your success, it seems, is in not trusting your friendly neighborhood OS to handle EOL for you. Open both source and destination, binmode both handles, read with read() or sysread(), and determine line endings for yourself.
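Something along these lines, as a sketch rather than a finished tool. copy_binary and count_eols are hypothetical helper names, and the 64 KiB buffer size is an arbitrary choice.

```perl
use strict;
use warnings;

# Copy a file byte-for-byte. binmode switches both handles to raw bytes, so no
# newline translation (such as the :crlf layer Perl uses by default on Windows)
# can touch a PDF's binary streams or its CR/LF bytes.
sub copy_binary {
    my ($src, $dst) = @_;
    open my $in,  '<', $src or die "Can't read $src: $!";
    open my $out, '>', $dst or die "Can't write $dst: $!";
    binmode $in;
    binmode $out;
    my $buf;
    while (1) {
        my $n = sysread $in, $buf, 65536;
        die "Read error on $src: $!" unless defined $n;
        last if $n == 0;                    # end of file
        my $off = 0;
        while ($off < $n) {                 # syswrite may write fewer bytes than asked
            my $w = syswrite $out, $buf, $n - $off, $off;
            die "Write error on $dst: $!" unless defined $w;
            $off += $w;
        }
    }
    close $out or die "Can't close $dst: $!";
    close $in;
    return;
}

# Tally which EOL conventions appear in a byte string, treating CR LF as
# one marker (as the spec does) rather than as a CR plus an LF.
sub count_eols {
    my ($bytes) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    while ($bytes =~ /(\r\n|\r|\n)/g) {
        my $eol = $1;
        $count{ $eol eq "\r\n" ? 'crlf' : $eol eq "\r" ? 'cr' : 'lf' }++;
    }
    return \%count;
}

# Hypothetical usage; the file names are placeholders.
# copy_binary('in.pdf', 'out.pdf');
```

Since both handles are binmode and nothing rewrites the buffer, the copy should match the source byte for byte on any platform; count_eols then lets you decide what to do about line endings yourself instead of leaving it to the OS.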
The PDF specifications are available in PDF format from Adobe as a free download; you can get the 1.3 through 1.7 specs there. The full spec is cumbersome, but PDF::API2 and PDF::API2::Simple, among others, already exist if you don't want to mess with it yourself. I haven't played with moving PDFs around too much, but the ones I generate using PDF::API2 and PDF::API2::Simple on Linux work great on Windows, even though the two platforms use different text-file line endings.
Christopher E. Stith