PerlMonks  

Re^3: Can Perl generate a page break character that Microsoft Word will recognize?

by jcb (Priest)
on Jan 01, 2020 at 02:43 UTC (#11110813)


in reply to Re^2: Can Perl generate a page break character that Microsoft Word will recognize?
in thread Can Perl generate a page break character that Microsoft Word will recognize?

Interesting. Word seems to use ASCII CR as the paragraph break, so does it use ASCII LF or ASCII FF as the page break? (There is also a forced end-of-line, produced by Shift-Enter, that does not start a new paragraph. Simply pressing Enter actually starts a new paragraph, which starts a new line as a side effect.)

If we want to consider producing DOCX, it would be fairly easy to type AAA, press Control-Enter to insert a page break, type BBB, and see what turns up in document.xml. The Word DOC format, on the other hand, uses Microsoft's "OLE Container" format, which turns out to be a miniature FAT filesystem, complete with its own allocation tables, and (if I remember correctly) a second FAT filesystem with smaller blocks stored inside a "file" in the outer container file. At least they only did that to one level of recursion, instead of producing a "filesystems all the way down" crawling horror.
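A minimal sketch of that experiment, assuming the test document was saved as DOCX (which is a ZIP archive with the body text in the member word/document.xml). The function name docx_document_xml and the file name test.docx are made up for illustration; IO::Uncompress::Unzip ships with Perl.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Return the raw XML of word/document.xml from a DOCX file.
# (docx_document_xml is a hypothetical helper name, not part of any module.)
sub docx_document_xml {
    my ($path) = @_;
    my $xml;
    unzip $path => \$xml, Name => 'word/document.xml'
        or die "cannot read word/document.xml from $path: $UnzipError\n";
    return $xml;
}

# Hypothetical usage: list every <w:br ...> element so the break stands out.
# my $xml = docx_document_xml('test.docx');
# print "$_\n" for $xml =~ /(<w:br\b[^>]*>)/g;
```

Comparing the output for a document with and without the Control-Enter break should show which element Word adds.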

Re^4: Can Perl generate a page break character that Microsoft Word will recognize?
by marto (Archbishop) on Jan 01, 2020 at 10:30 UTC

    Or just look up the XML to do what you want:

    <?xml version="1.0" encoding="UTF-8"?>
    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
                xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
                xmlns:o="urn:schemas-microsoft-com:office:office"
                xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
                xmlns:v="urn:schemas-microsoft-com:vml"
                xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
                xmlns:w10="urn:schemas-microsoft-com:office:word"
                xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
                xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
      <w:body>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>1234</w:t>
          </w:r>
        </w:p>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>5678</w:t>
          </w:r>
        </w:p>
        <w:sectPr w:rsidR="00D479B1" w:rsidSect="00D479B1">
          <w:pgSz w:w="11906" w:h="16838" />
          <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="708" w:footer="708" w:gutter="0" />
          <w:cols w:space="708" />
          <w:docGrid w:linePitch="360" />
        </w:sectPr>
      </w:body>
    </w:document>

    becomes:

    <?xml version="1.0" encoding="UTF-8"?>
    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
                xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
                xmlns:o="urn:schemas-microsoft-com:office:office"
                xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
                xmlns:v="urn:schemas-microsoft-com:vml"
                xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
                xmlns:w10="urn:schemas-microsoft-com:office:word"
                xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
                xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
      <w:body>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>1234</w:t>
          </w:r>
        </w:p>
        <w:p>
          <w:r>
            <w:br w:type="page" />
          </w:r>
        </w:p>
        <w:p w:rsidR="00D479B1" w:rsidRDefault="00D479B1">
          <w:r>
            <w:t>5678</w:t>
          </w:r>
        </w:p>
        <w:sectPr w:rsidR="00D479B1" w:rsidSect="00D479B1">
          <w:pgSz w:w="11906" w:h="16838" />
          <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="708" w:footer="708" w:gutter="0" />
          <w:cols w:space="708" />
          <w:docGrid w:linePitch="360" />
        </w:sectPr>
      </w:body>
    </w:document>
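The difference between the two documents is a single extra paragraph holding a w:br element with w:type="page". A rough Perl sketch of making that change to a document.xml string follows. The helper name add_page_break_after is invented here, and the regex simply counts </w:p> closing tags, which is an assumption that only holds for simple documents like the one above (no tables or text boxes with nested paragraphs). A real XML parser would be the safer choice for anything else.

```perl
use strict;
use warnings;

# The page-break paragraph, as it appears in the second listing.
my $PAGE_BREAK = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>';

# Insert a page-break paragraph after the Nth (1-based) </w:p> closing tag.
# Hypothetical helper; assumes paragraphs are not nested.
sub add_page_break_after {
    my ($xml, $n) = @_;
    my $seen = 0;
    $xml =~ s{(</w:p>)}{ ++$seen == $n ? "$1$PAGE_BREAK" : $1 }ge;
    return $xml;
}

# Usage sketch: put the break between the "1234" and "5678" paragraphs.
# my $new_xml = add_page_break_after($old_xml, 1);
```

The modified XML would then have to be written back into the DOCX archive as word/document.xml, which is not shown here.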

    See also the other links already provided in this thread, and their associated links. To be honest, your workflow ('I'm using Perl to scrape text from a JavaScript that printed out one page at a time..') seems somewhat convoluted, but you don't go into much detail.
