PerlMonks  

Re^3: Extract Paragraph From Text

by sundialsvc4 (Abbot)
on Sep 09, 2015 at 22:11 UTC ( [id://1141469]=note )


in reply to Re^2: Extract Paragraph From Text
in thread Extract Paragraph From Text

I would expect that text such as this might not contain any “end-of-line” character sequences at all.   Instead, the rendering engine would pour the text into the graphic container, breaking lines according to the size of the container and the selected font and font size ... either of which could presumably change.   The only trustworthy “end-of-something” marker would be “end of paragraph,” but what that marker might be, we don’t yet know.

In this situation, I would suggest two specific things:

  1. Get the information directly from the original source file, and read it in binary mode.   (In other words, don’t tell Perl to expect record separators of any sort.   All you want Perl to do is read exactly the bytes that are there, exactly as they are.   And you really need to read the entire file at once ... slurp!)
  2. Before writing the code to do that, look at the original source file with a hex editor, as previously discussed, to see what is actually there and what might reasonably be relied upon.

Don’t attempt to copy-and-paste the text into Perl source code:   you have no idea what your text editor might do to it along the way.   (And anything it did do would only muddy the waters further.)
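
To make the two suggestions concrete, here is a minimal sketch, not a finished solution.   It slurps a file as raw bytes, counts a few byte sequences that commonly act as line or paragraph breaks, and prints a small hex dump for when no hex editor is at hand.   The sub names (slurp_binary, count_candidates, hex_dump) and the choice of candidate sequences are my own assumptions; your file may well use something else entirely, which is exactly what the dump is meant to reveal.

```perl
use strict;
use warnings;

# Read the whole file as raw bytes: no newline translation, no decoding.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                      # no record separator: one read gets everything
    my $bytes = <$fh>;
    close($fh);
    return defined $bytes ? $bytes : '';
}

# Count some byte sequences that often serve as line or paragraph breaks.
# (Assumed candidates only; the real marker may be none of these.)
sub count_candidates {
    my ($bytes) = @_;
    my %count;
    $count{'CRLF'}    = () = $bytes =~ /\x0D\x0A/g;
    $count{'bare LF'} = () = $bytes =~ /(?<!\x0D)\x0A/g;
    $count{'bare CR'} = () = $bytes =~ /\x0D(?!\x0A)/g;
    $count{'VT'}      = () = $bytes =~ /\x0B/g;
    $count{'FF'}      = () = $bytes =~ /\x0C/g;
    return \%count;
}

# Hex dump of the first $limit bytes, 16 per row, with a printable-ASCII column.
sub hex_dump {
    my ($bytes, $limit) = @_;
    $limit = length $bytes unless defined $limit;
    my $head = substr($bytes, 0, $limit);
    my $out  = '';
    for (my $off = 0; $off < length $head; $off += 16) {
        my $chunk = substr($head, $off, 16);
        my $hex   = join ' ', map { sprintf '%02x', ord } split //, $chunk;
        (my $text = $chunk) =~ tr/\x20-\x7E/./c;   # non-printables shown as '.'
        $out .= sprintf "%08x  %-47s  %s\n", $off, $hex, $text;
    }
    return $out;
}

# Usage: perl inspect.pl yourfile   (inspect.pl is a hypothetical script name)
if (@ARGV) {
    my $bytes  = slurp_binary($ARGV[0]);
    my $counts = count_candidates($bytes);
    printf "%-8s %d\n", $_, $counts->{$_} for sort keys %$counts;
    print hex_dump($bytes, 256);
}
```

If one of the counts stands out, or the dump shows some other byte recurring where paragraphs ought to end, that is the thing to report back here.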

Perl is an extremely powerful data-extraction tool that can most certainly do whatever you determine needs to be done.   So, please follow up in this thread and tell us what you’ve found.   We’ll then be happy to help you further.
