PerlMonks  

Interesting project!

blahblahblah ++ re use of a dictionary. Coupled with the regex in the OP (or, perhaps, one that's rather more specific and insistent on the presence of periods), you may have something of a start on that part of the problem.
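A rough sketch of that pairing, in Python purely for illustration (the logic should port to Perl without much trouble). The tiny `WORDS` set, the `SENTENCE_END` pattern, and the 0.5 threshold are all my own assumptions; in particular, the pattern is a stand-in and not the regex from the OP.

```python
import re

# Hypothetical mini word list; a real run would load a full dictionary file.
WORDS = {"this", "is", "a", "line", "of", "text", "here", "the", "and", "continues"}

# Assumed stand-in for a regex "insistent on periods"; NOT the OP's regex.
SENTENCE_END = re.compile(r"[.!?][\"')]?\s*$")

def prose_score(line, words=WORDS):
    """Fraction of whitespace-separated tokens found in the word list."""
    tokens = line.split()
    if not tokens:
        return 0.0
    # Strip surrounding punctuation before the lookup, compare case-insensitively.
    hits = sum(1 for t in tokens if t.strip(".,;:!?\"'()").lower() in words)
    return hits / len(tokens)

def looks_like_prose(line, threshold=0.5):
    """Prose if mostly dictionary words, or some words plus sentence-final punctuation."""
    score = prose_score(line)
    # Requiring score > 0 keeps a hex line ending in "..." from passing on punctuation alone.
    return score >= threshold or (score > 0 and bool(SENTENCE_END.search(line)))
```

Neither test is reliable alone: a dictionary misses names and jargon, and a trailing period also ends a truncated hex dump line, so the combination is what carries the weight.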

It does seem to me that reflowing text (horizontally) around ascii art will be problematic at best. Perhaps it would be well to settle for a less design-oriented target: leave anything determined to be ascii art as an inline item (a takeout box or drop-in, to use a couple of terms that may clarify my intent), with the reformatted text above and below.

e.g., NOT:

test here yada ya da   0000 01 02 03 04...
ya da'ing continues    0010 0f 0e 0d...

but rather:

test here yada ya da

0000 01 02 03 04...
0010 0f 0e 0d...

ya da'ing continues
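A minimal sketch of that layout, again in Python for illustration only. The `HEX_LINE` pattern and the names `is_art` and `reparagraph` are hypothetical; a real detector would need to recognise far more than hex dump lines.

```python
import re
import textwrap
from itertools import groupby

# Assumed heuristic: a hex offset followed by at least one byte pair.
# Prose that happens to start with hex-looking words could trip it.
HEX_LINE = re.compile(r"^\s*[0-9A-Fa-f]{4,8}(?:\s+[0-9A-Fa-f]{2})+")

def is_art(line):
    return bool(HEX_LINE.match(line))

def reparagraph(text, width=40):
    """Reflow runs of prose; pass runs of art through verbatim as their own blocks."""
    # Simplification: blank lines are dropped, so adjacent prose paragraphs merge.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    blocks = []
    for art, group in groupby(lines, key=is_art):
        group = list(group)
        if art:
            blocks.append("\n".join(group))  # untouched, line breaks preserved
        else:
            joined = " ".join(s.strip() for s in group)
            blocks.append(textwrap.fill(joined, width=width))
    return "\n\n".join(blocks)
```

Feeding it the prose and dump lines in their original order yields the prose block, then the dump, then the trailing prose, each separated by a blank line.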

My next notion may be unmanageable, but might be worth exploring: Would creation of a second dictionary containing such common elements as the address fragments at the beginning of each line of a hex dump (2nd example) and the multiple spaces initiating each line in the BBS logo be worth the effort?
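To make the notion concrete, here is one possible shape for such a second "dictionary", sketched in Python: a table of named patterns instead of words. Both entries and the name `art_kind` are my own guesses at what the table might hold, not anything taken from the thread's sample data.

```python
import re

# Hypothetical pattern table; entries are checked in insertion order.
ART_PATTERNS = {
    # Address fragment at the start of a hex dump line, e.g. "0010 ".
    "hex_offset": re.compile(r"^[0-9A-Fa-f]{4,8}\s"),
    # Two or more leading spaces, as in an indented logo line.
    "indented": re.compile(r"^ {2,}\S"),
}

def art_kind(line):
    """Return the name of the first matching pattern, or None for ordinary text."""
    for name, pattern in ART_PATTERNS.items():
        if pattern.search(line):
            return name
    return None
```

Whether it is worth the effort probably depends on how many distinct kinds of art turn up; each new kind costs one more entry, but each entry is also one more chance of a false positive on ordinary prose.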

And <big grin>, while use of a dictionary might not have returned this result, might the mouthful of the title have been reduced by using "reparagraphing"?


In reply to Re: Programatically reparagraphinating text by ww
in thread Programatically reparagraphinating text by hacker
