http://www.perlmonks.org?node_id=876588

Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:

This is something I've been thinking about for a while because I used Perl to post classic literature to Twitter.

Text::Wrap wraps text at a given column width. But to improve the human-readability of long text divided into shorter sections, I wanted to wrap the text at the most meaningful point while still adhering to a numerical limit. So here's some code which attempts that.
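For comparison, the plain Text::Wrap baseline looks roughly like this. It is a minimal sketch: the sample sentence comes from the test data, and the value 141 is an assumption chosen here, relying on the documented behaviour that every wrapped line is at most $Text::Wrap::columns - 1 characters long.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumption: a 140-character ceiling. Text::Wrap keeps every line to at
# most $columns - 1 characters, hence 141.
$Text::Wrap::columns = 141;

my $text = 'The Prince had always liked his London, when it had come to him; '
    . 'he was one of the modern Romans who find by the Thames a more '
    . 'convincing image of the truth of the ancient state than any they '
    . 'have left by the Tiber.';

# wrap() breaks at whitespace only, wherever the column limit happens to fall.
my @chunks = split /\n/, wrap( '', '', $text );
print "$_\n" for @chunks;
```

The break lands wherever the limit runs out, regardless of punctuation or sense, which is the behaviour the rest of this post tries to improve on.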

The text is from Henry James, who's known for long, long sentences. So we take those long sentences and we look for a point where we can break them up at punctuation -- after all, the author himself has told us there's a pause at that point.
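The punctuation pass can be sketched with a single greedy regex. This is an illustration, not the method used in the script further down: punctuation_prefix is a made-up name, and the pattern assumes the break characters are comma, colon and semicolon followed by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: longest prefix of $text, at most $limit characters,
# that ends in a comma, colon or semicolon followed by whitespace.
# Returns undef when no such prefix exists.
sub punctuation_prefix {
    my ( $text, $limit ) = @_;
    my $max = $limit - 1;    # leave room for the punctuation mark itself
    return $text =~ /\A(.{0,$max}[,:;])(?=\s)/s ? $1 : undef;
}

my $sentence = 'The Prince had always liked his London, when it had come '
    . 'to him; he was one of the modern Romans';

print punctuation_prefix( $sentence, 140 ), "\n";    # breaks after "him;"
print punctuation_prefix( $sentence, 50 ),  "\n";    # breaks after "London,"
```

Because the quantifier is greedy, the regex engine tries the longest candidate first and backtracks towards the start, which mirrors walking the word list backwards.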

That's the first pass. If that fails, as it does in the text given here, because there's a 150-character passage right there in the first line with no punctuation at all, we try to break at a conjunction: a word like 'and' or 'but' which forms a syntactic break point, such as the start of a sub-clause.
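The conjunction pass could be sketched along these lines. The helper name conjunction_prefix is hypothetical, and the regex is a simplification: it looks for whitespace followed by one of the listed words as a whole word, and it is case-sensitive.

```perl
use strict;
use warnings;

my @conjunctions = qw(than that which who and but);
my $alternation  = join '|', map { quotemeta($_) } @conjunctions;

# Hypothetical helper: longest prefix of $text, at most $limit characters,
# that is followed by whitespace and then a conjunction as a whole word.
# The \S forces a non-empty prefix, so a conjunction at the very start of
# the text is never used as a break point. Returns undef on failure.
sub conjunction_prefix {
    my ( $text, $limit ) = @_;
    my $max = $limit - 1;
    return $text =~ /\A(.{0,$max}\S)\s+(?=(?:$alternation)\b)/s ? $1 : undef;
}

# The 150-character unpunctuated passage mentioned above.
my $passage = 'he was one of the modern Romans who find by the Thames a more '
    . 'convincing image of the truth of the ancient state than any they have '
    . 'left by the Tiber.';

print conjunction_prefix( $passage, 140 ), "\n";    # breaks before "than"
```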

If that fails we have no option but to break on whitespace. One conjunction-free, unpunctuated sentence is included at the end of the data to test that.
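The whitespace fallback does not need a character-by-character loop; rindex can find the last space at or before the limit in one call. A small sketch, with whitespace_split as a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: cut $text at the last space at or before offset
# $limit, so the returned chunk is at most $limit characters long.
# Returns the chunk and the remainder, or an empty list if there is no
# usable space.
sub whitespace_split {
    my ( $text, $limit ) = @_;
    my $pos = rindex( $text, ' ', $limit );
    return if $pos < 1;
    return ( substr( $text, 0, $pos ), substr( $text, $pos + 1 ) );
}

# Same shape as the test sentence at the end of the data: 37 words, one dot.
my $bork = join( ' ', ('bork') x 37 ) . '.';
my ( $chunk, $rest ) = whitespace_split( $bork, 140 );
printf "%d + %d characters\n", length($chunk), length($rest);
```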

So, comments please? I don't like the use of dummy variables to hold the success or failure of the successive passes, but I can't see any other way to do it. I'd appreciate the thoughts of my fellow Monks.
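One possible way around the flags, offered as a sketch under assumptions and not as a finished answer: put the three passes in an ordered list of patterns and take the first one that matches. The names @strategies and split_meaningfully are invented for this example, and the patterns are a simplification of the word-by-word loops in the script.

```perl
use strict;
use warnings;

my $LIMIT = 140;
my $max   = $LIMIT - 1;

# Hypothetical strategy table, most meaningful break first. Each pattern
# captures the chunk to emit; the first pattern that matches wins, so no
# success flags are needed.
my @strategies = (
    qr/\A(.{0,$max}[,:;])(?=\s)/s,                                  # after punctuation
    qr/\A(.{0,$max}\S)\s+(?=(?:than|that|which|who|and|but)\b)/s,   # before a conjunction
    qr/\A(.{0,$max}\S)(?=\s)/s,                                     # at any whitespace
);

sub split_meaningfully {
    my ($line) = @_;
    my @phrases;
  CHUNK:
    while ( length($line) > $LIMIT ) {
        for my $strategy (@strategies) {
            if ( my ($chunk) = $line =~ $strategy ) {
                push @phrases, $chunk;
                $line = substr( $line, length $chunk );
                $line =~ s/\A\s+//;    # drop the whitespace at the break
                next CHUNK;
            }
        }
        last CHUNK;    # no whitespace within the limit: leave the rest whole
    }
    push @phrases, $line;
    return @phrases;
}

my $sentence = 'The Prince had always liked his London, when it had come to him; '
    . 'he was one of the modern Romans who find by the Thames a more '
    . 'convincing image of the truth of the ancient state than any they '
    . 'have left by the Tiber.';

print "$_\n" for split_meaningfully($sentence);
```

Here the labelled "next CHUNK" does the job of the two flags: as soon as one strategy succeeds, the remaining strategies are skipped for that chunk.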

#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @phrases;
my %conjunctions = map { $_ => 1 } qw(than that which who and but);

while ( my $line = <DATA> ) {    # contains sentences; first catch your sentence
    chomp($line);
    while ( length($line) > 140 ) {
        my $punctuation_split = 0;
        my $conjunction_split = 0;

        # split the text of the first 140 chars into words
        # using a negative LIMIT so a trailing space isn't ignored
        my @words = split( /\s+/, substr( $line, 0, 140 ), -1 );

        # find a punctuated word to split on,
        # going backward so the string will be as long as possible
        # (start at $#words; index scalar(@words) is one past the end)
        for ( my $i = $#words ; $i > -1 ; $i-- ) {
            if ( $words[$i] =~ /[,:;]$/ ) {
                push( @phrases, join( ' ', @words[ 0 .. $i ] ) );
                $line = join( ' ', @words[ ( $i + 1 ) .. $#words ] )
                  . substr( $line, 140 );
                $punctuation_split = 1;
                last;
            }
        }

        unless ($punctuation_split) {

            # find a conjunction like 'that' to split on,
            # going backward as before; stop at index 1 so a leading
            # conjunction can't produce an empty phrase
            for ( my $i = $#words ; $i > 0 ; $i-- ) {
                if ( $conjunctions{ $words[$i] } ) {
                    push( @phrases, join( ' ', @words[ 0 .. ( $i - 1 ) ] ) );
                    $line = join( ' ', @words[ $i .. $#words ] )
                      . substr( $line, 140 );
                    $conjunction_split = 1;
                    last;
                }
            }
        }

        unless ( $punctuation_split || $conjunction_split ) {

            # no meaningful split has been found,
            # split at the rightmost space within the limit
            for ( my $i = 140 ; $i > 0 ; $i-- ) {
                if ( substr( $line, $i, 1 ) eq ' ' ) {
                    push( @phrases, substr( $line, 0, $i ) );
                    substr( $line, 0, $i + 1 ) = '';    # drop the space too
                    last;
                }
            }
        }
        ##$line = undef;
    }
    push @phrases, $line;
}

print Dumper( \@phrases );

__DATA__
The Golden Bowl.
The Prince had always liked his London, when it had come to him; he was one of the modern Romans who find by the Thames a more convincing image of the truth of the ancient state than any they have left by the Tiber.
Brought up on the legend of the City to which the world paid tribute, he recognised in the present London much more than in contemporary Rome the real dimensions of such a case.
If it was a question of an Imperium, he said to himself, and if one wished, as a Roman, to recover a little the sense of that, the place to do so was on London Bridge, or even, on a fine afternoon in May, at Hyde Park Corner.
It was not indeed to either of those places that these grounds of his predilection, after all sufficiently vague, had, at the moment we are concerned with him, guided his steps; he had strayed, simply enough, into Bond Street, where his imagination, working at comparatively short range, caused him now and then to stop before a window in which objects massive and lumpish, in silver and gold, in the forms to which precious stones contribute, or in leather, steel, brass, applied to a hundred uses and abuses, were as tumbled together as if, in the insolence of the Empire, they had been the loot of far-off victories.
The young man's movements, however, betrayed no consistency of attention--not even, for that matter, when one of his arrests had proceeded from possibilities in faces shaded, as they passed him on the pavement, by huge beribboned hats, or more delicately tinted still under the tense silk of parasols held at perverse angles in waiting victorias.
And the Prince's undirected thought was not a little symptomatic, since, though the turn of the season had come and the flush of the streets begun to fade, the possibilities of faces, on the August afternoon, were still one of the notes of the scene.
He was too restless--that was the fact--for any concentration, and the last idea that would just now have occurred to him in any connection was the idea of pursuit.
Bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork bork.