craigt has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'd like to find a neat Perl snippet to break a string of words of length x into n parts of length y without splitting any of the words. Can anyone help? Thanks in advance.

craigt

Thank you monks for your generosity.

Replies are listed 'Best First'.
Re: A Little String Help Please
by FunkyMonk (Chancellor) on May 05, 2007 at 17:12 UTC
    If I understand you correctly, I think you'll find that wrapping text is too complicated for a snippet. I'd suggest you look at Text::Wrap.
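
    For reference, here is a minimal sketch of what the Text::Wrap suggestion might look like. The helper name wrap_to_parts and the sample text are assumptions made for illustration. Text::Wrap keeps each line to at most $columns - 1 characters, hence the + 1, and with its default settings a word longer than the width is broken rather than preserved.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: returns the wrapped pieces as a list of strings.
sub wrap_to_parts {
    my ($text, $width) = @_;

    # Text::Wrap limits lines to $columns - 1 characters,
    # so add 1 to allow parts of exactly $width characters.
    local $Text::Wrap::columns  = $width + 1;
    # Do not convert runs of spaces back into tabs.
    local $Text::Wrap::unexpand = 0;

    # No indent on the first or on subsequent lines.
    return split /\n/, wrap('', '', $text);
}

my $text = 'A huge string separated by lots and lots of words';
print "$_\n" for wrap_to_parts($text, 20);
```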
Re: A Little String Help Please
by c4onastick (Friar) on May 05, 2007 at 17:45 UTC
    This is an interesting problem. I just finished reading (well, started reading) Effective Perl Programming and this is actually a good place to use an array slice (in my attempt at a solution). This is by no means a final solution, but hopefully it'll get you close and generate some ideas from other (wiser) monks.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $n = shift;
    $n--;
    my $test = 'A huge string separated by lots and lots of words that I\'d like to break up into n shorter strings of length y';
    my @words = split /\s+/, $test;

    #First method, doesn't pickup the leftovers
    for(my $i = 0; $i+$n < $#words; $i += $n+1) {
        print join(' ', @words[$i..$i+$n]), "\n";
    }
    print "\n\n";

    #second method, picks up the leftovers
    my $j = 0;
    while($j+$n < $#words) {
        print join(' ', @words[$j..$j+$n]), "\n";
        $j += $n+1;
    }
    print join(' ', @words[$j..$#words]), "\n";

      splice is the preferred tool in Perl for managing chunks of arrays. Consider:

      use strict;
      use warnings;

      my $test = '';

      while (<DATA>) {
          if (! /^\d+$/) {
              $test .= $_;
              next;
          }

          chomp;
          my $n = $_;
          my @words = split /\s+/, $test;
          my @lines;

          # Use splice to pull out the desired lines
          push @lines, [splice @words, 0, $n] while @words;
          print "@$_\n" for @lines;
      }

      __DATA__
      A huge string separated by lots and lots of words that I'd like to break up into n shorter strings of length y. This is stored in a __DATA__ section to make a stand alone test program.
      Note that lines containing only a number provide an 'n' like the 10 on the following line.
      10
      Note too that you can add more text that will be added and processed by any subsequent 'n' lines.
      12

      Prints:

      A huge string separated by lots and lots of words
      that I'd like to break up into n shorter strings
      of length y. This is stored in a __DATA__ section
      to make a stand alone test program. Note that lines
      containing only a number provide an 'n' like the 10
      on the following line.
      A huge string separated by lots and lots of words that I'd
      like to break up into n shorter strings of length y. This
      is stored in a __DATA__ section to make a stand alone test
      program. Note that lines containing only a number provide an 'n' like
      the 10 on the following line. Note too that you can add
      more text that will be added and processed by any subsequent 'n'
      lines.

      which nicely gets rid of all those nasty C style for loops with their tricksy conditions and increments, and also removes the need for fussy slices and error prone indexes.

      Perl's toolbox is pretty extensive and for some reason splice is often overlooked - it's worth knowing about.


      DWIM is Perl's answer to Gödel
Re: A Little String Help Please
by GrandFather (Sage) on May 05, 2007 at 20:26 UTC

    If this is production work in some fashion then using a module such as Text::Wrap as suggested above is the way to go. If this is a learning exercise or homework then we would like to see a little effort from you up front.

    There are a number of ways of approaching the task:

    • you could read the text into a string, then use a regular expression to extract the required substrings
    • you could use index and substr to parse and chop up the string
    • you could parse the words into an array along with their lengths, then use that to generate the output texts for different 'n's

    Pick an approach. Write some code. If you have trouble, come back to us with what you've tried and we will help further. We don't generally do what looks like people's homework for them, however, so don't come back and say "how do I do it using option 3".
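
    As a rough illustration of the second approach only, the sketch below uses rindex to find the last space at or before the length limit and substr to cut there. The sub name chop_at_spaces is hypothetical, and the code assumes the words are separated by single spaces with no leading or trailing whitespace. A word longer than the limit is kept whole, so that one part exceeds the limit.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into parts of at most $max characters,
# breaking only at spaces. Assumes single spaces between words.
sub chop_at_spaces {
    my ($text, $max) = @_;
    my @parts;

    while (length($text) > $max) {
        # Last space at or before offset $max, so the part is <= $max long.
        my $cut = rindex($text, ' ', $max);

        # No usable space: the first word is too long, so cut after it.
        $cut = index($text, ' ', $max) if $cut <= 0;
        last if $cut < 0;    # a single overlong word is all that remains

        push @parts, substr($text, 0, $cut);
        $text = substr($text, $cut + 1);    # drop the space we broke on
    }

    push @parts, $text if length $text;
    return @parts;
}

my $text = 'A huge string separated by lots and lots of words';
print "$_\n" for chop_at_spaces($text, 20);
```

    With the sample text and a limit of 20 this prints three lines: "A huge string", "separated by lots" and "and lots of words".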


    DWIM is Perl's answer to Gödel
Re: A Little String Help Please
by johngg (Canon) on May 05, 2007 at 22:31 UTC
    I think you are asking to wrap text to a given length, not a given number of words. The script here takes the wrap length as a command-line argument but it uses a hard coded data file at the moment; that would be easy to change. Given this data file

    the script

    use strict;
    use warnings;

    my $partLen = shift or die qq{No part length supplied\n};
    die qq{Part length not integer\n} unless $partLen =~ m{^\d+$};

    # Slurp the whole data file into one string
    my $string    = q{};
    my $wordsFile = q{winter.txt};
    open my $wordsFH, q{<}, $wordsFile
        or die qq{open: $wordsFile: $!\n};
    {
        local $/;
        $string = <$wordsFH>;
    }
    close $wordsFH
        or die qq{close: $wordsFile: $!\n};

    my @words = split m{\s+}, $string;
    my $longestWord = ( sort { $b <=> $a } map { length } @words )[0];
    die qq{Part length too small to accommodate longest word\n}
        if $partLen < $longestWord;

    my @parts = ();
    my $part  = q{};

    # Test with defined() and length so that a word of "0" is not
    # mistaken for the end of the list or for an empty part
    while ( defined( my $word = shift @words ) ) {
        $part = $word, next unless length $part;
        if ( length($part) + length($word) + 1 > $partLen ) {
            push @parts, $part;
            $part = $word;
        }
        else {
            $part .= qq{ $word};
        }
    }
    push @parts, $part;

    print qq{$_\n} for @parts;

    given an argument of 33, produces

    I hope this is of use.

    Cheers,

    JohnGG

      The part generation code can be condensed a little through use of a regex:

      use strict;
      use warnings;

      my $n = shift;
      my $text = do {local $/; <DATA>};

      $text =~ s/\n/ /g;
      my @lines = $text =~ /(.{1,$n})\s+/g;
      print "$_\n" for @lines;

      __DATA__

      which generates the same output given the same input as the sample above.
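
      A self-contained variation on that regex, offered as a sketch rather than a drop-in replacement: the sub name regex_parts and the inline sample text are assumptions. The pattern above relies on whitespace following the last word (the final newline of the data supplies it), so this version also accepts end of string. Note that in either version a word longer than $n silently loses its leading characters, so a longest-word check like the one in the script above is still worth keeping.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex approach.
sub regex_parts {
    my ($text, $n) = @_;

    # Collapse newlines and runs of whitespace into single spaces.
    $text =~ s/\s+/ /g;

    # Grab up to $n characters that are followed by whitespace or by the
    # end of the string, so the last part is kept even when the text has
    # no trailing whitespace.
    return $text =~ /(.{1,$n})(?:\s+|\z)/g;
}

my $text = "A huge string separated by lots\nand lots of words";
print "$_\n" for regex_parts($text, 20);
```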


      DWIM is Perl's answer to Gödel
        ... can be condensed a little ...

        I'd say that was condensing it quite a lot :)

        A much simpler approach, I wish I'd thought of it.

        Cheers,

        JohnGG

Re: A Little String Help Please
by Krambambuli (Curate) on May 05, 2007 at 21:53 UTC
    If for some reason you'd want to avoid splitting or analyzing the entire input string as a whole, you might consider a solution like
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $n = shift;
    my $text = 'A huge string separated by lots and lots of words that I\'d like to break up into n shorter strings of length y';

    while ($text) {
        $text = cut_head( $text, $n );
    }
    exit;

    sub cut_head {
        my ($text, $wrap_length) = @_;
        my $actual_length = 0;

        WORD:
        while ($actual_length < $wrap_length and $actual_length < length($text) ) {
            my $index = get_next_word( $text, $actual_length, $wrap_length );
            $actual_length ||= $index;
            last WORD if $index >= $wrap_length;
            $actual_length = $index;
            next WORD;
        }
        my $head = substr( $text, 0, $actual_length);
        my $tail = substr( $text, $actual_length );

        print "$head\n";
        return $tail;
    }
    Of course, depending on how exactly you'll want to deal with whitespace, some refining would be needed.

    Just add the get_next_word sub and you're done.
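
    One possible get_next_word, as a sketch only: the post does not show the real one, so the behaviour here is an assumption inferred from how cut_head calls it. It returns the offset just past the next word found at or after the starting offset, skipping any whitespace in front of that word. The third argument is accepted to match the call in cut_head but is not used.

```perl
use strict;
use warnings;

# Hypothetical get_next_word: returns the offset just past the next word
# at or after $from. Whitespace before the word is skipped over, not removed.
# $wrap_length is accepted to match the call in cut_head but is unused.
sub get_next_word {
    my ($text, $from, $wrap_length) = @_;

    # Optional whitespace followed by one run of non-whitespace.
    my ($chunk) = substr($text, $from) =~ /^(\s*\S*)/;
    return $from + length($chunk);
}

# Walk the word boundaries of a short sample string: prints 1, 6 and 13.
my $sample = 'A huge string';
my $offset = 0;
while ($offset < length($sample)) {
    $offset = get_next_word($sample, $offset, 20);
    print "$offset\n";
}
```

    With this version every line printed by cut_head after the first begins with the space left over from the previous cut, which is the kind of whitespace refinement mentioned above.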