http://www.perlmonks.org?node_id=375470

kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I use the code below to shorten an unusually long topic (hackish, perhaps) so that it doesn't mess the page's layout.

my $str = trim_length("thisisaverylongsentencewithoutspacesinbetween", + 30); print "$str\n"; sub trim_length { my ($str, $desired_len) = @_; my $tmp_len = $desired_len; # Shorten $str if it's is one continuous string # and its length is greater than the desired length if ($str !~ /\s/ && length($str) > $desired_len) { $str = substr($str, 0, $desired_len); $str .= ' ...'; } # Otherwise, if the length of $str is greater than the # desired length, iterate over each character of in $str # to find an appropriate spot that's a space to chop # the string elsif (length($str) > $desired_len) { #print "here" and exit; $str = substr($str, 0, $desired_len); my $token = substr($str, $desired_len-1, $desired_len); while($token !~ / /) { $str = substr($str, 0, $tmp_len); $token = substr($str, $tmp_len-1, $tmp_len); $tmp_len--; } $str .= '...'; } return $str; }
It works (with limited testing) but I'm intersted to find out how you would do it. All comments are welcomed :)

Thanks in anticipation :)

Update: Thanks to all for your sharing of code :)

Update2: I now have a problem of deciding which method to go with but I've undoubtedly learnt new ways of looking at the same problem. Great thanks and cheers

Replies are listed 'Best First'.
Re: In search of a better way to trim string length
by tachyon (Chancellor) on Jul 19, 2004 at 08:09 UTC

    You can return immediately if the string is less than or equal to the desired length. Then it depends on what you want to do to make it pretty. I have a couple of routines like this:

    sub shorten_string { my ( $string, $length ) = @_; $length ||= 64; $length = 10 if $length < 10; return $string unless length($string) > $length; my $chunk = int($length/2)-3; return substr($string,0,$chunk+1) . ' ... ' . substr($string,-$chunk +); } sub shorten_url { my ( $url, $length ) = @_; $length ||= 64; return $url unless $url and length($url) > $length; ( $url ) = split /\?/, $url; return $url if length($url) < $length or $url =~ m!\w+://[^/]+/?$!; $url =~ s!(\w+://[^/]+)!!; my $domain = $1 ? "$1/ ... " : '... '; $length -= length $domain; return $domain if $length < 1; my @bits = split '/', $url; my @keepers = $url =~ m!/$! ? ('') : (); my $tail = 1; while ( my $bit = pop @bits ) { next unless $bit; $length -= (length($bit) + 1); unshift @keepers, $bit if $tail or $length > 0; $tail = 0; last if $length < 1; } return @keepers ? ( "$domain/".join'/',@keepers ) : $domain; }

    cheers

    tachyon

      sub trimTo { my ($line, $length) = @_; $line=~s/([^ \s]{$length})([^ \s])/$1...\n$2/g; return $line; }
      Though only the first line is 30, and the rest is 31 chars, i think it's more or less doing what you search for ?
      It also makes sure strings that are exact 30 length dont have dots added.

      I just saw Jaspers solution, it's a bit different, i thought i post it anyway.

      Addition:
      Code isn't doing what was expected. Sorry Kiat :-)

        Hm...yours doesn't produce the desired outputs.

        Given the following inputs:

        1) "thisisthelongesttopicthatwilleverbepostedinperlmonks"

        2) "this is the longest topic that will ever be posted in perlmonks"

        Desired outputs:

        1a) thisisthelongesttopicthatwille ...

        2a) this is the longest topic ...

        Yours gave the following outputs:

        1a) thisisthelongesttopicthatwille...
        verbepostedinperlmonks

        2a) this is the longest topic that will ever be posted in perlmonks

        Did I miss something?

      Thanks, tachyon!

      Interesting! I like the idea of breaking the string up at the middle with '...'.

      cheers,

Re: In search of a better way to trim string length
by BrowserUk (Patriarch) on Jul 19, 2004 at 08:22 UTC

    sub trimTo { my( $str, $n ) = @_; return $str if length $str < $n; substr( $str, 0, 1 + rindex( $str, ' ', $n-3 ) || $n-3 ) . '...'; }

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
      Thanks, BrowserUk!

      Very cool code but harder to understand for me...I ran it and got the results as expected, with a tiny 'bug'. When I passed it the string "this is a very long sentence without spaces in between", the shortened string was "this is a very long ..." (no problem there).

      When I passed it the string "thisisaverylongsentencewithoutspacesinbetween", the output was "thisisaverylongsentencewith..." i.e. without any space between the last character of that string and '...'.

      Nothing serious really but just thought I would bring it up.

        Maybe a little explanation will help?

        sub trimTo { my( $str, $n ) = @_; ## Give back what they gave us if the nothing to do return $str if length $str < $n; my $lastSpace = 1 + rindex( $str, ' ', $n-3 ); ## Subtracting 3 allows for adding the '...' ## rindex finds the last space preceding the position ## or -1 if it fails. ## Adding 1 means that we can test whether it found the space +with ## if( 1+rindex...) { ... ## or supply a default value ## 1 + rindex( ... ) || $default ## It also means that we get a length that we can supply ## directly to substr without having to increment it. substr( $str, 0, $lastSpace || $n-3 ) . '...'; ## The substr( ... ) returns from the start of string ## to the first space before the postition-3 (including that space +) ## or the first $n characters of the string. ## Combining the two avoids a temporary var. ## Tack on the '...' }

        With respect to the 'bug'. I actually consider the difference a bonus in as much as "stuff ..." indicates that there are more words that were truncated.

        Whereas "stuff..." indicates that the word itself was truncated.

        If you prefer the other behaviour, then this will do it.

        sub trimTo { my( $str, $n ) = @_; return $str if length $str < $n; my $lastSpace = 1 + rindex( $str, ' ', $n-3 ); ## Truncate length allowing to always include the ' ' before '. +..' my $truncLen = ( $lastSpace || $n-3 ) - 1 ; return substr( $str, 0, $truncLen ) . ' ...'; }

        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
Re: In search of a better way to trim string length
by theAcolyte (Pilgrim) on Jul 19, 2004 at 08:09 UTC
    sub trim_length { my($string, $desired_length) = @_; # edit: swiped idea from Tachyon if((length($string)) < $desired_length) { return $string; } my @words = split(" ", $string); my $clipped = ""; foreach(@words) { my $temp = $clipped; $clipped .= $_; if((length($clipped)) > $desired_length) { $clipped = $temp; last +; } } if((length($clipped)) == 0) { $clipped = substr($string, 0, $desired_length); } $clipped .= "..."; return $clipped }

    Does the same as your code but with a little less effort. Breaks the string down into words, keeps adding words until it passes your minimum marker, then keeps the previous version. If the first word puts it over the minimum marker, then it goes back to the original $string and clips that.

    theAcolyte

      Thanks, theAcolyte!

      Ran your code. I think you need to add a space here:

      $clipped .= $_; # orginal $clipped .= "$_ "; # changed
      With your code, when I passed the sub a string such as "this is a very long sentence without spaces in between", it gets transformed to "thisisaverylongsentencewithout" i.e. the original spaces were gone.

      cheers

        blah ... right you are! See what I get for posting untested code? Then again, I always post untested code ... :P
Re: In search of a better way to trim string length
by davido (Cardinal) on Jul 19, 2004 at 08:26 UTC

    This snippet uses a pure regexp approach to breaking a string up into stringlets of no more than 'n' length, broken on the first whitespace before 'n', OR a hard break at 'n' characters if there is no preceeding whitespace on which to break. Not sure if this is what you're after, but it seems to be a start in the right direction. If you simply want to crop at 'n' or less (depending on whitespace), remove the /g modifier and capture only the first stringlet.

    use strict; use warnings; while ( my $string = <DATA> ) { chomp $string; my ( @stringlets ) = $string =~ m/(?:.{0,40}(?:\s|$))|(?:\S{40})/gs; print "$_\n" foreach @stringlets; print "\n\n"; } __DATA__ Now is the time for all good men to come to the aid of their country. Nowisthetimeforallgoodmentocometotheaidoftheircountry.

    Applying the "..." elipses is trivial at this point.

    I hope this helps... it's late, I may have misread the question. ;)


    Dave

      Thanks, Dave!

      I ran your code as is, but it doesn't quite give me the reuslts I was expecting. Maybe I'm missing something?

        Maybe I'm missing something?

        Only that I was just providing a starting point to illustrate a method. I figured you would tweak it to meet your needs. But in fairness, I should have provided an answer that produced exactly what you were looking for. I've modified my answer to do what I think your question was asking, and will paste it below...

        The problem with my original (I think) was that it broke your original string into substrings of 'n' characters. What you were looking for was simply truncation, with trailing '...' elipses. Here ya go...

        use strict; use warnings; my $len = 40; print "0123456789012345678901234567890123456789\n"; while ( my $string = <DATA> ) { chomp $string; $string =~ s/((?:.{0,$len}(?=\s|$))|(?:\S{$len}))(?:.+$)?/$1.../s; print "$string\n"; } __DATA__ Now is the time for all good men to come to the aid of their country. Nowisthetimeforallgoodmentocometotheaidoftheircountry.

        Hope this helps.


        Dave

Re: In search of a better way to trim string
by Jasper (Chaplain) on Jul 19, 2004 at 10:51 UTC
    Alternative with regex:
    sub trimTo { my ($line, $length) = @_; $line =~ s/^(.{0,$length}\s|.{$length}).*/$1.../; return $line; }
    I think has the same behaviour as the others. Not tested fully. Not that I think it's any faster than any substr method, but it's an alternative.
      Maybe just add a line to return the string when its length is shorter than the desired length. Otherwise, with a string like "this is a very long sentence", you get "this is a very long ..." instead of "this is a very long sentence".

      Thanks and cheers :)

        Yes, sorry about that. I'm sure there's a one character change that makes the regex not match when it's shorter than $length, but a) I can't see it immediately, b) that's just stupid golfing.
Re: In search of a better way to trim string
by Not_a_Number (Prior) on Jul 19, 2004 at 19:21 UTC

    Sorry if this is a bit late, FWIW...

    1) Most of the replies published (well, those that work :-) truncate strings that are exactly of the desired length, while your original version keeps these intact. Test case:

    my $str = trim_length ( 'this sentence is 30 signs long', 30 );

    If you use eg BrowserUK's code, this is easy to fix: just change

    return $str if length $str < $n;

    to:

    return $str if length $str <= $n;

    2) It might be an idea to strip off trailing whitespace, in order to avoid adding ' ...' to the end of a message that is essentially complete. My own solution, again FWIW, would be

    sub trim_length { my ( $str, $desired_len ) = @_; $str =~ s/\s+$//; # Strip trailing whitespace return $str if length $str <= $desired_len; return sprintf "%.${desired_len}s ...", substr $str, 0, ( rindex $str, ' ', $desired_len ); }

    dave

Re: In search of a better way to trim string
by runrig (Abbot) on Jul 20, 2004 at 00:26 UTC
    This assumes there are no newlines in the original string (at least not in an inconvenient place):
    use Text::Wrap qw(wrap $columns); $columns = 30; my $trimmed = wrap('', '', $str); $trimmed =~ s/\n.*/.../s;
Re: In search of a better way to trim string
by rrwo (Friar) on Jul 20, 2004 at 04:12 UTC

    Hm, there's Text::Truncate:

    use Text::Truncate; my $short = truncstr($long, $len);

    It may not be the most efficient way to do it, but you don't have to think about how to do it.

    Disclaimer: I wrote that module a while ago because I was sick of rewriting a routine to do that all the time. Any ideas for improvements would be appreciated.