In search of a better way to trim string

kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I use the code below to shorten an unusually long topic (hackish, perhaps) so that it doesn't mess the page's layout.

my $str = trim_length("thisisaverylongsentencewithoutspacesinbetween",
+ 30);
print "$str\n";

sub trim_length {
   my ($str, $desired_len) = @_;
   my $tmp_len = $desired_len;
   # Shorten $str if it's is one continuous string
   # and its length is greater than the desired length
   if ($str !~ /\s/ && length($str) > $desired_len) {
      $str = substr($str, 0, $desired_len);
      $str .= ' ...';
   }
   # Otherwise, if the length of $str is greater than the
   # desired length, iterate over each character of in $str
   # to find an appropriate spot that's a space to chop
   # the string
   elsif (length($str) > $desired_len) {
      #print "here" and exit;
      $str = substr($str, 0, $desired_len);
      my $token = substr($str, $desired_len-1, $desired_len);
      while($token !~ / /) {
         $str = substr($str, 0, $tmp_len);
         $token = substr($str, $tmp_len-1, $tmp_len);
         $tmp_len--;
      }
      $str .= '...';
   }
   return $str;
}
[download]

It works (with limited testing) but I'm intersted to find out how you would do it. All comments are welcomed :)

Thanks in anticipation :)

Update: Thanks to all for your sharing of code :)

Update2: I now have a problem of deciding which method to go with but I've undoubtedly learnt new ways of looking at the same problem. Great thanks and cheers

Comment on In search of a better way to trim string Download Code

Replies are listed 'Best First'.
Re: In search of a better way to trim string length by tachyon (Chancellor) on Jul 19, 2004 at 08:09 UTC
You can return immediately if the string is less than or equal to the desired length. Then it depends on what you want to do to make it pretty. I have a couple of routines like this: sub shorten_string { my ( $string, $length ) = @_; $length \|\|= 64; $length = 10 if $length < 10; return $string unless length($string) > $length; my $chunk = int($length/2)-3; return substr($string,0,$chunk+1) . ' ... ' . substr($string,-$chunk +); } sub shorten_url { my ( $url, $length ) = @_; $length \|\|= 64; return $url unless $url and length($url) > $length; ( $url ) = split /\?/, $url; return $url if length($url) < $length or $url =~ m!\w+://[^/]+/?$!; $url =~ s!(\w+://[^/]+)!!; my $domain = $1 ? "$1/ ... " : '... '; $length -= length $domain; return $domain if $length < 1; my @bits = split '/', $url; my @keepers = $url =~ m!/$! ? ('') : (); my $tail = 1; while ( my $bit = pop @bits ) { next unless $bit; $length -= (length($bit) + 1); unshift @keepers, $bit if $tail or $length > 0; $tail = 0; last if $length < 1; } return @keepers ? ( "$domain/".join'/',@keepers ) : $domain; } [download] cheers tachyon	[reply] [d/l]
Re^2: In search of a better way to trim string length by Vennis (Pilgrim) on Jul 19, 2004 at 12:19 UTC
`sub trimTo { my ($line, $length) = @_; $line=~s/([^ \s]{$length})([^ \s])/$1...\n$2/g; return $line; }` [download] Though only the first line is 30, and the rest is 31 chars, i think it's more or less doing what you search for ? It also makes sure strings that are exact 30 length dont have dots added. I just saw Jaspers solution, it's a bit different, i thought i post it anyway. Addition: Code isn't doing what was expected. Sorry Kiat :-)	[reply] [d/l]
Re^3: In search of a better way to trim string length by kiat (Vicar) on Jul 19, 2004 at 13:31 UTC
Hm...yours doesn't produce the desired outputs. Given the following inputs: 1) "thisisthelongesttopicthatwilleverbepostedinperlmonks" 2) "this is the longest topic that will ever be posted in perlmonks" Desired outputs: 1a) thisisthelongesttopicthatwille ... 2a) this is the longest topic ... Yours gave the following outputs: 1a) thisisthelongesttopicthatwille... verbepostedinperlmonks 2a) this is the longest topic that will ever be posted in perlmonks Did I miss something?	[reply]
Re^4: In search of a better way to trim string length by Vennis (Pilgrim) on Jul 19, 2004 at 14:26 UTC
Re^2: In search of a better way to trim string length by kiat (Vicar) on Jul 19, 2004 at 08:38 UTC
Thanks, tachyon! Interesting! I like the idea of breaking the string up at the middle with '...'. cheers,	[reply]
Re: In search of a better way to trim string length by BrowserUk (Patriarch) on Jul 19, 2004 at 08:22 UTC
`sub trimTo { my( $str, $n ) = @_; return $str if length $str < $n; substr( $str, 0, 1 + rindex( $str, ' ', $n-3 ) \|\| $n-3 ) . '...'; }` [download] Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon	[reply] [d/l]
Re^2: In search of a better way to trim string length by kiat (Vicar) on Jul 19, 2004 at 08:57 UTC
Thanks, BrowserUk! Very cool code but harder to understand for me...I ran it and got the results as expected, with a tiny 'bug'. When I passed it the string "this is a very long sentence without spaces in between", the shortened string was "this is a very long ..." (no problem there). When I passed it the string "thisisaverylongsentencewithoutspacesinbetween", the output was "thisisaverylongsentencewith..." i.e. without any space between the last character of that string and '...'. Nothing serious really but just thought I would bring it up.	[reply]
Re^3: In search of a better way to trim string length by BrowserUk (Patriarch) on Jul 19, 2004 at 09:49 UTC
Maybe a little explanation will help? sub trimTo { my( $str, $n ) = @_; ## Give back what they gave us if the nothing to do return $str if length $str < $n; my $lastSpace = 1 + rindex( $str, ' ', $n-3 ); ## Subtracting 3 allows for adding the '...' ## rindex finds the last space preceding the position ## or -1 if it fails. ## Adding 1 means that we can test whether it found the space +with ## if( 1+rindex...) { ... ## or supply a default value ## 1 + rindex( ... ) \|\| $default ## It also means that we get a length that we can supply ## directly to substr without having to increment it. substr( $str, 0, $lastSpace \|\| $n-3 ) . '...'; ## The substr( ... ) returns from the start of string ## to the first space before the postition-3 (including that space +) ## or the first $n characters of the string. ## Combining the two avoids a temporary var. ## Tack on the '...' } [download] With respect to the 'bug'. I actually consider the difference a bonus in as much as "stuff ..." indicates that there are more words that were truncated. Whereas "stuff..." indicates that the word itself was truncated. If you prefer the other behaviour, then this will do it. `sub trimTo { my( $str, $n ) = @_; return $str if length $str < $n; my $lastSpace = 1 + rindex( $str, ' ', $n-3 ); ## Truncate length allowing to always include the ' ' before '. +..' my $truncLen = ( $lastSpace \|\| $n-3 ) - 1 ; return substr( $str, 0, $truncLen ) . ' ...'; }` [download] Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon	[reply] [d/l] [select]
Re^4: In search of a better way to trim string length by Your Mother (Archbishop) on Jul 19, 2004 at 19:06 UTC
Re^5: In search of a better way to trim string length by dragonchild (Archbishop) on Jul 19, 2004 at 19:37 UTC
Re^5: In search of a better way to trim string length by BrowserUk (Patriarch) on Jul 19, 2004 at 20:12 UTC
Re^4: In search of a better way to trim string length by kiat (Vicar) on Jul 19, 2004 at 10:15 UTC
Re: In search of a better way to trim string length by theAcolyte (Pilgrim) on Jul 19, 2004 at 08:09 UTC
`sub trim_length { my($string, $desired_length) = @_; # edit: swiped idea from Tachyon if((length($string)) < $desired_length) { return $string; } my @words = split(" ", $string); my $clipped = ""; foreach(@words) { my $temp = $clipped; $clipped .= $_; if((length($clipped)) > $desired_length) { $clipped = $temp; last +; } } if((length($clipped)) == 0) { $clipped = substr($string, 0, $desired_length); } $clipped .= "..."; return $clipped }` [download] Does the same as your code but with a little less effort. Breaks the string down into words, keeps adding words until it passes your minimum marker, then keeps the previous version. If the first word puts it over the minimum marker, then it goes back to the original $string and clips that. theAcolyte	[reply] [d/l]
Re^2: In search of a better way to trim string length by kiat (Vicar) on Jul 19, 2004 at 08:47 UTC
Thanks, theAcolyte! Ran your code. I think you need to add a space here: `$clipped .= $_; # orginal $clipped .= "$_ "; # changed` [download] With your code, when I passed the sub a string such as "this is a very long sentence without spaces in between", it gets transformed to "thisisaverylongsentencewithout" i.e. the original spaces were gone. cheers	[reply] [d/l]
Re^3: In search of a better way to trim string length by theAcolyte (Pilgrim) on Jul 20, 2004 at 00:30 UTC
blah ... right you are! See what I get for posting untested code? Then again, I always post untested code ... :P	[reply]
Re: In search of a better way to trim string length by davido (Cardinal) on Jul 19, 2004 at 08:26 UTC
This snippet uses a pure regexp approach to breaking a string up into stringlets of no more than 'n' length, broken on the first whitespace before 'n', OR a hard break at 'n' characters if there is no preceeding whitespace on which to break. Not sure if this is what you're after, but it seems to be a start in the right direction. If you simply want to crop at 'n' or less (depending on whitespace), remove the /g modifier and capture only the first stringlet. `use strict; use warnings; while ( my $string = <DATA> ) { chomp $string; my ( @stringlets ) = $string =~ m/(?:.{0,40}(?:\s\|$))\|(?:\S{40})/gs; print "$_\n" foreach @stringlets; print "\n\n"; } __DATA__ Now is the time for all good men to come to the aid of their country. Nowisthetimeforallgoodmentocometotheaidoftheircountry.` [download] Applying the "..." elipses is trivial at this point. I hope this helps... it's late, I may have misread the question. ;) Dave	[reply] [d/l]
Re^2: In search of a better way to trim string length by kiat (Vicar) on Jul 19, 2004 at 09:00 UTC
Thanks, Dave! I ran your code as is, but it doesn't quite give me the reuslts I was expecting. Maybe I'm missing something?	[reply]
Re^3: In search of a better way to trim string length by davido (Cardinal) on Jul 19, 2004 at 15:34 UTC
Maybe I'm missing something? Only that I was just providing a starting point to illustrate a method. I figured you would tweak it to meet your needs. But in fairness, I should have provided an answer that produced exactly what you were looking for. I've modified my answer to do what I think your question was asking, and will paste it below... The problem with my original (I think) was that it broke your original string into substrings of 'n' characters. What you were looking for was simply truncation, with trailing '...' elipses. Here ya go... `use strict; use warnings; my $len = 40; print "0123456789012345678901234567890123456789\n"; while ( my $string = <DATA> ) { chomp $string; $string =~ s/((?:.{0,$len}(?=\s\|$))\|(?:\S{$len}))(?:.+$)?/$1.../s; print "$string\n"; } __DATA__ Now is the time for all good men to come to the aid of their country. Nowisthetimeforallgoodmentocometotheaidoftheircountry.` [download] Hope this helps. Dave	[reply] [d/l]
Re: In search of a better way to trim string by Jasper (Chaplain) on Jul 19, 2004 at 10:51 UTC
Alternative with regex: `sub trimTo { my ($line, $length) = @_; $line =~ s/^(.{0,$length}\s\|.{$length}).*/$1.../; return $line; }` [download] I think has the same behaviour as the others. Not tested fully. Not that I think it's any faster than any substr method, but it's an alternative.	[reply] [d/l]
Re^2: In search of a better way to trim string by kiat (Vicar) on Jul 19, 2004 at 11:30 UTC
Maybe just add a line to return the string when its length is shorter than the desired length. Otherwise, with a string like "this is a very long sentence", you get "this is a very long ..." instead of "this is a very long sentence". Thanks and cheers :)	[reply]
Re^3: In search of a better way to trim string by Jasper (Chaplain) on Jul 19, 2004 at 12:20 UTC
Yes, sorry about that. I'm sure there's a one character change that makes the regex not match when it's shorter than $length, but a) I can't see it immediately, b) that's just stupid golfing.	[reply]
Re: In search of a better way to trim string by Not_a_Number (Prior) on Jul 19, 2004 at 19:21 UTC
Sorry if this is a bit late, FWIW... 1) Most of the replies published (well, those that work :-) truncate strings that are exactly of the desired length, while your original version keeps these intact. Test case: `my $str = trim_length ( 'this sentence is 30 signs long', 30 );` If you use eg BrowserUK's code, this is easy to fix: just change `return $str if length $str < $n;` to: `return $str if length $str <= $n;` 2) It might be an idea to strip off trailing whitespace, in order to avoid adding ' ...' to the end of a message that is essentially complete. My own solution, again FWIW, would be `sub trim_length { my ( $str, $desired_len ) = @_; $str =~ s/\s+$//; # Strip trailing whitespace return $str if length $str <= $desired_len; return sprintf "%.${desired_len}s ...", substr $str, 0, ( rindex $str, ' ', $desired_len ); }` [download] dave	[reply] [d/l] [select]
Re: In search of a better way to trim string by runrig (Abbot) on Jul 20, 2004 at 00:26 UTC
This assumes there are no newlines in the original string (at least not in an inconvenient place): `use Text::Wrap qw(wrap $columns); $columns = 30; my $trimmed = wrap('', '', $str); $trimmed =~ s/\n.*/.../s;` [download]	[reply] [d/l]
Re: In search of a better way to trim string by rrwo (Friar) on Jul 20, 2004 at 04:12 UTC
Hm, there's Text::Truncate: `use Text::Truncate; my $short = truncstr($long, $len);` [download] It may not be the most efficient way to do it, but you don't have to think about how to do it. Disclaimer: I wrote that module a while ago because I was sick of rewriting a routine to do that all the time. Any ideas for improvements would be appreciated.	[reply] [d/l]

Back to Seekers of Perl Wisdom