http://www.perlmonks.org?node_id=375497


in reply to Re: In search of a better way to trim string length
in thread In search of a better way to trim string

Thanks, BrowserUk!

Very cool code, but harder for me to understand. I ran it and got the expected results, with one tiny 'bug'. When I passed it the string "this is a very long sentence without spaces in between", the shortened string was "this is a very long ..." (no problem there).

When I passed it the string "thisisaverylongsentencewithoutspacesinbetween", the output was "thisisaverylongsentencewith..." i.e. without any space between the last character of that string and '...'.

Nothing serious really but just thought I would bring it up.

Re^3: In search of a better way to trim string length
by BrowserUk (Patriarch) on Jul 19, 2004 at 09:49 UTC

    Maybe a little explanation will help?

    sub trimTo {
        my( $str, $n ) = @_;

        ## Give back what they gave us if there is nothing to do
        return $str if length $str < $n;

        my $lastSpace = 1 + rindex( $str, ' ', $n-3 );
        ## Subtracting 3 allows for adding the '...'
        ## rindex finds the last space at or before the position
        ## or -1 if it fails.
        ## Adding 1 means that we can test whether it found the space with
        ##     if( 1+rindex...) { ...
        ## or supply a default value
        ##     1 + rindex( ... ) || $default
        ## It also means that we get a length that we can supply
        ## directly to substr without having to increment it.

        substr( $str, 0, $lastSpace || $n-3 ) . '...';
        ## The substr( ... ) returns from the start of string
        ## to the last space at or before position $n-3 (including that space)
        ## or the first $n-3 characters of the string.
        ## Combining the two avoids a temporary var.
        ## Tack on the '...'
    }
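To see the `1 + rindex( ... ) || $default` idiom from the comments in isolation, here is a minimal sketch. The helper name `cut_length` is hypothetical and exists only for this illustration; it returns the number of characters that would be kept before the '...' is added.

```perl
use strict;
use warnings;

# cut_length is a hypothetical helper, named here only for illustration.
sub cut_length {
    my ( $str, $n ) = @_;

    # rindex returns -1 when no space is found at or before $n-3,
    # so adding 1 turns "not found" into 0, which is false.
    my $lastSpace = 1 + rindex( $str, ' ', $n - 3 );

    # A false $lastSpace falls back to a hard cut at $n-3.
    return $lastSpace || $n - 3;
}

print cut_length( 'this is a very long sentence without spaces in between', 30 ), "\n";   # 20
print cut_length( 'thisisaverylongsentencewithoutspacesinbetween', 30 ), "\n";            # 27
```

With spaces, the cut lands just after the last space that still leaves room for the ellipsis; without any space, the fallback length is used.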

    With respect to the 'bug': I actually consider the difference a bonus, inasmuch as "stuff ..." indicates that there are more words that were truncated.

    Whereas "stuff..." indicates that the word itself was truncated.

    If you prefer the other behaviour, then this will do it.

    sub trimTo {
        my( $str, $n ) = @_;
        return $str if length $str < $n;

        my $lastSpace = 1 + rindex( $str, ' ', $n-3 );

        ## Truncate length allowing to always include the ' ' before '...'
        my $truncLen = ( $lastSpace || $n-3 ) - 1 ;

        return substr( $str, 0, $truncLen ) . ' ...';
    }
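A side-by-side run may make the difference between the two versions easier to see. This is only a sketch: `trim_tight` and `trim_spaced` are hypothetical names given to the first and second `trimTo` so that both can live in one script.

```perl
use strict;
use warnings;

# trim_tight and trim_spaced are hypothetical names for the two
# trimTo versions posted above.
sub trim_tight {
    my ( $str, $n ) = @_;
    return $str if length($str) < $n;
    my $lastSpace = 1 + rindex( $str, ' ', $n - 3 );
    return substr( $str, 0, $lastSpace || $n - 3 ) . '...';
}

sub trim_spaced {
    my ( $str, $n ) = @_;
    return $str if length($str) < $n;
    my $lastSpace = 1 + rindex( $str, ' ', $n - 3 );
    my $truncLen  = ( $lastSpace || $n - 3 ) - 1;    # leave room for the ' '
    return substr( $str, 0, $truncLen ) . ' ...';
}

for my $str ( 'this is a very long sentence without spaces in between',
              'thisisaverylongsentencewithoutspacesinbetween' ) {
    print '[', trim_tight( $str, 30 ),  "]\n";
    print '[', trim_spaced( $str, 30 ), "]\n";
}
# [this is a very long ...]
# [this is a very long ...]
# [thisisaverylongsentencewith...]
# [thisisaverylongsentencewit ...]
```

For the sentence with spaces both versions give the same result; they differ only when a single word has to be cut.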

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

      This is cool but it suffers from a problem that the first one I wrote did too. Give it:

      my $mouse = 'This is a story about mice, "I dint know mice were so smart."';
      print trimTo($mouse,30), "\n";

      And you can get back things like: This is a story about mice,... The comma (or semicolon, period, quote, etc.) is a bit jarring.

      This is a recent stab I've ended up using but I would love to see other ideas/answers/hybrids:

      sub chop_to_size {
          my ($text, $length) = @_;
          return $text if length $text < $length;

          # make room for ellipsis
          my $chop = $length - 3;
          $text =~ s/^\s*(.{$chop})\s*.+$/$1/;
          $text =~ s/[\s[:punct:]]+$//;
          $text .= "...";
          $text;
      }

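As a quick check of what `chop_to_size` does with the strings used earlier in the thread, here is a small runnable sketch. The sub is repeated unchanged so the script stands alone; the choice of test strings is mine.

```perl
use strict;
use warnings;

sub chop_to_size {
    my ( $text, $length ) = @_;
    return $text if length($text) < $length;

    # make room for ellipsis
    my $chop = $length - 3;
    $text =~ s/^\s*(.{$chop})\s*.+$/$1/;    # keep the first $chop characters
    $text =~ s/[\s[:punct:]]+$//;           # drop trailing spaces and punctuation
    $text .= "...";
    return $text;
}

my $mouse = 'This is a story about mice, "I dint know mice were so smart."';
print chop_to_size( $mouse, 30 ), "\n";
# This is a story about mice...

print chop_to_size( 'this is a very long sentence without spaces in between', 30 ), "\n";
# this is a very long sentenc...
```

The trailing comma is gone, as intended, but the cut is made at a fixed column, so the second call splits a word. Also note that `.` does not match a newline without the `/s` flag, so multi-line text may need extra care.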
        The issue arises from what constitutes a "word". The basic definition is @words = split ' ', $line;, which works in most cases. However, it's arguable that a better definition could be @words = $line =~ /\b(\w+(?:['-]\w+)*)\b/g;. Of course, you're now depending on the definition of \w, which includes underscore and doesn't include apostrophe or hyphen. *shrugs* YMMV a huge amount. Parsing any natural language is much harder than parsing Perl which, as everyone knows, can't be done in Perl.
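The two definitions of a "word" can be compared on a sample line. This is a sketch under assumptions: the sample sentence is mine, and the regex variant is written with a `*` on the apostrophe/hyphen group and the `/g` flag so that it collects every word, including those without an apostrophe or hyphen.

```perl
use strict;
use warnings;

my $line = "It's a well-known fact, isn't it?";

# Whitespace definition: punctuation stays attached to the words.
my @by_space = split ' ', $line;

# Regex definition: runs of \w, optionally joined by ' or -.
my @by_regex = $line =~ /\b(\w+(?:['-]\w+)*)\b/g;

print join( '|', @by_space ), "\n";    # It's|a|well-known|fact,|isn't|it?
print join( '|', @by_regex ), "\n";    # It's|a|well-known|fact|isn't|it
```

The regex version drops the comma and question mark, which is exactly the punctuation that looks jarring in front of an ellipsis.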

        Good luck! I mean it.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        I shouldn't have to say this, but any code, unless otherwise stated, is untested

        According to a writer's guide, if a quote is truncated after a comma or a full stop, the comma or full stop should be left in place with no intervening space.

        Blah blah blah,...
        Blah blah blah....
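One possible hybrid that follows this rule, offered only as a sketch and not as anyone's posted solution: cut at the last space as `trimTo` does, but when the kept text ends in a comma or full stop, remove the space before the ellipsis. The name `trim_keep_punct` is hypothetical.

```perl
use strict;
use warnings;

# trim_keep_punct is a hypothetical name; the cut logic is borrowed
# from trimTo, the punctuation handling is an assumption.
sub trim_keep_punct {
    my ( $str, $n ) = @_;
    return $str if length($str) < $n;

    my $lastSpace = 1 + rindex( $str, ' ', $n - 3 );
    my $head      = substr( $str, 0, $lastSpace || $n - 3 );

    # After a comma or full stop, attach the ellipsis directly.
    $head =~ s/([,.])\s+$/$1/;

    return $head . '...';
}

print trim_keep_punct( 'Blah blah blah, blah blah blah', 18 ), "\n";    # Blah blah blah,...
print trim_keep_punct( 'Blah blah blah. Blah blah blah', 18 ), "\n";    # Blah blah blah....
print trim_keep_punct( 'this is a very long sentence without spaces in between', 30 ), "\n";
# this is a very long ...
```

Other punctuation (semicolons, quotes) is left untouched here, since the rule quoted above only mentions commas and full stops.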

        It's an interesting problem though. I'll think on it and get back to you :)


      That was a great help! Thanks :)