 
PerlMonks  

Smart Substrings

by cei (Monk)
on Apr 03, 2001 at 03:49 UTC

cei has asked for the wisdom of the Perl Monks concerning the following question:

I am using Perl to generate a SELECT pull-down in an HTML form. I'd like to limit the length of the text that gets shown, so that the menu doesn't become too wide. I'm trying to figure out the best method for doing this.

For the sake of argument, let's say...

  • I want a maximum string length of 15 characters
  • The text for the OPTION is coming from a database, and can contain spaces and punctuation
  • I want to chop long entries between words and add a "..." to the end

    Thought 1

    1. Compare string length against maxlength
    2. If stringlength > maxlength, use a regex to chop off the end at a whitespace, and concatenate a "..."
    3. Compare new stringlength against maxlength and repeat if necessary
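    Thought 1 can be sketched in Perl as a loop that strips the last word until the text fits. This is a minimal sketch, not the poster's code: the sub name shorten_iterative is hypothetical, and it appends the "..." once at the end instead of on every pass, which avoids piling up several "..."s.

```perl
use strict;
use warnings;

# Hypothetical helper: drop whole words from the end until the text
# (plus a trailing "...") fits in $max characters.
sub shorten_iterative {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;
    my $room = $max - 3;    # leave room for the "..."

    # Steps 2-3: chop off the last whitespace-separated word and re-check,
    # stopping once it fits or there is no whitespace left to cut at.
    1 while length($text) > $room && $text =~ s/\s+\S*\z//;

    # A single word longer than $room: fall back to a hard cut.
    $text = substr($text, 0, $room) if length($text) > $room;
    return "$text...";
}

print shorten_iterative("all work and no play makes MeowChow a dull boy", 15), "\n";
```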

    Thought 2

    1. Similar to above, but use split to put all of the words into a temporary array, then reconstruct the string, leaving out words as necessary
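    Thought 2 might look like this: split the text into words, leave out words from the end of the array until the rejoined string has room for the "...", then join. The sub name is hypothetical; note that split ' ' collapses runs of whitespace, so the result always uses single spaces.

```perl
use strict;
use warnings;

# Hypothetical helper for "Thought 2": rebuild the string from as many
# leading words as fit, then append "...".
sub shorten_split {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;
    my $room  = $max - 3;            # leave room for the "..."
    my @words = split ' ', $text;    # awk-style split on runs of whitespace

    # Leave out words from the end until the rejoined string fits.
    pop @words while @words > 1 && length(join ' ', @words) > $room;

    my $short = join ' ', @words;
    # Only one word left and it is still too long: hard cut.
    $short = substr($short, 0, $room) if length($short) > $room;
    return "$short...";
}

print shorten_split("all work and no play makes MeowChow a dull boy", 15), "\n";
```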

    There are probably other ways as well.

    How would YOU do it?

    Re: Smart Substrings
    by elusion (Curate) on Apr 03, 2001 at 03:54 UTC
      I would do it like this:
      $string =~ s/^(.{15}).+$/$1/;
      To split between words and add "..." I'd use:
      $string =~ s/^(.{1,15})\s*.*$/$1.../;
      This assumes you want a string 15 chars long plus the "..."; if there's no space within the first 15 chars, it chops there anyway.

      - p u n k k i d
      "Reality is merely an illusion, albeit a very persistent one." -Albert Einstein

        Your first code snippet is a good example of bringing out the regex chain-gun when a simple substr would do:
        my $short = substr $string, 0, 15;
        This is equivalent, but faster and more readable. Don't use a regex unless you actually need it.

        Your second snippet doesn't split on words. You would need to change the \s* to a \s+ for it to do that, but then it would not truncate at all when the first word is longer than 15 chars.
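        Following that suggestion, a version of the second snippet that uses \s+ and also handles a first word longer than 15 chars might look like this. It is a sketch, not elusion's code: shorten_regex is a hypothetical name, and like the original it keeps up to $max characters before adding the "...". The /s flag lets . match newlines, in case the database text contains them.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the longest prefix of at most $max characters that
# is followed by whitespace, then add "..."; hard-cut if there is no such prefix.
sub shorten_regex {
    my ($string, $max) = @_;
    return $string if length($string) <= $max;
    if ($string =~ /^(.{1,$max})\s/s) {
        (my $prefix = $1) =~ s/\s+\z//;    # drop whitespace left over from runs of spaces
        return "$prefix...";
    }
    return substr($string, 0, $max) . '...';    # first word longer than $max
}

print shorten_regex("all work and no play makes MeowChow a dull boy", 15), "\n";
print shorten_regex("antidisestablishment", 15), "\n";
```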

        I don't mean to offer harsh criticism, but one of my pet peeves is seeing regexes used when a simple index/substr would do.

           MeowChow                                   
                       s aamecha.s a..a\u$&owag.print

          Why would you keep a peeve as a pet? Wouldn't a dog or a cat be more fun?

          (Sorry, couldn't resist ;^)

    (bbfu) Re: Smart Substrings
    by bbfu (Curate) on Apr 03, 2001 at 04:46 UTC

      I rather like punkkid's answer but, in the spirit of TIMTOWTDI, here's another way.

      if (length($string) > 15) {
          my $index = rindex($string, ' ', 15);
          # Chop it at 15 even if we didn't find a space...
          $index = 15 if ($index < 0);
          $string = substr($string, 0, $index);
      }

      Note, however, that this will be foiled on words separated with more than one space... Or tabs.

      You could, of course, do away with the temporary variable ($index) if you don't care about chopping strings that contain no spaces... but you probably do. That would make it a one-liner. =)

      bbfu
      Seasons don't fear The Reaper.
      Nor do the wind, the sun, and the rain.
      We can be like they are.

        A bit opaque, but you can do this for a one-liner:
        my $short = substr($string, 0, rindex($string, ' ', $len) + 1 || $len);
        This leaves you a trailing space, however.
           MeowChow                                   
                       s aamecha.s a..a\u$&owag.print
    Re: Smart Substrings
    by MeowChow (Vicar) on Apr 03, 2001 at 07:20 UTC
      I'm not sure I like any of the solutions yet offered (can you tell?) so I whipped this up:
      sub shorten {
          local $_ = shift;
          my $len = shift;
          my $pos;
          return $_ if $len >= length;
          $len -= 3;
          $pos = pos while /\w(?=\s)/g && $len >= pos;
          $pos ||= $len;
          substr($_, 0, $pos)."...";
      }
      print shorten("misunderestimate", 15), "\n";
      print shorten("all work and no play makes MeowChow a dull boy", 15), "\n";
      print shorten("imapirateiamiam", 15), "\n";
      If you don't care about handling multiple spaces or tabs, I would rewrite the two lines in the sub that compute $pos (the while loop and the ||= fallback) as follows, which essentially makes it bbfu's solution:
      $pos = rindex $_, " ", $len;
      $pos = $len if $pos < 0;
      This would make a good Golf contest by the way (hint, hint)...
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
        LOL :)

        "Remember, just say NO to regex (when simpler functions will do)". - MeowChow

        sorry, couldn't resist it :)

        cLive ;-)

          I knew this would be coming...

          Yes, I did resort to a regex, but only because of two exceptional cases (extra spaces and tabs). Not knowing the poster's requirements in this regard, I wrote something that would handle these cases. The non-regex solution is still a much better fit for this problem, and the regex that I did use was kept fairly simple. It could even have been /\w\s/, but I chose to simplify the indexing arithmetic instead.
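          For comparison, here is the shorten sub rewritten with the /\w\s/ pattern. Because that pattern consumes the whitespace, the cut point is pos() - 1 rather than pos(), which is the extra indexing arithmetic the lookahead avoids. The name shorten_ws is hypothetical; this is a sketch for illustration only.

```perl
use strict;
use warnings;

sub shorten_ws {
    local $_ = shift;
    my $len = shift;
    my $pos;
    return $_ if $len >= length;
    $len -= 3;    # leave room for the "..."
    # /\w\s/ eats the whitespace too, so the word ends one character before pos()
    $pos = pos() - 1 while /\w\s/g && $len >= pos() - 1;
    $pos ||= $len;    # no usable word break: hard cut
    return substr($_, 0, $pos) . "...";
}

print shorten_ws("all work and no play makes MeowChow a dull boy", 15), "\n";
```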

          Hmm... all this backtracking, I'm beginning to sound like a regex myself :)

             MeowChow                                   
                         s aamecha.s a..a\u$&owag.print
    Re: Smart Substrings
    by cLive ;-) (Prior) on Apr 03, 2001 at 05:28 UTC

      Only one match needed.

      if (length($string) > 15) { $string =~ s/^(.{0,12})\b\s.*/$1.../gi; }

      cLive ;-)

      Update:

      MeowChow pointed out a typo: I had left out the .* (I had it on the other machine, the one running the demo :)

      Actually, if there's a chance that the first word will be longer than 12 chars, you'd need to expand this a bit to cut off the end of the first word. Oh hell, let's change a few things here:

      if (length($string) > 15) {
          unless ($string =~ s/^(.{0,12})\b\s.*/$1.../gi) {
              $string = substr($string, 0, 12) . '...';
          }
      }

      Finally, I disagree with MeowChow (below). I think \b *is* needed. Otherwise this could happen:

      "A Satsuma - Orange" would become "A Satsuma -..."
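      The difference is easy to check. This small script (the variable names are illustrative) runs the substitution with and without \b on the same input:

```perl
use strict;
use warnings;

my $input     = "A Satsuma - Orange";
my $with_b    = $input;
my $without_b = $input;

# Without \b, the kept prefix may end in punctuation ("-") before the space.
$without_b =~ s/^(.{0,12})\s.*/$1.../;
# With \b, the prefix has to end in a word character followed by whitespace.
$with_b    =~ s/^(.{0,12})\b\s.*/$1.../;

print "$without_b\n";   # A Satsuma -...
print "$with_b\n";      # A Satsuma...
```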

        This is broken. You left out the .* at the end of your substitution to eat up the rest of the string, and if you put it in, you will find that it still breaks on input such as "antidisestablishment". Also, your \b is superfluous.

        Remember, just say NO to regex (when simpler functions will do).

           MeowChow                                   
                       s aamecha.s a..a\u$&owag.print
    Re: Smart Substrings
    by extremely (Priest) on Apr 03, 2001 at 09:39 UTC
      Honestly, you'd be better off with

      $string = substr($string, 0, 12) . '...' if length($string) > 15;

      Sooner or later you will wind up with a string like "a misunderstanding", and your "smart" substring will leave you with "a..." and really, what good is that? =)

      Once you add in a minimum string length, you are just plain going to be doing too much work for the benefit.

      --
      $you = new YOU;
      honk() if $you->love(perl)

        You know,

        I *did* think of that, but was happier giving a 3-line solution :)

        I used a similar regex in one of my scripts to split a line as close to 70 chars as possible (without breaking a word). To avoid the problem you point out, I would use this:

        unless ($string =~ s/^(.{6,12})\b\s.*/$1.../gi)
        But for such a small string it's a little rough. The context I use it in is {50,70} for line breaks, and that seems to work just fine.

        I think the *idea* is sound though, and I'm sure the user could amend .{0,12} to .{4,12} or .{7,12} depending on their specific requirements to get it to work for them.
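        Generalizing the regex along those lines, a sketch with the minimum and maximum as parameters, plus the substr fallback from the update above, could look like this. The sub name and its argument order are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep between $min and $max - 3 characters, ending
# on a word character that is followed by whitespace, then add "...".
sub shorten_minmax {
    my ($string, $max, $min) = @_;
    return $string if length($string) <= $max;
    my $room = $max - 3;    # leave room for the "..."
    return "$1..." if $string =~ /^(.{$min,$room})\b\s/s;
    return substr($string, 0, $room) . '...';    # no acceptable break: hard cut
}

print shorten_minmax("a misunderstanding", 15, 0), "\n";
print shorten_minmax("a misunderstanding", 15, 4), "\n";
```

        With a minimum of 0 you get the "a..." result; raising it to 4 makes the sub fall back to the hard cut instead.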

        cLive ;-)

    Re: Smart Substrings
    by dash2 (Hermit) on Apr 03, 2001 at 04:54 UTC
      I think the method is a bit cumbersome. And you could get ugly multiple "..."s - at least, the way you have written the algorithm. Rather than continually shortening the string, why not just:

      1. Check if it is too long
      2. If not, print it
      3. If so, get the last word and as many words from the start as you can, then put a "..." in. If the last word is more than 12 chars, just get as many words from the start as you can; if the first word is more than 12 chars, truncate it.

      Short and simple. Although I think sometimes people obsess about speed... if it's building a webpage, this is not an algorithm that is gonna be run a million times. But simple code is easy to debug.

      if (length($string) > 15) {
          my $last = '';
          if ($string =~ /\s(\S{1,12})$/) {
              $last = $1;    # a suitable end word
          }
          my $starting_length = 12 - length($last);
          # greedy pattern match, matches multiple words (anchored with ^ so they come from the start)
          if ($string =~ /^(.{0,$starting_length})\s/) {
              $string = "$1...$last";
          } else {
              $string = substr($string, 0, 12) . "...";
          }
      }

      You could probably do that last if...else with just an s/// and a check on whether the substitution succeeded:

      ( $string =~ s/^(.{0,$starting_length})\s.*/$1...$last/ ) or $string = substr($string, 0, 12) . "...";

      Actually, you really want to check the first word length first. You probably want words from the start rather than the end (more recognisable), but this grabs as much as it can from the end, then cuts the start to suit. Well, play with it.

      dave hj~

    Re: Smart Substrings
    by $ENV{REMOTE_USER} (Novice) on Apr 04, 2001 at 00:58 UTC
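      The Perl DBI module provides a function for this:

      $option = DBI::neat($string, 15);

      (Note that neat() is designed for trace output: it wraps string values in quotes and truncates at a fixed length rather than at a word boundary.)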
    Re: Smart Substrings
    by Rhandom (Curate) on Apr 03, 2001 at 19:35 UTC
      Well, it doesn't necessarily fit this circumstance, but if you are working with lengths greater than... oh... 25 or so, you really should use Text::Wrap.
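      One way to use Text::Wrap for truncation (a sketch, not something the module offers directly) is to wrap to the target width and keep only the first line. wrap_truncate is a hypothetical name; $Text::Wrap::columns, $Text::Wrap::huge and $Text::Wrap::unexpand are the module's own configuration variables. Since wrap() produces lines of at most $columns - 1 characters, columns is set one higher than the room left for the "...".

```perl
use strict;
use warnings;
use Text::Wrap ();

# Hypothetical helper: let Text::Wrap choose the word boundary, keep line one.
sub wrap_truncate {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;
    local $Text::Wrap::columns  = $max - 3 + 1;    # lines are at most columns - 1 chars
    local $Text::Wrap::huge     = 'wrap';          # break words longer than a line
    local $Text::Wrap::unexpand = 0;               # don't turn runs of spaces into tabs
    my ($first) = split /\n/, Text::Wrap::wrap('', '', $text);
    $first =~ s/\s+\z//;
    return "$first...";
}

print wrap_truncate("all work and no play makes MeowChow a dull boy", 15), "\n";
```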
    Re: Smart Substrings
    by Anonymous Monk on Apr 03, 2001 at 23:24 UTC
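      my $str  = "mom went to the food market today and got me some milk";
      my $maxl = 18;
      my $show = '';    # start empty to avoid an "uninitialized" warning
      foreach (split(/\s+/, $str)) {
          # stop once the next word plus "..." would reach $maxl
          if (length("$show $_") + 3 >= $maxl) { $show .= '...' and last }
          else                                  { $show .= " $_" }
      }
      print substr($show, 1), "\n";    # drop the leading space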

      Two things:
      1) if you have more than one space between words, it collapses them to a single space in the resulting string
      2) it doesn't handle the case where even the first word doesn't fit, e.g. $maxl=0 (it outputs '..')
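      To address the first point, split keeps the separators when its pattern has a capturing group, so the original spacing can survive. A sketch of that approach, assuming $max > 3 (shorten_words is a hypothetical name; it also hard-cuts a first word that is too long instead of printing '..'):

```perl
use strict;
use warnings;

# Hypothetical helper: like the code above, but keeps the original
# whitespace between words. Assumes $max > 3.
sub shorten_words {
    my ($str, $max) = @_;
    return $str if length($str) <= $max;
    my $room  = $max - 3;               # leave room for the "..."
    my @parts = split /(\s+)/, $str;    # even indices: words, odd: whitespace runs
    my ($show, $gap) = ('', '');
    for my $i (0 .. $#parts) {
        if ($i % 2) { $gap = $parts[$i]; next }    # whitespace before the next word
        my $candidate = $show . $gap . $parts[$i];
        last if length($candidate) > $room;
        $show = $candidate;
    }
    $show = substr($str, 0, $room) if $show eq '';    # first word too long: hard cut
    return "$show...";
}

print shorten_words("mom went to the food market today and got me some milk", 18), "\n";
```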


      vsp

    Node Type: perlquestion [id://69149]
    Approved by root