PerlMonks  

substr question

by Anonymous Monk
on Jun 18, 2010 at 16:10 UTC ( #845399=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I want to show only the first 100 characters or so of a string that is longer than 100 characters, but I don't want to cut it in the middle of a word. Is there a way to do that instead of this:
$string = substr($string,0,100);
I have searched, but I'm not sure how to do it.

For instance if this was the string:

"Hello, I am a perl/mysql programmer, but am self taught so I do not know all the awesome features Perl has, however, I am ok at it though, I guess."

With that string, the code above would give this: "Hello, I am a perl/mysql programmer, but am self taught so I do not know all the awesome features Pe". It cuts off the word Perl. Is there a way to cut somewhere around there, but not in the middle of a word?

Thanks.

Richard

Re: substr question
by linuxer (Deacon) on Jun 18, 2010 at 16:50 UTC

    I modified your problem definition into:

    How to split a string into several substrings (of a given maximum length) without breaking a word when splitting.

    I found this (quickly hacked) solution:

    #! /opt/perl/bin/perl
    use strict;
    use warnings;

    my $text = "Hello, I am a perl/mysql programmer, but am self taught so I do not know all the awesome features Perl has, however, I am ok at it though, I guess.";

    sub split_at {
        my ( $text, $len ) = @_;
        my @result;
        my @parts = split /[ ]/, $text;
        my $short = shift @parts;
        while ( @parts ) {
            if ( length($short) + length($parts[0]) < $len ) {
                $short = join( ' ', $short, shift @parts );
            }
            else {
                push @result, $short;
                $short = shift @parts;
            }
        }
        push @result, $short;
        return @result;
    }

    {
        local $, = local $\ = $/;
        print split_at( $text, 100 );
        # beware
        print split_at( 'a'x110, 100 );
    }

    Limitation: If the given string doesn't contain any whitespace (\x20), it won't split at all.

    If you just want/need the first substring, just use the first returned element.

    Updates:

    • cleaned up code
    • fixed comparison; thanks, rowdog
      if ( length($short) + length($parts[0]) <= $len ) {
          $short = join( ' ', $short, shift @parts );
      }

      The length is off by one because you add not only the part, but a new space as well. The easy fix is to delete the = from <= $len
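The off-by-one is easy to check with small numbers. A minimal sketch (the helper name `fits` is made up for illustration and does not appear in the posts above): joining two chunks with a space gives length($short) + 1 + length($next) characters, so the joined text fits within $len exactly when length($short) + length($next) < $len.

```perl
use strict;
use warnings;

# Hypothetical helper: true if "$short $next" fits within $len characters.
# The strict "<" accounts for the one extra space added by the join.
sub fits {
    my ( $short, $next, $len ) = @_;
    return length($short) + length($next) < $len;
}

# "abc" and "def" joined by a space give 7 characters, not 6.
my $joined = join ' ', 'abc', 'def';
print length($joined), "\n";                               # 7
print fits( 'abc', 'def', 7 ) ? "fits\n" : "too long\n";   # fits
print fits( 'abc', 'def', 6 ) ? "fits\n" : "too long\n";   # too long
```

With `<=` instead of `<`, the second call would wrongly report that a 7-character result fits in 6.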

Re: substr question
by toolic (Chancellor) on Jun 18, 2010 at 16:52 UTC
    Something like this, where a "word" is defined by surrounding whitespace (\s):
    use strict;
    use warnings;

    my $s = "Hello, I am a perl/mysql programmer, but am self taught so I do not know all the awesome features Perl has, however, I am ok at it though, I guess.";
    my $s2;
    for (split /\s+/, $s) {
        $s2 .= "$_ ";
        last if length($s2) > 100;
    }
    print $s2;

    __END__
    Hello, I am a perl/mysql programmer, but am self taught so I do not know all the awesome features Perl

      Probably better as:

      my $s2;
      for ( split /(\s+)/, $s ) {
          last if length( $s2 . $_ ) > 100;
          $s2 .= $_;
      }
      print $s2;
        In fact, your code is worse for 2 reasons:
        • The output string $s2 no longer has any whitespace between words: Hello,Iamaperl/mysql...
        • Generates a warning: Use of uninitialized value $s2...
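For what it's worth, both objections can be sidestepped in a few lines. This is only a sketch, assuming that 100 is a hard maximum and that words are delimited by whitespace; the helper name `truncate_words` is invented here. A capturing group in the `split` pattern makes `split` return the separators too, so the original spacing survives, and starting from an empty string avoids the uninitialized-value warning.

```perl
use strict;
use warnings;

# Hypothetical helper: longest prefix of $s, at most $max characters,
# that ends at the end of a whitespace-delimited word.
sub truncate_words {
    my ( $s, $max ) = @_;
    my $out = '';                            # defined from the start: no warning
    for my $piece ( split /(\s+)/, $s ) {    # capture group keeps the whitespace
        last if length( $out . $piece ) > $max;
        $out .= $piece;
    }
    $out =~ s/\s+\z//;                       # drop trailing whitespace, if any
    return $out;
}

my $s = "Hello, I am a perl/mysql programmer, but am self taught so I do not know all the awesome features Perl has, however, I am ok at it though, I guess.";
print truncate_words( $s, 100 ), "\n";
```

For the sample string this stops after "features", because adding "Perl" would go past 100 characters. If the very first word is longer than the limit, the sketch returns an empty string, so a caller may want to fall back to plain substr in that case.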
Re: substr question
by BrowserUk (Pope) on Jun 18, 2010 at 17:08 UTC

    $s = "I want to only show the first 100 characters or so of a string that has more than 100, but I don't want to split it in the middle of a word, so is there a way to do it instead of this:";;

    print $s =~ m[(^.{0,100})\s];;
    I want to only show the first 100 characters or so of a string that has more than 100, but I don't

    print $s =~ m[(^.{0,$_})\s] for 20 .. 50;;
    I want to only show
    I want to only show
    I want to only show
    I want to only show the
    I want to only show the
    I want to only show the
    I want to only show the
    I want to only show the
    I want to only show the
    I want to only show the first
    I want to only show the first
    I want to only show the first
    I want to only show the first
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100
    I want to only show the first 100 characters
    I want to only show the first 100 characters
    I want to only show the first 100 characters
    I want to only show the first 100 characters or
    I want to only show the first 100 characters or
    I want to only show the first 100 characters or
    I want to only show the first 100 characters or so

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      That's the way I'd go too, but there's a little problem in your implementation:

      $ perl -le'print "abc def" =~ m[(^.{0,100})\s]'
      abc
      Fix:
      m[(^.{0,100})(?!\S)]
      m[(^.{0,100})(\s|$)]
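To see the difference, here is a small sketch built on the first of the two fixes. The wrapper name `head` is made up for this example. The negative lookahead `(?!\S)` succeeds both before whitespace and at the end of the string, so a string shorter than the limit comes back whole.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the corrected pattern from the reply above.
# Returns undef when even the first word is longer than $max.
sub head {
    my ( $s, $max ) = @_;
    my ($head) = $s =~ m[(^.{0,$max})(?!\S)];
    return $head;
}

print head( "abc def", 100 ), "\n";   # abc def
print head( "abc def", 5 ), "\n";     # abc
```

Note that `.` does not match a newline without the /s modifier, so multi-line input would need that added.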
Re: substr question
by YuckFoo (Abbot) on Jun 18, 2010 at 17:10 UTC
    ($string) = ("$string " =~ m{(.{1,100}\s)});
Re: substr question
by repellent (Priest) on Jun 18, 2010 at 17:26 UTC
    use Text::Wrap;
    $Text::Wrap::columns = 100;
    my $truncated = (split /\n/, wrap("", "", $string))[0];
Re: substr question
by Anonymous Monk on Jun 18, 2010 at 23:46 UTC
    yes, sure
Re: substr question
by Anonymous Monk on Jun 19, 2010 at 00:27 UTC
    Ah.... color me stupid, but what is wrong with: ($string) = $string =~ /^(.{100,}?)\b/;

      I don't see how that's ever useful. If the string has less than 100 characters, it returns nothing. If the string has 100 or more characters, it returns either the same as the OP's solution or too many characters. (I'm assuming 100 is a maximum, such as a screen width or a field width.)

      You've also used a different definition of "word" than everyone else such that cutting "don't" into "don" and "'t" is acceptable, and so is cutting "don't" into "don'" and "t".
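The three objections can be reproduced with a short sketch. The sample strings are invented for illustration, and a limit of 3 stands in for 100 in the apostrophe case so the example stays readable.

```perl
use strict;
use warnings;

# Short input: .{100,} cannot match, so the list assignment yields nothing.
my ($short) = "only a few words" =~ /^(.{100,}?)\b/;
print defined $short ? "matched\n" : "no match\n";   # no match

# A long word straddling position 100: the lazy match keeps growing
# until the next word boundary, so the result exceeds 100 characters.
my $text = 'a' x 98 . ' ' . 'b' x 10 . ' end';
my ($long) = $text =~ /^(.{100,}?)\b/;
print length($long), "\n";                           # 109

# \b sees a boundary between a letter and an apostrophe, so "don't"
# can be cut after "don".
my ($cut) = "don't stop" =~ /^(.{3,}?)\b/;
print "$cut\n";                                      # don
```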

        Ah... the original post said 'around 100 characters' not that that was the maximum. But no matter.

        Sigh. I suppose I did commit the cardinal sin of posting Sloppy code.

        And, I should have been clear that what I posted was NOT a turnkey solution but a suggestion that a regex approach might make sense.

        so... OK, below is the result of another few minutes fiddling, this will work better, certainly.

        my $string;
        $string = 'lasdufaner%.alsdfi,' x 100;
        # $string = 'freddy\'s wife wilma, ' x 100;
        my $max = 100;
        if ( $string and length $string > $max ){
            $string = substr( $string, 0, $max);
            my ($tmp) = $string =~ /(.+)\s.*?$/;        # last space if possible
            $tmp or ($tmp) = $string =~ /(.+)\W.*?$/;   # bust on last non-word
            $tmp and $string = $tmp;
            print $string
        }


        freddy's wife output:
        freddy's wife wilma, freddy's wife wilma, freddy's wife wilma, freddy's wife wilma, freddy's wife

        lasd... output
        lasdufaner%.alsdfi,lasdufaner%.alsdfi,lasdufaner%.alsdfi,lasdufaner%.alsdfi,lasdufaner%.alsdfi

        The point being, I suppose, that this sort of thing might be easily handled by a regular expression in most cases.

        Thanks for your comment though, it's always good to have a second set of eyes. :-)

        \s
