PerlMonks  

Smoothing text input

by Anonymous Monk
on Aug 29, 2012 at 15:32 UTC ( [id://990495]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Brothers,

I would like to take arbitrary text input and manipulate it a bit. The input will be in a single scalar, $string. It might be null, empty, multiline or very large. I want to break everything into "words"; trim any leading, trailing or embedded whitespace; and get rid of line endings.

I have tried the following, but the "map" is not doing what I expected. Is there an existing module that will do this better? What am I missing in this routine? Am I on the right track?

    sub wordBreak {
        my ($string) = @_;
        my $tmp = "";
        my @result = ();
        my @array = ();
        #
        # break the input into "lines"
        #
        @array = split(/^/m, $string);
        #
        # turn on swallow ending white space mode and
        # chomp the array elements
        #
        $tmp = $/;
        $/ = "";
        chomp(@array) if ($#array >= 0);
        $/ = $tmp;
        @result = split(
            ' ',        # split the one line into words
            join(
                ' ',    # join the "lines" into one
                map {
                    s/\s+/ /g;    # get rid of extra whitespace
                    s/^ //;
                    s/ $//;
                } @array
            )
        ) if ($#array >= 0);
        return @result;
    }

Replies are listed 'Best First'.
Re: Smoothing text input
by Kenosis (Priest) on Aug 29, 2012 at 16:30 UTC

    You could use Text::ParseWords:

    use Modern::Perl;
    use Text::ParseWords;

    say for shellwords 'This is just an average looking string.';

    Output:

    This
    is
    just
    an
    average
    looking
    string.

      Thanks! I found Text::ParseWords just after the initial post

        You're most welcome! It's a good find...
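
One caveat for arbitrary text: shellwords applies shell-style quoting rules, so quotation marks and backslashes in the input are interpreted rather than kept as ordinary characters. The following is a minimal sketch of that difference using a made-up sample string; whether this behaviour is wanted depends on the input.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Made-up sample input containing a double-quoted phrase.
my $input = q{say "hello world" now};

my @plain  = split ' ', $input;     # pure whitespace split: quotes are ordinary characters
my @quoted = shellwords($input);    # shell-style split: the quoted phrase stays together

print scalar(@plain),  " words from split\n";
print scalar(@quoted), " words from shellwords\n";
```

Here split sees four words, while shellwords returns three and strips the quotes from "hello world". Input with an unmatched quote or apostrophe is worth testing separately before relying on shellwords.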

Re: Smoothing text input
by johngg (Canon) on Aug 29, 2012 at 17:08 UTC
    but the "map" is not doing what I expected

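    That's because each value the map passes out is the result of the last statement in its block. You are not passing $_ out of your map, just the return value of the final s/ $//; (1 if it made a substitution, an empty string if it did not). With the /g modifier, a substitution returns the number of replacements made. Consider the following two code snippets.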

    $ perl -E '
    > @arr = qw{
    >     abCdeFg
    >     hIJklMn
    >     OpqrsTu
    > };
    > say join q{:}, @arr;
    > $str = join q{:},
    >     map { s{[A-Z]}{*}g } @arr;
    > say $str;'
    abCdeFg:hIJklMn:OpqrsTu
    2:3:2
    $

    $ perl -E '
    > @arr = qw{
    >     abCdeFg
    >     hIJklMn
    >     OpqrsTu
    > };
    > say join q{:}, @arr;
    > $str = join q{:},
    >     map { s{[A-Z]}{*}g; $_ } @arr;
    > say $str;'
    abCdeFg:hIJklMn:OpqrsTu
    ab*de*g:h**kl*n:*pqrs*u
    $

    Notice the different result when I pass $_ out of the map by mentioning it in a final statement.

    I hope this is helpful.

    Cheers,

    JohnGG

      Doh! Of course. Thank you!
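
Applied to the routine from the question, the same fix might look like the sketch below. The name word_break_fixed and the undef guard are assumptions added for illustration; the substitutions themselves are the ones from the original post. Each line is copied into a lexical first, because s/// inside map otherwise modifies the source array in place.

```perl
use strict;
use warnings;

# Hypothetical corrected variant of the original wordBreak.
sub word_break_fixed {
    my ($string) = @_;
    return () unless defined $string;    # assumption: undef means "no words"

    my @lines = split /^/m, $string;     # break the input into "lines"
    my @clean = map {
        my $line = $_;                   # copy, so @lines is left untouched
        $line =~ s/\s+/ /g;              # collapse whitespace runs, newlines included
        $line =~ s/^ //;
        $line =~ s/ $//;
        $line;                           # the last expression is what map returns
    } @lines;

    return split ' ', join ' ', @clean;  # one line, then words
}

print join('|', word_break_fixed("  foo   bar \n baz\tqux  \n")), "\n";
```

On Perl 5.14 or later, the /r modifier is another option: it makes s/// return the modified copy instead of the substitution count.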
Re: Smoothing text input
by choroba (Cardinal) on Aug 29, 2012 at 15:58 UTC
    $string =~ s/^\s+//;
    split /\s+/, $string;
    should work just fine. (updated).
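
The s/^\s+// step matters because split /\s+/ produces an empty leading field when the string starts with whitespace. The special pattern ' ' (a single space string), which the original routine already uses, skips leading whitespace on its own. A minimal sketch with a made-up sample string:

```perl
use strict;
use warnings;

# Made-up sample input with leading and trailing whitespace.
my $string = "  alpha \n beta\tgamma  ";

my @regex = split /\s+/, $string;   # leading whitespace yields an empty first field
my @awk   = split ' ',   $string;   # the special ' ' pattern skips leading whitespace

print scalar(@regex), " fields with /\\s+/\n";
print scalar(@awk),   " fields with ' '\n";
```

With this input the regex form returns four fields, the first of them empty, while the ' ' form returns only the three words. Trailing empty fields are dropped in both cases.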
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Smoothing text input
by hbm (Hermit) on Aug 29, 2012 at 19:08 UTC

    Just grab the non-spaces?

    sub wordBreak {
        my ($string) = @_;
        return $string =~ /\S+/g;
    }
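
Since the question says the input might be null, a guard for undef may be worth adding; without one, matching against an undefined value triggers an "uninitialized value" warning when warnings are enabled. The sketch below uses a hypothetical name, word_break_safe, and assumes that undef should simply produce an empty list.

```perl
use strict;
use warnings;

# Hypothetical wrapper name; same regex approach with an undef guard.
sub word_break_safe {
    my ($string) = @_;
    return () unless defined $string;   # undef in, empty list out, no warning
    return $string =~ /\S+/g;           # in list context: every run of non-whitespace
}

my @words = word_break_safe("  one\ttwo \r\n three\n");
print join(',', @words), "\n";
```

Note that the match returns the list of words only in list context; called in scalar context it just reports whether a match was found.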
