PerlMonks  

Re: Split a paragraph based on the number of letters

by kennethk (Monsignor)
on Feb 04, 2014 at 15:19 UTC


in reply to Split a paragraph based on the number of letters

Please read How do I post a question effectively? In particular, note that you should be providing desired output as well as some code that didn't work for you. I honestly have no idea what you mean by "splitting based on number of words, sentences or letters". If you can't write it in code, write it in pseudo-code and be explicit about your algorithm. The more specificity you can provide, the more inclined people will be to help and the better the help will be.

The general challenge you describe is not easily solved, since English is chock full of idioms and peculiarities. Given the assigned spec, I would probably split on one or more whitespace characters that are preceded by a period, question mark or exclamation point, but not preceded by a title (Mr., Dr., Mrs., Ms., esq., ...). This is by no means comprehensive, but it should get you through this task. Read perlretut and see if you can translate the above spec into a regular expression. Of particular interest should be the section Looking ahead and looking behind. Alternatively, you could simply split with /\.\s+/ and then stitch entries back together if there's a trailing title.
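For what it's worth, here is a rough sketch of both ideas, not a definitive solution. The title list, the sample text and the sub names split_sentences and split_and_stitch are made up for illustration. Note that the second sub splits with /(?<=\.)\s+/ rather than /\.\s+/, an assumption on my part so that each piece keeps its period.

```perl
use strict;
use warnings;

# Hypothetical, non-exhaustive list of titles that end in a period.
my @titles = qw(Mr Mrs Ms Dr esq);

# Approach 1: split on whitespace that follows ., ? or ! but not a title.
# Each title gets its own fixed-width negative lookbehind.
my $not_title = '';
$not_title .= '(?<!\b' . quotemeta($_) . '\.)' for @titles;
my $boundary = qr/(?<=[.?!])$not_title\s+/;

sub split_sentences {
    my ($text) = @_;
    return split $boundary, $text;
}

# Approach 2: split after every period, then stitch a piece back onto
# the next one whenever it ends in a title.
my $title_alt = join '|', map { quotemeta($_) } @titles;

sub split_and_stitch {
    my ($text) = @_;
    my @sentences;
    my $carry = '';
    for my $piece ( split /(?<=\.)\s+/, $text ) {
        $carry = length $carry ? "$carry $piece" : $piece;
        next if $carry =~ /\b(?:$title_alt)\.\z/;    # trailing title: keep stitching
        push @sentences, $carry;
        $carry = '';
    }
    push @sentences, $carry if length $carry;
    return @sentences;
}

print "$_\n" for split_sentences('Is Mr. Hyde in? Dr. Jekyll is out. Call back later!');
print "$_\n" for split_and_stitch('Mr. Hyde is out. Dr. Jekyll is in. Call back later.');
```

Neither version copes with initials, abbreviations such as Ph.D., or punctuation inside quotes, and the stitching version only breaks on periods and rejoins pieces with a single space.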

How do you take paragraph or large amount of text and break it into sentences (perferably using Ruby)...
I think perhaps you've come to the wrong community. You should stay anyway, though, since we're pretty cool and generally helpful.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Re^2: Split a paragraph based on the number of letters
by Not_a_Number (Parson) on Feb 04, 2014 at 18:55 UTC

    I agree that the OP is not very clear. However, if the intention is actually to split a paragraph into sentences, I would strongly recommend using a module rather than trying to roll one's own parser.

    Here's an example using Lingua::EN::Sentence:

    use feature 'say';
    use Lingua::EN::Sentence qw( get_sentences );

    my $text      = 'Is Mr. Hyde in? A. J. Smith Ph.D. said "Drop dead!"';
    my $sentences = get_sentences($text);    # returns an array reference
    say for @$sentences;

    Output:

    Is Mr. Hyde in?
    A. J. Smith Ph.D. said "Drop dead!"

    Update: Minor wording changes; added output.

      I whole-heartedly agree. I also think the post had all the hallmarks of homework, and I suspect the professor would not accept a practical solution.


