http://www.perlmonks.org?node_id=1073416


in reply to Split a paragraph based on the number of letters

Please read How do I post a question effectively? In particular, note that you should be providing desired output as well as some code that didn't work for you. I honestly have no idea what you mean by "splitting based on number of words, sentences or letters". If you can't write it in code, write it in pseudo-code and be explicit about your algorithm. The more specificity you can provide, the more inclined people will be to help and the better the help will be.

The general challenge you describe is not easily solved, since English is chock full of idioms and peculiarities. Given the assigned spec, I would probably split on one or more whitespace characters that are preceded by a period, question mark or exclamation point but not preceded by a title (Mr., Dr., Mrs., Ms., esq., ...). This is by no means comprehensive, but it should get you through this task. Read perlretut and see if you can translate the above spec into a regular expression. Of particular interest should be Looking ahead and looking behind. Alternatively, you could simply split with /\.\s+/ and then stitch entries back together wherever an entry ends in a title.
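Here is a rough sketch of both ideas, to make the spec concrete. The sub names, the sample text and the short title list (Mr, Mrs, Ms, Dr) are placeholders of mine, not anything from the original question, and neither sub copes with abbreviations outside that list. The first sub uses one negative lookbehind per title so that each lookbehind stays fixed-width, which older perls require. The second splits on /\.\s+/, which eats the period, and so puts the period back while stitching.

```perl
use strict;
use warnings;

# Approach 1: split on whitespace that follows . ? or !,
# unless that whitespace directly follows one of the listed titles.
sub split_sentences {
    my ($text) = @_;
    return split /(?<=[.?!])(?<!Mr\.)(?<!Dr\.)(?<!Ms\.)(?<!Mrs\.)\s+/, $text;
}

# Approach 2: naive split on period-plus-whitespace, then rejoin any piece
# that ends in a title with the piece that follows it.
# Assumption: only periods end sentences here, as in /\.\s+/.
sub split_and_stitch {
    my ($text) = @_;
    $text =~ s/\s+\z//;                   # keep the final period attached
    my @pieces  = split /\.\s+/, $text;   # note: the split consumes the period
    my $pending = '';
    my @out;
    for my $i (0 .. $#pieces) {
        my $piece   = $pending . $pieces[$i];
        my $is_last = $i == $#pieces;
        if (!$is_last && $piece =~ /\b(?:Mr|Mrs|Ms|Dr)\z/) {
            $pending = "$piece. ";        # a title: restore period, keep going
            next;
        }
        # The last piece still carries its own terminator; others lost theirs.
        push @out, $is_last ? $piece : "$piece.";
        $pending = '';
    }
    return @out;
}

my $sample = 'Dr. Smith arrived late. Did Mrs. Jones notice? She did!  Nobody else cared.';
print "$_\n" for split_sentences($sample);
```

For the sample string, split_sentences should yield four sentences, with "Dr. Smith" and "Mrs. Jones" left intact. Note that split_and_stitch collapses whatever whitespace followed a title into a single space.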

How do you take paragraph or large amount of text and break it into sentences (perferably using Ruby)...
I think perhaps you've come to the wrong community. You should stay anyway, though, since we're pretty cool and generally helpful.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.