PerlMonks  

Re: Splitting multiline string into words, the stuff between words, and newlines

by salva (Canon)
on Feb 24, 2022 at 09:06 UTC ( [id://11141609] )


in reply to Splitting multiline string into words, the stuff between words, and newlines

You can also use split for this, so that you don't need a separate regular expression that matches the non-word parts:
my @fragments = grep length, split /(\b{wb}.+?\b{wb}|\n+)/, $book;
So you get the words, the runs of newlines, and everything else in between as separate fragments.
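To see what the capturing parentheses and grep length each contribute, here is a small self-contained sketch. It assumes Perl 5.22 or later (\b{wb} is not available earlier), and the sample string stands in for $book purely for illustration. Because the whole pattern is captured, split returns every matched separator along with the text between matches; since nearly the entire string is matched, most of those in-between fields are empty, and grep length throws them away.

```perl
use strict;
use warnings;
use v5.22;    # \b{wb} needs Perl 5.22 or later

# Made-up sample text standing in for $book.
my $book = "one two\n\nthree";

# The capturing parentheses make split return each matched separator as well
# as the (mostly empty) fields between them; grep length drops the empties.
my @fragments = grep length, split /(\b{wb}.+?\b{wb}|\n+)/, $book;

# Print each fragment in braces, with newlines made visible as \n.
for my $f (@fragments) {
    (my $shown = $f) =~ s/\n/\\n/g;
    print "{$shown}";
}
print "\n";
```

On this punctuation-free input it prints {one}{ }{two}{\n\n}{three}: words, the space between them, and the newline run each come back as one fragment.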

Re^2: Splitting multiline string into words, the stuff between words, and newlines
by ibm1620 (Hermit) on Feb 24, 2022 at 12:50 UTC
    This looks to me like it should work, but it splits the strings of non-words into separate characters!

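    "For example ...\n" -> {For}{_}{example}{_}{.}{.}{.}{$}    (here _ is a space and $ is the newline)
      That is because \b{wb} matches between each pair of those punctuation signs, so the lazy .+? stops after a single character and every sign becomes its own fragment.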

      This seems to solve the issue:

      my @fragments = grep length, split /(\b{wb}\w.*?\b{wb}|\n+)/, $book;

      But my knowledge of Unicode and the \b{wb} semantics is rather limited, so this may have other issues.
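The following sketch runs both patterns on ibm1620's sample line so the difference is visible side by side. It again assumes Perl 5.22 or later, and the printed notation (_ for a space, $ for the newline) just mimics the one used in this thread.

```perl
use strict;
use warnings;
use v5.22;    # \b{wb} needs Perl 5.22 or later

my $text = "For example ...\n";

# Original pattern: .+? stops at the first \b{wb}, and \b{wb} falls between
# every pair of punctuation characters, so "..." comes back one dot at a time.
my @original = grep length, split /(\b{wb}.+?\b{wb}|\n+)/, $text;

# Revised pattern: a captured fragment must start with a \w character, so the
# space-and-dots run is never matched and stays together as one field.
my @revised = grep length, split /(\b{wb}\w.*?\b{wb}|\n+)/, $text;

# Print both results, showing a space as _ and the newline as $.
for my $list (\@original, \@revised) {
    say join '', map { my $s = $_; $s =~ s/\n/\$/g; $s =~ tr/ /_/; "{$s}" } @$list;
}
```

The first line of output reproduces the report above, {For}{_}{example}{_}{.}{.}{.}{$}, while the second is {For}{_}{example}{_...}{$}.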

        Not sure 'cause that's 'bout words also including non-\w characters.

        And some of 'em even start with an apostrophe ;)

        Cheers Rolf

        For my purposes, this is fine. I'm mainly interested in capturing possessives and contractions.
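To make the apostrophe caveat concrete, this sketch applies the \w-anchored pattern to a phrase taken from the reply above (it assumes Perl 5.22 or later). An apostrophe between two letters, as in possessives and contractions like that's, does not create a \b{wb} boundary, so those words survive intact. A leading apostrophe, as in 'cause or 'bout, is split off and ends up attached to the preceding space.

```perl
use strict;
use warnings;
use v5.22;    # \b{wb} needs Perl 5.22 or later

# Sample phrase borrowed from the thread.
my $text = "Not sure 'cause that's 'bout words";

# Same \w-anchored pattern as in the reply above.
my @fragments = grep length, split /(\b{wb}\w.*?\b{wb}|\n+)/, $text;

# Wrap each fragment in braces so spaces and apostrophes are visible.
say join '', map "{$_}", @fragments;
```

This prints {Not}{ }{sure}{ '}{cause}{ }{that's}{ '}{bout}{ }{words}, which is fine if, as here, only possessives and contractions matter.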
