
Re^6: Splitting multiline string into words, the stuff between words, and newlines

by LanX (Saint)
on Feb 25, 2022 at 11:11 UTC (id://11141641)


in reply to Re^5: Splitting multiline string into words, the stuff between words, and newlines
in thread Splitting multiline string into words, the stuff between words, and newlines

> \b{wb} doesn't seem to take initial ones as part of words

good catch!

> my conclusion is that the only way to handle the OP's problem in a way fully consistent with \b{wb} semantics is to just split using it, and maybe repack non-word fragments afterwards

My intuition says: split on non-words like whitespace, reject "words" that contain no \w (or equivalent) characters, and repack the rest afterwards.
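A minimal sketch of that idea, to make the steps concrete. The helper name fragments() and the sample text are made up for illustration, and it assumes that "non-words" means whitespace only, so punctuation glued to a word (like the comma in "Hello,") stays with that word:

```perl
use strict;
use warnings;

# Hypothetical helper: break $text into word pieces, in-between pieces
# and newlines, without losing any character.
sub fragments {
    my ($text) = @_;
    my @out;

    # The capturing group makes split return the separators too:
    # single newlines and runs of other whitespace.
    for my $piece ( grep { length } split /(\n|[^\S\n]+)/, $text ) {
        my $is_word    = $piece =~ /\w/;     # a "word" needs at least one \w
        my $is_newline = $piece eq "\n";

        # Repack: glue a non-word piece onto a preceding non-word piece,
        # but never onto a newline, which stays a fragment of its own.
        if (   !$is_word
            && !$is_newline
            && @out
            && $out[-1] ne "\n"
            && $out[-1] !~ /\w/ )
        {
            $out[-1] .= $piece;
        }
        else {
            push @out, $piece;
        }
    }
    return @out;
}

my $book      = "Hello, world -- it's\nfine.\n";
my @fragments = fragments($book);

# One fragment per line, newlines shown as \n
print "[", s/\n/\\n/gr, "]\n" for @fragments;
```

Here the " -- " between "world" and "it's" ends up as one repacked fragment, and joining the fragments gives back the original string. Whether "Hello," should count as one word is exactly the kind of edge case that depends on your perspective.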

I doubt it's possible to cover all desirable edge cases with \b{wb}. That will depend on the user's perspective, especially when considering multi-language environments and Unicode.

Cheers Rolf
(addicted to the Perl Programming Language :)

Replies are listed 'Best First'.
Re^7: Splitting multiline string into words, the stuff between words, and newlines
by salva (Canon) on Feb 25, 2022 at 11:19 UTC
    This seems to work too:
    my @fragments = $book =~ /\G(?:[^\n\w]+?\b{wb})+|.+?\b{wb}/sg;
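A quick way to try the one-liner above and check that it drops nothing. The sample text is made up, and \b{wb} needs Perl 5.22 or later. Exactly where the boundaries fall depends on the Unicode word-break rules as implemented by your perl, so this only checks the round trip and prints the fragments for inspection:

```perl
use strict;
use warnings;
use v5.22;    # \b{wb} is not available before Perl 5.22

my $book = "Hello, world!\nIt's a 'test'.\n";

# The regex from the post, unchanged
my @fragments = $book =~ /\G(?:[^\n\w]+?\b{wb})+|.+?\b{wb}/sg;

# Every match starts where the previous one ended, so gluing the
# fragments back together should reproduce the input.
print join( '', @fragments ) eq $book
    ? "round trip ok\n"
    : "lost characters\n";

# One fragment per line, newlines shown as \n
print "[", s/\n/\\n/gr, "]\n" for @fragments;
```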
