http://www.perlmonks.org?node_id=607263


in reply to split text into words -- Unicode problem (I guess)

So to look at it another way, the only thing you do want to split on is whitespace; is that right?

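If so, take a look at '\s' in perlre. It matches any single whitespace character, so splitting on /\s+/ breaks the text at runs of whitespace and nothing else.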
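
Here is a minimal sketch of that idea, in case it helps. It assumes the text has already been decoded into Perl characters (here via `use utf8` on a literal). The sample string and the `split_words` name are made up for illustration and are not taken from your code.

```perl
use strict;
use warnings;
use utf8;    # this script's own source contains UTF-8 literals

# Hypothetical helper: split a string into words on whitespace only.
# split ' ' (a string holding one space) is the special case that splits
# on runs of whitespace and skips leading whitespace, so no empty first
# field is produced.
sub split_words {
    my ($text) = @_;
    return split ' ', $text;
}

# Made-up sample text with accented letters, a tab, and a newline.
my $sample = "  naïve café\tdéjà vu\nüber straße";
my @words  = split_words($sample);

binmode STDOUT, ':encoding(UTF-8)';    # encode characters on output
print "$_\n" for @words;
```

Since the split only ever looks at whitespace, accented letters are never treated as word boundaries. If the input comes from a file rather than a literal, it would need to be decoded on the way in (for example with an `:encoding(UTF-8)` layer on the filehandle), which is an assumption about your setup.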

HTH! andye