in reply to Re-define Word Boundary?
\w matches alphanumerics and underscore. \b is effectively the same as using lookbehinds and lookaheads like this:
(?:(?<=\w)(?=\W|\z)|(?:(?<=\W)|(?<=\A))(?=\w))
Update: Hmm, or even nicer is this form, which merlyn posted in "•Re: Why do zero width assertions care about lookahead/behind?" (code examples also updated):
(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))
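A quick way to check that both lookaround forms really behave like \b is to collect the offsets where each one matches and compare them. This is a minimal sketch: `boundary_positions` is a hypothetical helper name, the test string is made up, and the long form is written with the closing paren that completes its outer group.

```perl
use strict;
use warnings;

# The long lookaround form (outer group closed) and merlyn's shorter form.
my $long  = qr/(?:(?<=\w)(?=\W|\z)|(?:(?<=\W)|(?<=\A))(?=\w))/;
my $short = qr/(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/;

# Hypothetical helper: list every offset where the zero-width pattern $re
# matches in $str. Perl won't repeat a zero-length match at the same
# position, so the //g loop steps through the string instead of spinning.
sub boundary_positions {
    my ($str, $re) = @_;
    my @pos;
    while ($str =~ /$re/g) {
        push @pos, pos($str);
    }
    return @pos;
}

my $s = "foo-bar/baz qux";
for my $pair (['\b', qr/\b/], ['long', $long], ['short', $short]) {
    my ($name, $re) = @$pair;
    printf "%-5s %s\n", $name, join ' ', boundary_positions($s, $re);
}
# each line lists the same offsets: 0 3 4 7 8 11 12 15
```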
So to make a specialized version of \b that views "-" and "/" as "word characters" (sort of), you might use something like this:
(?:(?<![\w/-])(?=[\w/-])|(?<=[\w/-])(?![\w/-]))
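To see the difference in practice, here is a small sketch comparing the stock \b\w+\b with the custom boundary on a made-up sample string. It uses qr{...} and m{...} delimiters so the "/" inside the character class needs no escaping.

```perl
use strict;
use warnings;

# Custom boundary: "-" and "/" are treated as word characters.
my $custom_b = qr{(?:(?<![\w/-])(?=[\w/-])|(?<=[\w/-])(?![\w/-]))};

my $s = "foo-bar/baz qux";

# Stock \b: "-" and "/" split the words apart.
my @plain = $s =~ /\b\w+\b/g;

# Custom boundary: "-" and "/" stay inside a word. The braces in
# ${custom_b} stop Perl from reading the following [...] as an array subscript.
my @custom = $s =~ m{${custom_b}[\w/-]+${custom_b}}g;

print join('|', @plain),  "\n";   # foo|bar|baz|qux
print join('|', @custom), "\n";   # foo-bar/baz|qux
```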
So maybe something like this will suit you?
my $w  = '\w/-';   # what counts as a "word" character
my $wb = qr/(?:(?<![$w])(?=[$w])|(?<=[$w])(?![$w]))/;   # not $b, which sort() uses
my @words = ($rec =~ /${wb}[$w]+${wb}/g);
I've tested this a little but not a lot, and it seems all right. You'll want to verify it yourself before you go using it for anything important :-)
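If you need this for more than one set of extra characters, the snippet can be wrapped in a sub. This is only a sketch: `words_with` and the sample record are hypothetical, and quotemeta is used so that extra characters like "]" or "^" can't break the character class.

```perl
use strict;
use warnings;

# Hypothetical helper: split $rec into "words", where a word is a run of \w
# plus any of the characters in $extra (e.g. '/-').
sub words_with {
    my ($rec, $extra) = @_;
    my $w  = '\w' . quotemeta($extra);   # contents of the character class
    my $wb = qr/(?:(?<![$w])(?=[$w])|(?<=[$w])(?![$w]))/;
    return $rec =~ /${wb}[$w]+${wb}/g;   # list of whole matches
}

my $rec = "path /usr/local-bin and x-ray";
print join('|', words_with($rec, '/-')), "\n";   # path|/usr/local-bin|and|x-ray
print join('|', words_with($rec, '')),   "\n";   # path|usr|local|bin|and|x|ray
```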
-- Mike
--
XML::Simpler does not require XML::Parser or a SAX parser.
It does require File::Slurp.
-- grantm, perldoc XML::Simpler
In Section
Seekers of Perl Wisdom