
Re^4: stripped punctuation

by thealienz1 (Pilgrim)
on Oct 06, 2005 at 21:37 UTC

in reply to Re^3: stripped punctuation
in thread stripped punctuation

I basically did your second regexp there, but in two steps. I will try yours, though. I am curious about the difference in speed between them. Of course, I am also wondering what you mean by "\w doesn't mean what I think it means."

Re^5: stripped punctuation
by fishbot_v2 (Chaplain) on Oct 06, 2005 at 21:45 UTC

    from perlre:

    A "\w" matches a single alphanumeric character (an alphabetic character, or a decimal digit) or "_"...

    Thus your earlier use of [^\w\d] had the set of digits in it twice, which suggested to me that you thought that \w means [A-Za-z].

    [^\w\d] works, but is redundant and equivalent to \W
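The equivalence can be checked directly. The following is only a minimal sketch, assuming plain ASCII input (with Unicode or locale semantics, \w covers more characters); the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Walk the ASCII range and record any character on which
# [^\w\d] and \W give different answers.
my @disagree;
for my $code (0 .. 127) {
    my $ch        = chr($code);
    my $redundant = $ch =~ /\A[^\w\d]\z/ ? 1 : 0;
    my $short     = $ch =~ /\A\W\z/      ? 1 : 0;
    push @disagree, $code if $redundant != $short;
}

# Collect everything \w matches in ASCII: digits, letters, and "_".
my $word_chars = join '', grep { /\A\w\z/ } map { chr($_) } 0 .. 127;

print "characters where the two classes disagree: ", scalar(@disagree), "\n";
print "ASCII characters matched by \\w: $word_chars\n";
```

Since \d is already a subset of \w, adding it inside the negated class changes nothing, which is why the list of disagreements stays empty.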

    Update: You asked about the speed difference between doing it in two passes and in one pass:

    s/(?:^\W+)|(?:\W+$)//g;

    # versus

    s/\W+$//g;
    s/^\W+//g;

    # my unscientific benchmark
                   Rate single_pass    two_pass
    single_pass 15829/s          --        -11%
    two_pass    17737/s         12%          --

    Doing it in two passes seems to be about 10-15% faster.
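For anyone who wants to repeat the comparison, here is one way it might be set up with the core Benchmark module. This is a sketch, not the script that produced the numbers above: the sample string and the sub names are assumptions, and the rates will vary with the input and the perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input; the original test data is not shown in the thread.
my $sample = '...Hello, world!!!';

# One substitution with an alternation covering both ends.
sub single_pass {
    my $s = shift;
    $s =~ s/(?:^\W+)|(?:\W+$)//g;
    return $s;
}

# Two anchored substitutions, one per end.
sub two_pass {
    my $s = shift;
    $s =~ s/\W+$//g;
    $s =~ s/^\W+//g;
    return $s;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    single_pass => sub { single_pass($sample) },
    two_pass    => sub { two_pass($sample) },
});
```

Both subs work on a copy of the string, so every iteration starts from the same untrimmed input.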

Re^5: stripped punctuation
by Nkuvu (Priest) on Oct 06, 2005 at 21:50 UTC
    \w means "any alphanumeric character or underscore," so in your regex, where you have [^\w\d], it's a bit redundant. \w can be replaced by [a-zA-Z0-9_], so you're effectively writing [^a-zA-Z0-9_0-9] in the regexen above.

    Also note that since \w includes the underscore you're matching more than what you say you want.
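A small illustration of the underscore point, assuming the goal is to treat underscores as punctuation too. The token and variable names are hypothetical, and [\W_] is just one possible way to widen the class.

```perl
use strict;
use warnings;

# Hypothetical token with underscores and punctuation at the edges.
my $token = '__init__!';

# \W does not match "_", so the underscores survive the trim.
(my $with_W = $token) =~ s/^\W+|\W+$//g;

# Adding "_" to the class strips underscores as well.
(my $with_underscore = $token) =~ s/^[\W_]+|[\W_]+$//g;

print "trimmed with \\W:     $with_W\n";            # __init__
print "trimmed with [\\W_]:  $with_underscore\n";   # init
```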
