Thank you! You were right. One thing I forgot to mention was that I was doing a substitution later on to remove any punctuation characters, and it was:
I didn't realize that accented letters don't count in the \w match; I figured they were still alphanumeric. Well, that's kind of annoying. I was using that substitution to normalize the string and remove anything like commas and semicolons. Now I have to make a list of all the characters I want to remove, instead of just specifying the ones I want to keep. Oh well, at least we've found the problem.

Still, is there some way to convert accented letters so that the accent is removed and the letter is kept? That seems like a better solution than listing out everything that is not a letter, number, space, or letter with an accent.
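For reference, one approach that is often suggested for this is to decompose each accented character into its base letter plus a separate combining mark (Unicode normalization form NFD), drop the combining marks, and only then run the "keep only word characters and whitespace" substitution. Below is a minimal sketch in Python, which is an assumption since no language is named in this thread. The function names `strip_accents` and `normalize` are made up for illustration, and `re.ASCII` is passed so that `\w` behaves like the ASCII-only match described above.

```python
import re
import unicodedata


def strip_accents(text):
    """Return text with accents removed and base letters kept (hypothetical helper)."""
    # NFD splits a character such as "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a nonzero class for combining marks; drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text):
    """Strip accents, then remove everything except ASCII word characters and whitespace."""
    # re.ASCII restricts \w to [a-zA-Z0-9_], mimicking the ASCII-only \w behavior.
    return re.sub(r"[^\w\s]", "", strip_accents(text), flags=re.ASCII)


if __name__ == "__main__":
    print(normalize("Café, crème; brûlée!"))
```

With this sketch, `normalize("Café, crème; brûlée!")` gives `Cafe creme brulee`. One caveat: letters that have no decomposition, such as "ø" or "ß", are not turned into a plain letter by NFD, so the ASCII-only substitution would still drop them unless they are mapped by hand first.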