in reply to Filtering out stop words

As I commented over in Building Regex Alternations Dynamically, it's possible to generate a regex from the entirety of /usr/share/dict/words, which on my system currently has over 100,000 entries. The resulting regex is about 1MB long as a string, and matching against it is still reasonably fast. So building a regex the way you showed is possible; whether it's the best solution in your case probably depends on how many matches you'll be doing with it, so you'll have to measure the performance in your own use case.

I would recommend that loadCommonWords return a regex precompiled with qr// instead of a string, and that you sort @commonwords by length, longest first, as I showed in the aforementioned thread.
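
To illustrate the qr// and length-sorting suggestion, here is a minimal sketch. The body of loadCommonWords, its argument list, and the sample words are my assumptions for illustration, since your actual sub reads its words from elsewhere:

```perl
use strict;
use warnings;

# Hypothetical sample list; the real words would come from your own source.
my @commonwords = qw(a an and the then of);

# Assumed signature: takes the words as arguments, returns a compiled regex.
sub loadCommonWords {
    my @words = @_;
    # Longest first, so that in an unanchored match "then" is tried before "the".
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) or $a cmp $b } @words;
    # Compile once with qr// so callers can reuse it without recompiling.
    return qr/(?:$alternation)/;
}

my $re   = loadCommonWords(@commonwords);
my $text = "the cat and then an owl";

# Remove each common word plus any whitespace that follows it.
(my $filtered = $text) =~ s/\b$re\b\s*//g;
print "$filtered\n";
```

The quotemeta call is there so that words containing regex metacharacters are matched literally.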

Update: Eily is right, I overlooked the anchors: for exact string matches, definitely use a hash instead.
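
For the exact-match case, a hash lookup could look roughly like this. The names %is_common and @tokens and the sample data are hypothetical, and lowercasing before the lookup is my assumption about what you want:

```perl
use strict;
use warnings;

# Hypothetical sample list; substitute your real common words.
my @commonwords = qw(a an and the then of);

# Build the lookup table once: each word becomes a key.
my %is_common = map { $_ => 1 } @commonwords;

# Split on whitespace and keep only tokens that are not common words.
my @tokens = split ' ', "The cat and then an owl";
my @kept   = grep { !$is_common{ lc($_) } } @tokens;

print "@kept\n";
```

Each lookup is a single hash access, so the cost per token doesn't grow with the number of common words.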