tilly: Kudos to the first person to figure out
what the bug is.
I'm guessing split /\W/: this splits on each
non-word character, but if there are several \Ws
together (a comma followed by a space, for instance), it
will split between them, creating a spurious "" word.
The fix was to look for \w+ (although you might
also say split /\W+/).
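
A minimal sketch of the difference, using a sample line of my own (the
string "Hello, world" is an assumption for illustration, not taken from
the original code):

```perl
use strict;
use warnings;

# Hypothetical sample input: a comma followed by a space.
my $line = "Hello, world";

# One separator per non-word character: the comma and the space are
# two separate split points, so an empty field appears between them.
my @single = split /\W/, $line;

# A run of non-word characters counts as a single separator.
my @run = split /\W+/, $line;

print scalar(@single), " fields: ", join("|", @single), "\n";
print scalar(@run),    " fields: ", join("|", @run),    "\n";
```

The first split should yield three fields with an empty one in the
middle; the second should yield just the two words.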
Update: The above split-based "solution"
introduces spurious "" words if a line (say) begins (or
ends) with a \W. Looks like m/(\w+)/g is
the Right Thing in this case.
Update 2: Of course, split discards any
empty trailing entries, so only the ones at the beginning
of the line are a problem. (I'll get this eventually...)
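
To see the leading-versus-trailing behaviour side by side, here is a
small sketch. The input string is again made up for illustration; it
both begins and ends with non-word characters:

```perl
use strict;
use warnings;

# Hypothetical sample input with \W characters at both ends.
my $line = ", leading and trailing. ";

# split produces an empty leading field (the separator matches at the
# very start of the string) but drops empty trailing fields.
my @split_words = split /\W+/, $line;

# A global match in list context returns only the captured words,
# so there are no empty entries at either end.
my @match_words = $line =~ m/(\w+)/g;

print scalar(@split_words), " from split: [", join("][", @split_words), "]\n";
print scalar(@match_words), " from match: [", join("][", @match_words), "]\n";
```

With this input, split should return four fields (the first one empty),
while the match should return exactly the three words.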