Also note: it would be nice if there were a way to be just as fast but less memory-intensive, as the hash may end up holding as many as 4 million items, with keys of length 1 to 32.
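There is a two-pass solution to this that uses very little memory. On the first pass, parse the words from each line and write them, one per line, to a pipe like "|sort|uniq -d", redirecting the pipeline's output to a temporary file (a pipe opened for writing can't be read back directly). sort typically spills to temporary files instead of holding everything in RAM, and uniq -d prints one copy of each line that occurs more than once in its sorted input. Read that file back and you'll have just the duplicate words to save in a hash.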
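Here is a minimal sketch of that first pass, assuming a "word" is any run of non-whitespace (adjust the split() to your own definition). The sub name find_duplicate_words is hypothetical, and the temporary file from File::Temp is just one way to capture the pipeline's output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: stream every word through "sort | uniq -d" and
# load only the repeated words into a hash.
# Assumption: words are runs of non-whitespace.
sub find_duplicate_words {
    my ($in) = @_;    # readable filehandle positioned at the start

    # Temp file to catch the pipeline's output, since a write-only
    # pipe opened with '|-' cannot be read back by this process.
    my (undef, $dupfile) = tempfile(UNLINK => 1);

    open(my $sort, '|-', "sort | uniq -d > '$dupfile'")
        or die "can't start sort: $!";
    while (my $line = <$in>) {
        print {$sort} "$_\n" for split ' ', $line;
    }
    # close() waits for the pipeline and fails if it exited non-zero.
    close($sort) or die "sort pipeline failed: $?";

    # Only the duplicated words are held in memory. Each starts at 0 so
    # the second pass can tell a first occurrence from later ones.
    my %dup;
    open(my $fh, '<', $dupfile) or die "can't read $dupfile: $!";
    while (my $word = <$fh>) {
        chomp $word;
        $dup{$word} = 0;
    }
    close($fh);
    return \%dup;
}
```

For example, `my $dup = find_duplicate_words(\*STDIN);` returns a hash reference whose keys are the repeated words.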
On the second pass through your file, compare each word against that hash (with every key's value starting at 0), something like:
if (!exists $dup{$_} || $dup{$_}++ == 0) {
    print "$_\n";    # a unique word, or the first sighting of a duplicate
}
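To show where that test sits, here is a sketch of the full second-pass loop. The sub name print_first_occurrences and its filehandle arguments are hypothetical; it assumes the same whitespace-based word splitting as the first pass and a hash whose duplicate keys all start at 0.

```perl
use strict;
use warnings;

# Hypothetical helper: print each word once, in input order.
# $dup maps each repeated word to 0; words not in the hash occur
# only once in the input, so they can be printed immediately.
sub print_first_occurrences {
    my ($in, $out, $dup) = @_;
    while (my $line = <$in>) {
        for my $word (split ' ', $line) {
            # Short-circuit: unique words skip the counter entirely; a
            # duplicate prints only while its counter is still 0, and
            # the post-increment marks it as seen.
            if (!exists $dup->{$word} || $dup->{$word}++ == 0) {
                print {$out} "$word\n";
            }
        }
    }
}
```

Because only the duplicates are in the hash, memory use scales with the number of repeated words rather than with all 4 million.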
If you know that STDIN is seekable (i.e. a disk file, not a pipe, socket, or terminal), you can call seek(STDIN, 0, 0) to rewind it. Otherwise you'll have to write a copy of the data somewhere, such as a temporary file, for your second pass.
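One way to handle both cases is sketched below, assuming -f is an adequate test for "plain disk file". The sub name rewindable_handle is hypothetical: it returns the original handle if it can be rewound, and otherwise spools the input into a File::Temp file and returns that instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return a handle that can be read more than once.
sub rewindable_handle {
    my ($in) = @_;

    # A plain disk file can simply be rewound to the start.
    return $in if -f $in && seek($in, 0, 0);

    # Pipe, socket, terminal, etc.: copy everything to a temp file.
    # tempfile() opens the file read-write, so the same handle can
    # be read back after seeking (seek also flushes pending writes).
    my ($tmp) = tempfile(UNLINK => 1);
    while (my $line = <$in>) {
        print {$tmp} $line;
    }
    seek($tmp, 0, 0) or die "can't rewind temp file: $!";
    return $tmp;
}
```

Call it once before the first pass (e.g. `my $fh = rewindable_handle(\*STDIN);`), then `seek($fh, 0, 0)` again before the second. Note that the copy costs disk space, not memory.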
If what you are really after is a list of the unique words in a file, and you don't care about their order or the line breaks, you can just write the words out to "|sort -u".
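A sketch of that single-pass variant follows, again assuming whitespace-separated words. The sub name write_unique_words and its optional output-file argument are hypothetical; the list form of the pipe open avoids the shell, and sort's -o option writes to a file instead of STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: send every word to "sort -u", which drops the
# duplicates. With no $outfile, sort writes to this program's STDOUT.
sub write_unique_words {
    my ($in, $outfile) = @_;
    my @cmd = ('sort', '-u', defined $outfile ? ('-o', $outfile) : ());
    open(my $sort, '|-', @cmd) or die "can't start sort: $!";
    while (my $line = <$in>) {
        print {$sort} "$_\n" for split ' ', $line;
    }
    close($sort) or die "sort failed: $?";
}
```

Nothing is kept in Perl's memory here at all; sort does the deduplication.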