Without seeing samples, it's a little difficult to do much for you. I'd suggest that you open the files in a hex editor, the kind that shows the raw bytes in hex on one side and ASCII on the other. BVI and BED are Linux examples, but there are freeware and shareware ones available for That Other Nameless OS(TM) as well. What is probably happening is that some of your files are encoded as 16-bit Unicode (UTF-16), or maybe as Microsoft Rich Text Format, where formatting codes are injected into the text. In UTF-16, every plain ASCII character is paired with a zero byte, so a pattern written for 8-bit text won't match it. Unless you specifically include the appropriate modules in your program and enable the matching switches in your regex matchers, you'll have this problem.
So, first, determine what your encoding is, and then you'll be able to learn the next step or ask a more detailed question.
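If you don't have a hex editor handy, a few lines of script will do the same first check. Here is a rough sketch in Python (picked only for illustration, it may not be the language your program is in). The names `guess_encoding` and `hex_dump` are made up for this example, and the checks are heuristics on the first few bytes, not a reliable detector:

```python
import codecs


def guess_encoding(head: bytes) -> str:
    """Guess an encoding from the first bytes of a file (heuristic only)."""
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # RTF files start with the literal text {\rtf
    if head.startswith(b"{\\rtf"):
        return "rtf"
    # No BOM: zero bytes in every other position suggest UTF-16 text
    # made up mostly of ASCII characters.
    if len(head) >= 4:
        if head[1::2].count(0) > len(head) // 4:
            return "utf-16-le"
        if head[0::2].count(0) > len(head) // 4:
            return "utf-16-be"
    # Could be ASCII, UTF-8 without a BOM, or some other 8-bit encoding.
    return "unknown"


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex on one side, printable ASCII on the other."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3 - 1)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {text_part}")
    return "\n".join(lines)
```

To try it on a file, read the first 64 bytes or so in binary mode (`open(path, "rb").read(64)`) and pass them to both functions. If the dump shows a dot after every letter, or the text opens with `{\rtf`, you have your answer.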
"There's more than one level to any answer."