If your data is really that mangled, taking the first X characters (e.g. with substr) isn't going to help. If, say, you take the first 8 characters so that "Group On" always maps to "Group One", what will you do with "Group Tw"? It could be Two, Twelve or Twenty.
I suspect you're going to have to visually analyse your data and come up with some sort of lookup table. Work through as much of your data as you can programmatically, outputting what can't be processed to a separate file. Then, based on what's left, either extend the lookup table or edit manually.
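
As a rough sketch of that workflow in Python: the table entries, the function names and the output path here are all made up for illustration, and your real table has to come from looking at your own data. It only does exact matches after trimming whitespace, so anything ambiguous like "Group Tw" ends up in the leftovers rather than being guessed at.

```python
# Hypothetical lookup table: mangled value -> canonical value.
# These entries are invented examples, not taken from any real data set.
LOOKUP = {
    "Group On": "Group One",
    "Group One": "Group One",
    "Grp 1": "Group One",
    "Group Two": "Group Two",
}


def clean(values, lookup):
    """Split values into (cleaned, unresolved) using exact lookup matches."""
    cleaned, unresolved = [], []
    for raw in values:
        key = raw.strip()  # ignore leading/trailing whitespace only
        if key in lookup:
            cleaned.append(lookup[key])
        else:
            # Keep the original text so it can be reviewed by hand later.
            unresolved.append(raw)
    return cleaned, unresolved


def write_unresolved(unresolved, path):
    """Write the values that could not be mapped to a separate file, one per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for raw in unresolved:
            fh.write(raw + "\n")
```

You would call clean(rows, LOOKUP), pass the second list to write_unresolved with whatever file name you like, look through that file, add new entries to the table, and run it again until what's left is small enough to fix by hand.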