But what is to be gained by making explicitly required an action that is already implicit in the successful regex match? Everything still depends on crafting an effective validation regex.
The problem is that regexp matches are typically used to do a lot of different things, and removing malicious characters is only one of them. So assuming that a variable derived from a tainted variable through a regexp match is "clean" is dangerous.
For example, I have a fairly large code base that I wrote before I became concerned about security issues. In this code base, there are plenty of places where I capture regexp groups on user inputs for reasons that have nothing to do whatsover with removing malicious characters. For example, there are many places where I use regexps to strip out the leading and trailing characters of a user input. As a result, all those strings will be considered kosher by taint mode. In contrast, if taint mode forced me to explicitly label a variable as being untainted, those cases would be correctly identified as being currently tainted.
I'm not clutching at straws here. This is a real situation, and I am sure there are plenty of folks who have examples of this problem in their code (and I bet this includes a lot of folks who run taint mode).