by TGI (Parson)
on Oct 29, 2008 at 17:14 UTC

in reply to Re: scalable chomping
in thread scalable chomping

If X, Y and Z can legitimately appear in the file, you are going to have to do more work. Keep track of which values you have applied "fix" substitutions to, and what the original character was. You will then have a list of known suspect values as well as a way to get back the original value.
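
Something along these lines is what I have in mind. It is only a rough sketch in Python, and everything in it is an assumption on my part: the names `fix_line` and `restore_line`, the shape of the log entries, and above all the idea that each fix swaps one character for exactly one other character (if you delete characters instead, the column offsets would need adjusting).

```python
def fix_line(line, line_no, replacements, audit_log):
    """Apply one-for-one character substitutions and record each one.

    `replacements` maps a suspect character to its single-character
    substitute (a hypothetical mapping; supply your own).
    """
    fixed = []
    for col, ch in enumerate(line):
        if ch in replacements:
            # Remember where the change happened and what was there before.
            audit_log.append({"line": line_no, "col": col, "original": ch})
            fixed.append(replacements[ch])
        else:
            fixed.append(ch)
    return "".join(fixed)


def restore_line(fixed_line, line_no, audit_log):
    """Undo every recorded substitution for one line."""
    chars = list(fixed_line)
    for entry in audit_log:
        if entry["line"] == line_no:
            chars[entry["col"]] = entry["original"]
    return "".join(chars)
```

The log doubles as the list of suspect values: any line number that shows up in it is one to look at again later.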

The best approach (short of retrieval from a backup) would be to do as much parsing and sanity checking of the data as you can while you process the file. Trivial or obvious fixes can be automated, but anything questionable needs to be flagged for human intervention.
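
As a rough illustration of that split between automatic fixes and human review, here is a small Python sketch. The `triage` function, the two example checks, and the comma-separated record layout are all made up for the example; your real checks depend on what the data is supposed to look like.

```python
def triage(record, checks, fixers):
    """Return (record, status), where status is 'ok', 'fixed' or 'review'.

    `checks` maps a check name to a function returning True when the
    record passes. `fixers` maps a check name to a function that repairs
    the record. A failed check with no fixer means a human has to look.
    """
    original = record
    failed = [name for name, check in checks.items() if not check(record)]
    if not failed:
        return record, "ok"
    for name in failed:
        fixer = fixers.get(name)
        if fixer is None:
            # No safe automatic repair: hand back the untouched record.
            return original, "review"
        record = fixer(record)
    # Only trust the repair if every check passes afterwards.
    if all(check(record) for check in checks.values()):
        return record, "fixed"
    return original, "review"


# Hypothetical checks for a three-field, comma-separated record.
checks = {
    "no_trailing_space": lambda r: r == r.rstrip(),
    "three_fields": lambda r: r.count(",") == 2,
}
# Only the trailing-space problem is considered safe to fix unattended.
fixers = {"no_trailing_space": str.rstrip}
```

Anything that comes back as 'review' goes into a separate file for a person to inspect, with the original record left untouched.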

Good luck. I think you'll need it :/.

TGI says moo

Node Type: note [id://720263]
