Along the same lines, you could always simply sort the lines into two different output files if you are worried about record keeping. Once you decide how you want to filter/screen the data (probably by one of the methods discussed in the previous posts), you could add a simple if-else check that sends the 'consistent' data lines to a file named "good_data.csv" and the 'inconsistent' data lines to a file named "discarded_data.csv" (or whichever names you like). That way you can review the discarded data later if need be, or recover any lines that were filtered incorrectly. It also might help you find errors or bugs in your code as you test it more thoroughly.
I am thinking of something along these lines (in case you aren't terribly familiar with the syntax):
# Three-argument open with lexical filehandles
open my $in,   '<', 'name_of_input_file'  or die $!;
open my $good, '>', 'good_data.csv'       or die $!;
open my $bad,  '>', 'discarded_data.csv'  or die $!;

while (my $line = <$in>) {
    chomp $line;
    if ( data_meets_good_condition($line) ) {
        print $good "$line\n";
    }
    else {
        print $bad "$line\n";
    }
}

close $in;
close $good;
close $bad;
That's a little rudimentary (and verbose, if you are a fan of golf), and it needs an appropriate logical check in the if condition, but the primary idea is the if-else split: that way you don't wipe out data by accident. Again, in terms of actually filtering the data, some of the methods discussed above are probably better.
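For the condition itself, here's one minimal sketch, just as a placeholder. It assumes the 'good' test is simply that each line splits into an expected number of comma-separated fields; the sub name and field count are made up, so swap in whatever screening rule you settled on from the earlier posts:

```perl
# Hypothetical filter: a line is 'good' if it has exactly
# $EXPECTED_FIELDS comma-separated values. Replace with your real check.
my $EXPECTED_FIELDS = 5;    # assumption -- set to your record width

sub data_meets_good_condition {
    my ($line) = @_;
    # The -1 limit keeps trailing empty fields, so "a,b,," counts as 4
    my @fields = split /,/, $line, -1;
    return scalar(@fields) == $EXPECTED_FIELDS;
}
```

Note that a bare split like this won't handle quoted fields containing commas; if your CSV has those, a module such as Text::CSV is the safer route.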