If the regex solution provided by Melly isn't as fast as you need, you might try:
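    use strict;
    use warnings;

    open( my $in,   '<', 'Your_data_here' ) or die "Can't read Your_data_here: $!";
    open( my $good, '>', 'Good_file_here' ) or die "Can't write Good_file_here: $!";
    open( my $bad,  '>', 'Bad_file_here' )  or die "Can't write Bad_file_here: $!";

    while ( my $line = <$in> ) {
        # Chomp a copy so the newline isn't counted in the last column's length
        chomp( my $record = $line );
        my @row = split /\t/, $record;

        # Columns 15 and 16 (indices 14 and 15); '' stands in for a missing column
        if ( length( $row[14] // '' ) == 5 && length( $row[15] // '' ) == 7 ) {
            print {$good} $line;
        }
        else {
            print {$bad} $line;
        }
    }

    close $bad;
    close $good;
    close $in;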
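If the rows have many more than 16 columns, one tweak that might help (an untested guess about where the time goes) is to give split a limit, so Perl stops after the fields the test actually looks at. The sketch below also moves the test into a sub so it can be tried on in-memory data first. The names is_good_row and split_lines and the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when column 15 is 5 chars and column 16 is 7 chars.
# A limit of 17 stops splitting after the 16th field; anything past it stays
# joined in $row[16], which we never inspect.
sub is_good_row {
    my ($line) = @_;
    chomp( my $record = $line );
    my @row = split /\t/, $record, 17;
    return length( $row[14] // '' ) == 5 && length( $row[15] // '' ) == 7;
}

# Hypothetical helper: copy each line from $in to either $good or $bad.
sub split_lines {
    my ( $in, $good, $bad ) = @_;
    while ( my $line = <$in> ) {
        print { is_good_row($line) ? $good : $bad } $line;
    }
}

# Usage with in-memory filehandles; the sample rows are invented.
my $good_row = join( "\t", ('x') x 14, 'ABCDE', '1234567' ) . "\n";
my $bad_row  = join( "\t", ('x') x 14, 'ABCD',  '1234567' ) . "\n";
my $input    = $good_row . $bad_row;

my ( $good_out, $bad_out ) = ( '', '' );
open( my $in,   '<', \$input )    or die "open input: $!";
open( my $good, '>', \$good_out ) or die "open good: $!";
open( my $bad,  '>', \$bad_out )  or die "open bad: $!";

split_lines( $in, $good, $bad );

# Close the output handles so everything is flushed into the scalars
close $good;
close $bad;
close $in;
```

To run it on real data, open the three files as in the script above and pass those handles to split_lines instead.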
Note: awk processing a million rows per minute probably isn't that bad, and I'm not sure Perl will be much faster. This job is mostly I/O-bound: reading and writing the files likely costs more than testing two field lengths.
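One rough way to check the I/O-bound claim is to time a pass that only reads the lines against a pass that also splits them; if the two times are close, the per-row work isn't the bottleneck. This is a sketch only: it writes its own throwaway file of made-up rows (the row count and field contents are arbitrary assumptions), and because that file was just written it is probably still in the OS cache, so timings against the real data file will be more meaningful.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Throwaway test file: $rows invented rows of 20 tab-separated fields.
my $rows = 50_000;
my ( $fh, $path ) = tempfile( UNLINK => 1 );
for my $i ( 1 .. $rows ) {
    print {$fh} join( "\t", ("x$i") x 20 ), "\n";
}
close $fh or die "close: $!";

# Read every line of $file, calling $work on each; return (row count, seconds).
sub time_pass {
    my ( $file, $work ) = @_;
    open( my $in, '<', $file ) or die "open $file: $!";
    my $start = time;
    my $count = 0;
    while ( my $line = <$in> ) {
        $work->($line);
        $count++;
    }
    close $in;
    return ( $count, time - $start );
}

my ( $n_read,  $t_read )  = time_pass( $path, sub { } );
my ( $n_split, $t_split ) = time_pass( $path, sub { my @row = split /\t/, $_[0] } );

printf "read only:    %d rows in %.3fs\n", $n_read,  $t_read;
printf "read + split: %d rows in %.3fs\n", $n_split, $t_split;
```

Pointing time_pass at the real input file instead of $path gives a more honest picture of the disk cost.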