Re^2: Cleaning Data Between Specified Columns

by Aristotle (Chancellor)
on Jan 28, 2003

in reply to Re: Cleaning Data Between Specified Columns
in thread Cleaning Data Between Specified Columns

Using "known nonexistent" characters is just asking for trouble; it's a practice I've come to regard as a huge red flag. In this particular case, and with Perl being Perl, the proper solution is surprising but very neat. Fletch++

Re: Re^2: Cleaning Data Between Specified Columns
by BrowserUk on Jan 28, 2003

    Sorry Aristotle. Fletch's (partial) solution, neat as the technique is, falls foul of the fact that deleting the apostrophes in one range causes all the subsequent columns to shift.
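A small sketch of the shift described here. The sample line, the 0-based (offset, length) ranges, and the helper name `strip_quotes` are all made up for illustration. Visiting the ranges from right to left is one possible way around the problem; it is offered as an assumption, not as something proposed in the thread.

```perl
use strict;
use warnings;

# Delete apostrophes within each (offset, length) range of $line,
# visiting the ranges in the order given. Offsets are 0-based.
sub strip_quotes {
    my ( $line, @ranges ) = @_;
    for my $r (@ranges) {
        my ( $offset, $len ) = @$r;
        my $chunk = substr $line, $offset, $len;
        $chunk =~ tr/'//d;
        # Writing back a shorter chunk shrinks $line, so every
        # offset to the right of this range is now stale.
        substr( $line, $offset, $len ) = $chunk;
    }
    return $line;
}

my $line   = "aa'b'cc'd'ee";
my @ranges = ( [ 2, 3 ], [ 7, 3 ] );    # 'b' and 'd', quotes included

my $left_to_right = strip_quotes( $line, @ranges );
my $right_to_left = strip_quotes( $line, reverse @ranges );

print "$left_to_right\n";    # aabcc'dee  (second range missed its target)
print "$right_to_left\n";    # aabccdee
```

Going right to left works here only because a deletion never moves anything to its left; it does not help once an operation can also lengthen a range.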

      I should have tested. Anyway, in this case, it's a simple matter of changing the order of operations:
      {
          local *_ = \substr $source, $start, $len;
          y/a-zA-Z0-9\n\|\-'/ /c;  # blank out everything else first; length is unchanged
          y/'//d;                  # only then delete the apostrophes
      }
      However, that obviously only works if there's only one operation affecting length. For a more general case, I'd do something like this (untested):
      #!/usr/bin/perl -w
      use strict;

      # Every argument after the first (the input file) is a "start-end" range.
      my @range = map /^(\d+)-(\d+)$/, sort { $a <=> $b } splice @ARGV, 1;
      unshift @range, 0;

      # Convert the boundaries into field lengths; the last field takes the rest.
      $range[$_] = 1 + $range[$_+1] - $range[$_] for 0 .. $#range-1;
      $range[-1] = '*';
      die "Negative length field specified" if grep $_ < 0, @range[0 .. $#range-1];

      my $fmt = join " ", map "A$_", @range;

      # pick odd numbered elements
      my @selected = grep $_ % 2, 0 .. $#range;

      while(<>) {
          my @field = unpack $fmt, $_;
          for (@field[@selected]) {
              tr/a-zA-Z0-9\n\|\-'/ /c;
              tr/'//d;
          }
          print join '', @field;
      }
      The point is to structure your data whenever possible. The end of an array element is never ambiguous; a \x7F can turn out to be, and in my experience, whatever marker character I picked, I have always ended up bitten by it.
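A minimal sketch of the same unpack idea, written as two subs so it can be tried without an input file. It assumes the ranges are 1-based, inclusive, and non-overlapping (the thread does not pin the convention down), and it uses the `a` template rather than `A` so that padding inside a field is kept. The names `build_template` and `clean_line` are hypothetical.

```perl
use strict;
use warnings;

# Turn "start-end" specs (assumed 1-based, inclusive) into an unpack
# template plus the indices of the fields that should be cleaned.
sub build_template {
    my @ranges;
    for my $spec (@_) {
        $spec =~ /^(\d+)-(\d+)$/ or die "Bad range '$spec'\n";
        push @ranges, [ $1, $2 ];
    }
    @ranges = sort { $a->[0] <=> $b->[0] } @ranges;

    my ( @widths, @selected );
    my $pos = 1;    # first column not yet covered by a field
    for my $r (@ranges) {
        my ( $start, $end ) = @$r;
        die "Overlapping or reversed range $start-$end\n"
            if $start < $pos or $end < $start;
        push @widths, $start - $pos;        # untouched gap, may be 0 wide
        push @widths, $end - $start + 1;    # field to clean
        push @selected, $#widths;
        $pos = $end + 1;
    }
    my $fmt = join ' ', ( map "a$_", @widths ), 'a*';
    return ( $fmt, \@selected );
}

# Split the line into fields, clean only the selected ones, rejoin.
sub clean_line {
    my ( $line, $fmt, $selected ) = @_;
    my @field = unpack $fmt, $line;
    for ( @field[@$selected] ) {
        tr/a-zA-Z0-9\n\|\-'/ /c;    # anything not kept becomes a space
        tr/'//d;                    # apostrophes are dropped outright
    }
    return join '', @field;
}

my ( $fmt, $selected ) = build_template( '9-11', '3-5' );
print "$fmt\n";                                             # a2 a3 a3 a3 a*
print clean_line( "ab'c!d|e'f#g\n", $fmt, $selected );      # abc d|ef g
```

Because every field is cut from the original line before anything is modified, a deletion in columns 3-5 cannot shift what columns 9-11 refer to. A driver along the lines of the script above would call `build_template` on the arguments after the file name and then print `clean_line` for each line read.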

