
Re^2: Cleaning Data Between Specified Columns

by Aristotle (Chancellor)
on Jan 28, 2003 at 01:54 UTC ( #230452=note )

in reply to Re: Cleaning Data Between Specified Columns
in thread Cleaning Data Between Specified Columns

Using "known nonexistent" characters is just asking for trouble; it's a practice I've come to regard as a huge red flag. In this particular case, and with Perl being Perl, the proper solution is surprising but very neat. Fletch++

Makeshifts last the longest.

Replies are listed 'Best First'.
Re: Re^2: Cleaning Data Between Specified Columns
by BrowserUk (Pope) on Jan 28, 2003 at 03:05 UTC

    Sorry Aristotle. Fletch's (partial) solution, neat as the technique is, falls foul of the fact that deleting the apostrophes in one range causes all the subsequent columns to shift.
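A minimal sketch of the shift described above. The helper `delete_apostrophes_in_range` and the sample line are hypothetical, made up for illustration; ranges are taken as 0-based offset plus length, which is an assumption and not something the thread specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: delete apostrophes inside one 0-based (offset, length)
# range of a copy of $line and return the modified copy.
sub delete_apostrophes_in_range {
    my ($line, $start, $len) = @_;
    (my $chunk = substr($line, $start, $len)) =~ tr/'//d;
    substr($line, $start, $len, $chunk);   # 4-arg substr: replace the range
    return $line;
}

# Offsets:  i=0 t=1 '=2 s=3 ' '=4 ... d=9 o=10 n=11 '=12 t=13
my $line  = "it's ok  don't";
my $after = delete_apostrophes_in_range($line, 0, 5);

# The second range (offset 9, length 5) was chosen against the ORIGINAL line.
print "second range before: [", substr($line,  9, 5), "]\n";
print "second range after:  [", substr($after, 9, 5), "]\n";
```

Once the first range has lost a character, the offsets computed for the second range no longer point at the same data, so any cleanup applied there starts one column too late.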

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      I should have tested. Anyway, in this case, it's a simple matter of changing the order of operations:
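      {
          local *_ = \substr $source, $start, $len;
          y/a-zA-Z0-9\n\|\-'/ /c;   # blank everything else, keeping apostrophes for now
          y/'//d;                   # delete apostrophes last: the only step that changes the length
      }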
      However, that obviously only works if there's only one operation affecting length. For a more general case, I'd do something like this (untested):
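      #!/usr/bin/perl -w
      use strict;

      # usage: script FILE START-END [START-END ...]
      # Assumption: ranges are 1-based, inclusive, and do not overlap.
      my @pair = sort { $a->[0] <=> $b->[0] }
                 map { /^(\d+)-(\d+)$/ ? [ $1, $2 ] : () } splice @ARGV, 1;

      # field boundaries: 0, start1-1, end1, start2-1, end2, ...
      my @edge  = (0, map { ($_->[0] - 1, $_->[1]) } @pair);
      my @width = map $edge[$_+1] - $edge[$_], 0 .. $#edge-1;
      die "Negative length field specified" if grep $_ < 0, @width;

      # lowercase "a" keeps trailing spaces; "A" would strip them and shift columns
      my $fmt = join " ", (map "a$_", @width), "a*";

      # pick odd numbered elements: those are the selected ranges
      my @selected = map 1 + $_ * 2, 0 .. $#pair;

      while (<>) {
          my @field = unpack $fmt, $_;
          for (@field[@selected]) {
              tr/a-zA-Z0-9\n\|\-'/ /c;
              tr/'//d;
          }
          print join '', @field;
      }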
      The point is to structure your data whenever possible. Where an array element ends is never ambiguous; a \x7F can turn out to be. In my case, whatever mark character I chose, I've always ended up bitten by it.
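To make the contrast concrete, here is a small sketch comparing a marker-based cleanup with a field-based one. All three subs (`strip_apostrophes`, `clean_with_marker`, `clean_with_fields`) and the sample lines are hypothetical illustrations, not code from this thread; ranges are 0-based offset plus length by assumption.

```perl
use strict;
use warnings;

sub strip_apostrophes { my ($s) = @_; $s =~ tr/'//d; return $s }

# Hypothetical marker approach: fence the range with \x7F,
# then find the fenced text again with a regex.
sub clean_with_marker {
    my ($line, $start, $len) = @_;
    my $chunk = substr($line, $start, $len);
    substr($line, $start, $len, "\x7F$chunk\x7F");
    $line =~ s{\x7F(.*?)\x7F}{ strip_apostrophes($1) }es;
    return $line;
}

# Structured approach: split into fields first, so no marker is needed.
sub clean_with_fields {
    my ($line, $start, $len) = @_;
    my @field = unpack "a$start a$len a*", $line;
    $field[1] = strip_apostrophes($field[1]);
    return join '', @field;
}

my $plain = "abc it's ok";        # no \x7F in the data
my $nasty = "a\x7Fc it's ok";     # data happens to contain the marker

for my $line ($plain, $nasty) {
    my $m = clean_with_marker($line, 4, 4);
    my $f = clean_with_fields($line, 4, 4);
    print $m eq $f ? "same result\n" : "results differ\n";
}
```

On the plain line both versions agree. On the line that already contains \x7F, the regex pairs the stray byte with the opening marker, so the wrong span is "cleaned", the apostrophe survives, and a marker is left behind in the output. The field-based version has no such failure mode, because the boundaries live in the array structure and not in the data.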

      Makeshifts last the longest.
