PerlMonks  

Re^2: Cleaning Data Between Specified Columns

by Aristotle (Chancellor)
on Jan 28, 2003 at 01:54 UTC (#230452)


in reply to Re: Cleaning Data Between Specified Columns
in thread Cleaning Data Between Specified Columns

Using "known nonexistent" characters is just asking for trouble; it's a practice I've come to regard as a huge red flag. In this particular case, and with Perl being Perl, the proper solution is surprising but very neat. Fletch++

Makeshifts last the longest.

Re: Re^2: Cleaning Data Between Specified Columns
by BrowserUk (Pope) on Jan 28, 2003 at 03:05 UTC

    Sorry Aristotle. Fletch's (partial) solution, neat as the technique is, falls foul of the fact that deleting the apostrophes in one range causes all the subsequent columns to shift.
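To make the shift concrete, here is a small sketch. The sample record, the [start, length] pairs and the helper name clean_naive are all made up for illustration; this is not Fletch's code, just the general pattern of editing one range after another using the original column numbers.

```perl
use strict;
use warnings;

# Hypothetical record: field 1 occupies columns 0-5, field 2 columns 10-17.
my $line   = "O'Neil" . (" " x 4) . "'quoted'" . " tail";
my @ranges = ([0, 6], [10, 8]);    # [start, length] pairs, assumed for illustration

# Delete apostrophes range by range, left to right, always using the
# ORIGINAL column numbers even though earlier deletions shorten the string.
sub clean_naive {
    my ($text, @r) = @_;
    for my $r (@r) {
        my ($start, $len) = @$r;
        (my $chunk = substr $text, $start, $len) =~ tr/'//d;
        substr($text, $start, $len) = $chunk;
    }
    return $text;
}

my $naive = clean_naive($line, @ranges);
print "before: [$line]\n";
print "after:  [$naive]\n";
```

By the time the second range is processed, its leading apostrophe has already moved one column to the left, so it falls outside the range and survives.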


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      I should have tested. Anyway, in this case, it's a simple matter of changing the order of operations:
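      {
          local *_ = \substr $source, $start, $len;
          y/a-zA-Z0-9\n\|\-'/ /c;    # blank out disallowed characters first (length unchanged)
          y/'//d;                    # then delete apostrophes, the only length-changing step
      }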
      However, that obviously only works if there's only one operation affecting length. For a more general case, I'd do something like this (untested):
      #!/usr/bin/perl -w
      use strict;

      # Flatten the sorted START-END arguments into (start, end, start, end, ...)
      my @range = map /^(\d+)-(\d+)$/, sort { $a <=> $b } splice @ARGV, 1;
      unshift @range, 0;

      # Turn the boundaries into field widths; the final field takes the rest of the line
      $range[$_] = 1 + $range[$_+1] - $range[$_] for 0 .. $#range-1;
      $range[-1] = '*';
      die "Negative length field specified" if grep $_ < 0, @range[0 .. $#range-1];
      my $fmt = join " ", map "A$_", @range;

      # pick odd numbered elements
      my @selected = map 1 + $_ * 2, 0 .. $#range/2 - 1;

      while(<>) {
          my @field = unpack $fmt, $_;
          for (@field[@selected]) {
              tr/a-zA-Z0-9\n\|\-'/ /c;
              tr/'//d;
          }
          print join '', @field;
      }
      The point is to structure your data whenever possible. The end of an array element is never ambiguous; a \x7F can turn out to be, and in my experience, whatever mark character I picked, I have always been bitten by it.
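For anyone who wants to try the split-into-fields idea on a single string, the following is a minimal sketch under a few assumptions of its own: column specs are 1-based and inclusive, ranges do not overlap, and the names build_template and clean_line are hypothetical helpers. It uses the `a` template letter rather than `A`, because `A` strips trailing spaces from each field and would disturb the untouched columns.

```perl
use strict;
use warnings;

# Build an unpack template from "START-END" specs (assumed 1-based, inclusive).
# Returns the template followed by the indices of the fields to be cleaned.
sub build_template {
    my @spec = sort { $a->[0] <=> $b->[0] }
               map  { my @m = /^(\d+)-(\d+)$/ or die "Bad range '$_'\n"; [@m] } @_;
    my (@width, @selected);
    my $pos = 1;                                  # next column not yet consumed
    for my $r (@spec) {
        my ($start, $end) = @$r;
        die "Ranges overlap or run backwards\n" if $start < $pos or $end < $start;
        push @width, $start - $pos;               # gap before the range (may be 0)
        push @selected, scalar @width;            # index the range field will get
        push @width, $end - $start + 1;           # the range itself
        $pos = $end + 1;
    }
    my $fmt = join ' ', (map "a$_", @width), 'a*';   # 'a*' takes the rest of the line
    return ($fmt, @selected);
}

# Split the line into fields, clean only the selected ones, and rejoin.
sub clean_line {
    my ($line, $fmt, @selected) = @_;
    my @field = unpack $fmt, $line;
    for (@field[@selected]) {
        tr/a-zA-Z0-9\n\|\-'/ /c;    # blank out anything outside the allowed set
        tr/'//d;                    # then drop apostrophes; only this field shrinks
    }
    return join '', @field;
}

my ($fmt, @selected) = build_template('11-18', '1-6');
my $line = "O'Neil" . (" " x 4) . "'quo#ed'" . " tail";
print clean_line($line, $fmt, @selected), "\n";
```

Because every field is its own array element, deleting characters from one of them cannot move the boundaries of the others; the line is only reassembled once all edits are done.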

      Makeshifts last the longest.
