Re: Remove lines that contain matching values from csv.

by fluffyvoidwarrior (Monk)
on Oct 09, 2012 at 15:52 UTC

in reply to Remove lines that contain matching values from csv.

Perhaps I'm missing the point of your question, but...

If the fields are ordered as you seem to suggest and the unique id is in position 1, can't you just treat the CSV files as plain text files, i.e. a bunch of arrays of lines? With the id first you can short-circuit for speed: anchor a match at the start of the line or use substr, rather than running a regex over the whole line, which is slower. Read each file, pull out position 1 (the id field), and check it against a cumulative output array for uniqueness. So long as the output stays under about 100,000 not-huge lines, Perl should do this in a few seconds per input file (if you've optimised your code).

(For obvious reasons, simple text-file handling is a lot faster than using CSV libraries.)
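
Something along these lines is what I have in mind. It is only a rough sketch, not tested against your data: the sub name unique_by_first_field and the sample lines are made up for illustration, and it assumes the id is everything before the first comma and is never quoted. I've also swapped the cumulative output array lookup for a %seen hash, since that makes each uniqueness check a single lookup instead of a scan.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the first line seen for each id, where the id is everything
# before the first comma. Assumption: the id field is never quoted and
# never contains a comma itself.
sub unique_by_first_field {
    my @lines = @_;
    my (%seen, @out);
    for my $line (@lines) {
        my $comma = index($line, ',');
        # No comma at all: treat the whole line as the id.
        my $id = $comma < 0 ? $line : substr($line, 0, $comma);
        next if $seen{$id}++;    # a line with this id was already kept
        push @out, $line;
    }
    return @out;
}

# Made-up sample data; the third line repeats id 101 and is dropped.
my @input = (
    "101,apple,red\n",
    "102,banana,yellow\n",
    "101,apple,green\n",
    "103,cherry,red\n",
);
print unique_by_first_field(@input);
```

To run it over real files you could replace the sample data with print unique_by_first_field(<>); and pass the file names on the command line, so that one %seen hash covers all the input files.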

I wouldn't know how to do this with a one-liner, but I don't see why you would have to. Maybe it's not the neatest of solutions, but it is a guaranteed, self-contained solution for half an hour's work that other people will easily understand in the future.