Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Remove lines that contain matching values from csv.

by fluffyvoidwarrior (Monk)
on Oct 09, 2012 at 15:52 UTC ( [id://998031]=note: print w/replies, xml ) Need Help??


in reply to Remove lines that contain matching values from csv.

Perhaps I'm missing the point of your question but.....

If the fields are ordered as you seem to suggest and the unique id is in position 1 (so you can short circuit for speed - otherwise you'd have to regex a whole line - slower than anchoring at start or substr) can't you just treat the csv files as text files, ie a bunch of arrays. Parse each one and compare position 1 (the id field) with a cumulative output array for uniqueness. So long as the output is less than about 100,000 not-huge lines Perl should do this in a few seconds per input file (if you've optimised your code).

For obvious reasons simple textfile handling is a lot faster than using CSV libraries)

I wouldn't know how to do this with a one-liner but I don't see why you would have to. Maybe it's not the neatest of solutions but it is a guaranteed, self contained solution for half hours work, that other people will easily understand in the future.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://998031]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (6)
As of 2024-03-19 03:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found