PerlMonks  

Remove lines that contain matching values from csv.

by urbs33 (Initiate)
on Oct 08, 2012 at 15:56 UTC (node #997845)
urbs33 has asked for the wisdom of the Perl Monks concerning the following question:

Perl Monks,

I am looking for some input on filtering a CSV file down to unique records based on a specified field. I have a list of records that may exist on multiple servers. I am generating a list of the files, plus some additional information about them, including the server that each one resides on. I only want to output one record per file, even if it is on multiple servers. Let's suppose that the field to match is field one. I will have something like this:

unique_id,server_name,modification_date,size,another_field,another_field,

There are thousands of lines in this file. If one of the files in this report has been copied to another server, the unique_id values will match, but the server names will not. I only want one record for each unique_id in this report, and I do not care which server it comes from. This is a Unix system, so if there is an easier way to do it with awk, sort | uniq, or other native commands, I am open to that. I'm just kinda stumped, since the rest of the line will not match exactly.

Thanks!!

Re: Remove lines that contain matching values from csv.
by Anonymous Monk on Oct 08, 2012 at 16:01 UTC
Re: Remove lines that contain matching values from csv.
by BrowserUk (Pope) on Oct 08, 2012 at 16:09 UTC

    # Unix shells: use single quotes so the shell does not expand $uniq and $F
    perl -F, -anle '++$uniq{$F[0]} == 1 and print' infile > outfile
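For readers less used to Perl's command-line switches, here is a minimal long-hand sketch of what the one-liner does: `-n` loops over the input lines, `-l` strips each newline and adds it back on output, `-a` with `-F,` splits each line on commas into `@F`, and the hash counts how often each first field has been seen, so a line is printed only on its first occurrence. The script name and the helper `is_first_occurrence` are hypothetical. Like the one-liner, it assumes no field contains a quoted comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns true only the first time a given id
# (the text before the first comma) is seen, like ++$uniq{$F[0]} == 1.
sub is_first_occurrence {
    my ($line, $seen) = @_;
    my ($id) = split /,/, $line, 2;   # first field only; same value as $F[0]
    $id = '' unless defined $id;      # an empty line splits to nothing
    return ++$seen->{$id} == 1;
}

# Run only when file names are given, e.g.: perl first_per_id.pl infile > outfile
if (@ARGV) {
    my %seen;                         # id => times seen so far
    while (my $line = <>) {
        chomp $line;                  # what -l does on input...
        print "$line\n" if is_first_occurrence($line, \%seen);   # ...and on output
    }
}
```

Memory use grows with the number of distinct ids rather than with the file size, so a file of a few thousand lines is no problem.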


Re: Remove lines that contain matching values from csv.
by fluffyvoidwarrior (Monk) on Oct 09, 2012 at 15:52 UTC
    Perhaps I'm missing the point of your question but.....

    If the fields are ordered as you suggest and the unique id is the first field, you can short-circuit for speed: anchoring at the start of the line or using substr is faster than running a regex over the whole line. Can't you just treat the CSV files as plain text files, i.e. a bunch of arrays? Read each one line by line and check the id field for uniqueness against a cumulative output array. As long as the output is under about 100,000 not-huge lines, Perl should do this in a few seconds per input file (if your code is optimised).

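    (Plain text handling is a lot faster than using a CSV library because it skips quote and escape parsing, though splitting on commas gives wrong results if any field contains a quoted comma.)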

    I wouldn't know how to do this with a one-liner, but I don't see why you would have to. It may not be the neatest solution, but it is a guaranteed, self-contained solution for half an hour's work that other people will easily understand in the future.
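A minimal sketch of the plain-text approach described above, under stated assumptions: the id is everything before the first comma, found with `index` and `substr` rather than a regex, and a hash stands in for the cumulative output array in the uniqueness check, since a hash lookup does not have to scan every id already kept. `id_field` and `dedupe_files` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text before the first comma (the id field) without splitting
# the rest of the line. A line with no comma is treated as all id.
sub id_field {
    my ($line) = @_;
    my $pos = index $line, ',';
    return $pos < 0 ? $line : substr($line, 0, $pos);
}

# Filter several input files into one output handle: an id already written,
# from any earlier line or file, is skipped.
sub dedupe_files {
    my ($out, @files) = @_;
    my %seen;   # ids already written (hash lookup instead of scanning an array)
    for my $file (@files) {
        open my $in, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$in>) {
            chomp $line;
            next if $seen{ id_field($line) }++;   # true from the second sighting on
            print {$out} "$line\n";
        }
        close $in;
    }
}

# e.g.: perl dedupe_files.pl server1.csv server2.csv > unique.csv
dedupe_files(\*STDOUT, @ARGV) if @ARGV;
```

Because `%seen` is shared across all the input files, a record that appears on several servers is kept only from the first file that lists it.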
