PerlMonks  

Remove lines that contain matching values from csv.

by urbs33 (Initiate)
on Oct 08, 2012 at 15:56 UTC (#997845=perlquestion)
urbs33 has asked for the wisdom of the Perl Monks concerning the following question:

Perl Monks,

I am looking for some input on filtering a csv down to unique records based on a specified field. I have a list of files that may exist on multiple servers. I am generating a list of the files, along with some additional information about them, including the server that they reside on. I only want to output one record per file, even if it is on multiple servers. Let's suppose that the field to match is field one. I will have something like this:

unique_id,server_name,modification_date,size,another_field,another_field,

There are thousands of lines in this file. If one of the files from this report is copied to another server, the unique_id values will match, but the server name will not. I only want one record for each unique_id in this report, and I do not care which server it comes from. This is a Unix OS, so if there is an easier way to do it with awk, sort|uniq, or other native commands, I am open to that. I'm just kinda stumped since the rest of the line will not match exactly.

Thanks!!

Re: Remove lines that contain matching values from csv.
by Anonymous Monk on Oct 08, 2012 at 16:01 UTC
Re: Remove lines that contain matching values from csv.
by BrowserUk (Pope) on Oct 08, 2012 at 16:09 UTC

    perl -F, -anle"++$uniq{$F[0]} == 1 and print" infile > outfile
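For anyone unpacking the one-liner: `-n` loops over the input lines, `-a` with `-F,` splits each line on commas into `@F`, `-l` takes care of line endings, and `%uniq` counts how often each first field has turned up, so a line is printed only the first time its id is seen. On a Unix shell the program text would need single quotes rather than double quotes, otherwise the shell expands `$uniq` and `$F` before perl sees them. Below is a minimal script sketch of the same idea, not a definitive version. It assumes no field contains a quoted comma; the sub name `first_seen` and the sample records are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return only the lines whose first comma-separated field has not
# appeared earlier in the list. Assumes plain CSV without quoted
# commas (an assumption, not something stated in the thread).
sub first_seen {
    my @lines = @_;
    my %seen;    # id => number of times seen so far
    my @kept;
    for my $line (@lines) {
        my ($id) = split /,/, $line, 2;    # only the first field is needed
        next unless defined $id;           # skip empty lines
        push @kept, $line unless $seen{$id}++;
    }
    return @kept;
}

# Hypothetical sample records: f001 exists on two servers.
my @sample = (
    'f001,alpha,2012-10-01,1024',
    'f002,alpha,2012-10-02,2048',
    'f001,beta,2012-10-01,1024',
);

print "$_\n" for first_seen(@sample);
```

With the sample data, the second `f001` record (the one on server `beta`) is dropped and the other two lines are printed in their original order. A header line, if present, passes through as the first occurrence of its own first field.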

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

Re: Remove lines that contain matching values from csv.
by fluffyvoidwarrior (Monk) on Oct 09, 2012 at 15:52 UTC
    Perhaps I'm missing the point of your question, but...

    If the fields are ordered as you seem to suggest and the unique id is in position 1, you can short-circuit for speed (otherwise you'd have to regex the whole line, which is slower than anchoring at the start or using substr). So can't you just treat the csv files as text files, i.e. a bunch of arrays? Parse each one and compare position 1 (the id field) against a cumulative output array for uniqueness. So long as the output is less than about 100,000 not-huge lines, Perl should do this in a few seconds per input file (if you've optimised your code).

    (For obvious reasons, simple text-file handling is a lot faster than using CSV libraries.)

    I wouldn't know how to do this with a one-liner, but I don't see why you would have to. Maybe it's not the neatest of solutions, but it is a guaranteed, self-contained solution for half an hour's work that other people will easily understand in the future.
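The plain-text approach described above could look roughly like the sketch below. It pulls the id out with `index` and `substr` instead of a regex, and it shares one lookup across all input files so the check is cumulative. Note one substitution on my part: a hash stands in for the "cumulative output array", since a hash lookup avoids rescanning the output for every line. The sub name `filter_unique` and the sample records are hypothetical, and the code assumes the id never contains a comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy lines from $in to $out, skipping any line whose id (the text
# before the first comma) is already recorded in %$seen. Passing the
# same %$seen for every input makes the uniqueness check cumulative.
sub filter_unique {
    my ($in, $out, $seen) = @_;
    while (my $line = <$in>) {
        my $comma = index $line, ',';
        next if $comma < 0;                  # no comma: not a record
        my $id = substr $line, 0, $comma;    # id field, no regex needed
        print {$out} $line unless $seen->{$id}++;
    }
    return;
}

# Demo with two in-memory "files" (made-up records, one per server).
my $server_a = "f001,alpha,2012-10-01,1024\nf002,alpha,2012-10-02,2048\n";
my $server_b = "f001,beta,2012-10-01,1024\nf003,beta,2012-10-03,512\n";

my %seen;
for my $data ($server_a, $server_b) {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    filter_unique($fh, \*STDOUT, \%seen);
    close $fh;
}
```

In the demo, both records from the first input are printed, and only the `f003` record from the second, because `f001` was already seen. For real use, the in-memory strings would be replaced by filehandles opened on the actual report files.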

Node Type: perlquestion [id://997845]
Front-paged by Arunbear