PerlMonks  

Sorting unique values in file using perl

by PerlSavvy (Initiate)
on Oct 24, 2012 at 02:15 UTC ( #1000541=perlquestion )
PerlSavvy has asked for the wisdom of the Perl Monks concerning the following question:

I have a data file with the following lines:

gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b171_b174_b236_b242_qreg/d2 19.33 18.65 -0.67 (VIOLATED)
gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b186_b196_b210_b223_qreg/d2 15.09 11.70 -3.39 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b103_b104/d2 40.35 36.61 -3.74 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b105_b106/d1 11.06 8.77 -2.29 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b105_b106/d1 40.08 29.18 -10.90 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b119_b120/d2 11.37 8.34 -3.03 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b11_b12/d2 14.56 4.41 -10.15 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b11_b12/d2 45.82 27.47 -18.34 (VIOLATED)

I need one line per unique value in the first column: where a first-column value is repeated, keep the line with the smallest value in the fourth column.

Output should be:

gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b171_b174_b236_b242_qreg/d2 19.33 18.65 -0.67 (VIOLATED)
gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b186_b196_b210_b223_qreg/d2 15.09 11.70 -3.39 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b103_b104/d2 40.35 36.61 -3.74 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b105_b106/d1 40.08 29.18 -10.90 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b119_b120/d2 11.37 8.34 -3.03 (VIOLATED)
gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b11_b12/d2 45.82

I tried using pattern matching to put column 1 and column 4 into a hash and then sort the hash, but it's not working the way I want. Please help.

Replies are listed 'Best First'.
Re: Sorting unique values in file using perl
by jwkrahn (Monsignor) on Oct 24, 2012 at 03:06 UTC
    $ echo "gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b171_b174_b236_b242_qreg/d2 19.33 18.65 -0.67 (VIOLATED)
    gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b186_b196_b210_b223_qreg/d2 15.09 11.70 -3.39 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b103_b104/d2 40.35 36.61 -3.74 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b105_b106/d1 11.06 8.77 -2.29 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b105_b106/d1 40.08 29.18 -10.90 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b119_b120/d2 11.37 8.34 -3.03 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b11_b12/d2 14.56 4.41 -10.15 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b11_b12/d2 45.82 27.47 -18.34 (VIOLATED)" | \
    perl -e'
    my %data;
    while ( <> ) {
        # Columns 1 and 4 are the path and the value to minimise
        my ( $first, undef, undef, $fourth ) = split;
        # Keep the line if the path is new or its fourth column is smaller
        if ( !exists $data{ $first } || $data{ $first }{ fourth } > $fourth ) {
            $data{ $first } = { fourth => $fourth, line => $_ };
        }
    }
    for my $key ( sort keys %data ) {
        print $data{ $key }{ line };
    }
    '
    gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b171_b174_b236_b242_qreg/d2 19.33 18.65 -0.67 (VIOLATED)
    gtssmpar11/dmunit1/mt_dm_fifo_csr_flopped_out/u_gt_ram/wrdataR_reg_0_b186_b196_b210_b223_qreg/d2 15.09 11.70 -3.39 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b103_b104/d2 40.35 36.61 -3.74 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b105_b106/d1 40.08 29.18 -10.90 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b119_b120/d2 11.37 8.34 -3.03 (VIOLATED)
    gtssmpar21/flunit1/flunitx1/fl_flex1/fl_mt_input_fifo/u_gt_ram/wrdataR_dreg_b11_b12/d2 45.82 27.47 -18.34 (VIOLATED)

      Hi,

      The code works fine. It is awesome. Thank you so much.

      I got to learn from this code. It is really useful.

Re: Sorting unique values in file using perl
by Kenosis (Priest) on Oct 24, 2012 at 02:31 UTC

    Hi, PerlSavvy, and welcome to PerlMonks!

    If you would, please use code tags to reformat your data as it's found in your file, as that will make it more readable. Also, include any code-formatted script that's not working for you, since that will help with crafting potential solutions.

Re: Sorting unique values in file using perl
by 2teez (Priest) on Oct 24, 2012 at 02:55 UTC
      my $file1="file1.txt";
      open FILE1, "<$file1" or die $!;
      my $file2="min_viols_endpointSorted.csv";
      open FILE2, ">$file2" or die $!;
      while(<FILE1>){
          my $path = $_;
          $path =~ /([^\s]+)/;
          $path = $1; #Extracting path
          chop($path);
          my $slack = $_;
          $slack =~ /[^\f+][\s+][\f+][\s+][\f+][\s+]([\f+]+)[\s](VIOLATED)/;
          $slack = $1;
          print "$slack\n";
          chop($slack);
          print FILE2 "$path $slack\n";
      }

      After this I plan to read the CSV into a hash and compare the values of the hash keys. If the value for a duplicate key is less than the previous value, then put it in an output file. The output should be unique, as I have shown in my question. Please help with this.
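A minimal sketch of the plan described above, done in one pass without the intermediate CSV. It assumes whitespace-separated columns with the path in column 1 and the slack in column 4. The helper name `min_slack_per_path` and the shortened sample paths are hypothetical, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one "path slack"
# string per unique path, keeping the smallest slack seen for that path.
sub min_slack_per_path {
    my @lines = @_;
    my ( %min, @order );
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next unless @fields >= 4;    # skip blank or malformed lines
        my ( $path, $slack ) = @fields[ 0, 3 ];
        if ( !exists $min{$path} ) {
            push @order, $path;      # remember first-seen order
            $min{$path} = $slack;
        }
        elsif ( $slack < $min{$path} ) {
            $min{$path} = $slack;    # a smaller slack replaces the old one
        }
    }
    return map { "$_ $min{$_}" } @order;
}

# Shortened, made-up paths standing in for the real ones
my @report = min_slack_per_path(
    "a/b/d1 11.06 8.77 -2.29 (VIOLATED)",
    "a/b/d1 40.08 29.18 -10.90 (VIOLATED)",
    "a/c/d2 11.37 8.34 -3.03 (VIOLATED)",
);
print "$_\n" for @report;
```

Keeping the first-seen order in `@order` is a design choice for this sketch; sorting the hash keys instead would give alphabetical output.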

        $path =~ /([^\s]+)/;
        $path = $1; #Extracting path
        ...
        $slack =~ /[^\f+][\s+][\f+][\s+][\f+][\s+]([\f+]+)[\s](VIOLATED)/;
        $slack = $1;

        You shouldn't use the results of a regular expression unless you verify that the pattern matched, or you may get erroneous results. Also, /([^\s]+)/ is usually written as /(\S+)/. And /[^\f+][\s+][\f+][\s+][\f+][\s+]([\f+]+)[\s](VIOLATED)/ matches a single character that is not a FORM FEED or '+' character, followed by a whitespace or '+' character, followed by a FORM FEED or '+' character, etc., but there are no FORM FEED characters in the string.
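As a hedged illustration of the advice above, the sketch below only uses `$1` and `$2` after the match has succeeded, and uses `\S+` for the non-whitespace columns. It assumes the fourth column is a decimal number followed by the literal text `(VIOLATED)`; the parentheses are escaped because unescaped parentheses form a capture group. The helper name `parse_violation` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (path, slack) on a match, or an empty
# list when the line does not have the assumed layout.
sub parse_violation {
    my ($line) = @_;
    if ( $line =~ /^\s*(\S+)\s+\S+\s+\S+\s+(-?\d+(?:\.\d+)?)\s+\(VIOLATED\)/ ) {
        return ( $1, $2 );    # safe: the captures belong to this match
    }
    return;
}

for my $line ( "x/y/d2 19.33 18.65 -0.67 (VIOLATED)", "not a data line" ) {
    if ( my ( $path, $slack ) = parse_violation($line) ) {
        print "$path $slack\n";
    }
    else {
        warn "Skipping unparsable line: $line\n";
    }
}
```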



        chop($path);
        ...
        chop($slack);

        chop removes the last character of the string, no matter what it is.    So what is the purpose of removing the last character from $path or $slack?
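If the intent was to strip a trailing newline (an assumption; the thread does not say), `chomp` is the usual tool, since it removes only a trailing newline (strictly, the current value of `$/`). A value captured with `/(\S+)/` cannot contain a newline, so `chop` there removes a real character. A small sketch with made-up values:

```perl
use strict;
use warnings;

my $line     = "-0.67\n";    # a value as read from a file, newline attached
my $captured = "-0.67";      # a value as captured by (\S+), no newline

chomp( my $chomped_line = $line );        # "-0.67": newline removed
chop( my $chopped_line = $line );         # "-0.67": newline happened to be last

chomp( my $chomped_capture = $captured ); # "-0.67": nothing to remove
chop( my $chopped_capture = $captured );  # "-0.6": the last digit is lost

print "chomp on capture: $chomped_capture\n";
print "chop on capture:  $chopped_capture\n";
```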



Node Type: perlquestion [id://1000541]
Approved by davido