PerlMonks  

for each unique value in a column find the max value..need perl script

by qmenon (Initiate)
on May 31, 2011 at 15:02 UTC ( #907474=perlquestion )
qmenon has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Below is my Input file:

date mtime no size id day order

20100607 154538.354300 200 1 101510 14098703993

20100607 154539.420000 200 1 101511 14098703993

20100607 154538.398200 487 1 100888 14098703994

20100607 154610.720000 487 1 91588 14098703994

20100607 154538.401200 200 1 101502 14098703995

20100607 154539.420000 200 1 101500 14098703995

I need Perl code to produce the following output. For each unique order value, find the oldest mtime, then replace the day field in every row for that order with that oldest mtime. For example, the first 2 rows have the same order field (14098703993), for which the oldest mtime is 154538.354300, so we put this value into the day field. The output should be as below:

date mtime no size id day order

20100607 154538.354300 200 1 154538.354300 14098703993

20100607 154539.420000 200 1 154538.354300 14098703993

20100607 154538.398200 487 1 154538.398200 14098703994

20100607 154610.720000 487 1 154538.398200 14098703994

20100607 154538.401200 200 1 154538.401200 14098703995

20100607 154539.420000 200 1 154538.401200 14098703995

Please help me guys with the code.. I have been trying for a long time now..! Thanks,

Re: for each unique value in a column find the max value..need perl script
by Fletch (Chancellor) on May 31, 2011 at 15:04 UTC

    Since you've been trying for a long time you certainly can show the code you've written in your attempts so far so that people here can point out where your problems are.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: for each unique value in a column find the max value..need perl script
by kennethk (Monsignor) on May 31, 2011 at 15:07 UTC
    You've "been trying for a long time now", so what did you try and what didn't work? This community is happy to help debug, but we are not a code-writing service.

    I would in particular point out Text::CSV for parsing the file and Relational Operators and Foreach Loops for processing.

Re: for each unique value in a column find the max value..need perl script
by davido (Archbishop) on May 31, 2011 at 16:04 UTC

    Use List::Util. Row by row, split each line into fields on whitespace. Use the 'order' column as a hash key, and the rest of the columns get put into an anonymous array and pushed onto your hash for that key (use a HoAoA so that you may have multiple entries per key).

    Next, iterate over the keys. For each hash entry, pull a list of mtimes out of the AoA portion of the data structure. Get the max() of those values. Then replace the mtime column in the AoA with the value that max() returned.

    Now move your original file (rename) to filename.bak (for example). Then open a new file for output with the original file's name, and write your structure back out again in the intended format.

    This solution does hold the entire file in memory, so it wouldn't scale well to huge files. But if you were dealing with truly huge data sets you would already have a database, and updates would be as simple as an SQL statement.
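A minimal sketch of the grouping approach described above, not code from the thread. It assumes whitespace-separated data rows with the header already removed, takes the order from the last field and the mtime from the second, and overwrites the fifth field as in the asker's expected output. It uses min() because the question asks for the oldest mtime; swap in max() if the largest value is wanted. The helper name fill_oldest_mtime is hypothetical, and the file handling (rename to .bak, write back) is left out.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not from the thread). Takes data lines without the
# header row and returns them with the 5th field replaced by the smallest
# mtime (2nd field) found among rows sharing the same order (last field).
sub fill_oldest_mtime {
    my @lines = @_;

    my %rows_by_order;   # order => [ [fields of row 1], [fields of row 2], ... ]
    my @all_rows;        # the same row references, kept in input order
    for my $line (@lines) {
        my @fields = split ' ', $line;   # split on runs of whitespace
        next unless @fields;             # skip blank lines
        push @{ $rows_by_order{ $fields[-1] } }, \@fields;
        push @all_rows, \@fields;
    }

    # For each order, compute the smallest mtime and patch every row in the group.
    for my $order (keys %rows_by_order) {
        my $rows   = $rows_by_order{$order};
        my $oldest = min map { $_->[1] } @$rows;
        $_->[4] = $oldest for @$rows;
    }

    return map { join ' ', @$_ } @all_rows;
}

# Usage with the first four rows from the question.
my @input = (
    '20100607 154538.354300 200 1 101510 14098703993',
    '20100607 154539.420000 200 1 101511 14098703993',
    '20100607 154538.398200 487 1 100888 14098703994',
    '20100607 154610.720000 487 1 91588 14098703994',
);
print "$_\n" for fill_oldest_mtime(@input);
```

Because the hash and @all_rows hold the same array references, patching a group also updates the rows that are later printed in their original order.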

    If you have a question on part of the implementation, be specific as to which part eludes you, and we'll try to help.


    Dave

      Thanks all for your reply

      I reformatted the data as below:

      order mtime no size id day date

      14098703993 154538.354300 200 1 101510

      14098703993 154539.420000 200 1 101511

      14098703994 154538.398200 487 1 100888

      14098703994 154610.720000 487 1 91588

      14098703995 154538.401200 200 1 101502

      14098703995 154539.420000 200 1 101500

      use List::Util qw(max min);

      my %id_hash;

      open (DATA, ".txt");
      while (<DATA>) {
          chomp;
          my ($order, $mtime, $size, $id, $date) = split /\t/;
          push @{ $id_hash{$order}{$id}{mtime} }, $mtime;
          push @{ $id_hash{$order}{$id}{size } }, $size;
          push @{ $id_hash{$order}{$id}{date } }, $date;
      }

      open (OUT, ">output.txt");
      for my $order (keys %id_hash) {
          for my $id (keys %{ $id_hash{$order} }) {
              my $Low = min( @ { $id_hash{$order}{$id}{mtime} } );
              print OUT "$order $Low \n";
          }
      }

      Now the problem is that this does not output the duplicate order entries! I want to keep the duplicate order rows as they are, but I am unable to put the oldest mtime into the date field.

      So now I am doing the below stupid code which will be the loooongest code of my life...:

      open (OUT, ">output.txt");
      open (IN, "input1.txt");    # original file1 with all data
      while($line=<IN>){
          chomp($line);
          ($Date,$MTime,$inserdate,$inserttime,$Id,$Phase,$Size,$day,$order) = split(/ /,$line);
          open (INL, "file2.txt");    # contains the sorted order values ($x) from file 1
          while($linel=<INL>){
              chomp($linel);
              ($x,$y,$z,$q,$t)= split(/ /,$linel);
              if($x == $order) {
                  #print OUT "$a $x $b $z $q $t \n";
                  print OUT $Date," ",$M_Time," ",$inserdate," ",$y," ",$Id," ",$Phase," ",$Size," ",$day," ",$order,"\n";
              }
          }
      }
      close(INL);
      close(IN);
      close(OUT);
      print "DONE";
      Hope it makes sense to you.. am not an expert in Perl.. just trial and error guys.. but now really need ur help...plzzz!

        A solution: mind you, I wouldn't use this on a very big file. Not exactly the method davido was describing, but it will do.

        my %id_hash;
        my @lines = ();

        open (DATA, "test.txt");
        while ($line = <DATA>) {
            chomp $line;
            my @line = split /\t/ , $line;
            push @lines, \@line;    # push the original line in an array as an anonymous array
            if ($line[1] > $id_hash{$line[0]}) {    # calculate the biggest mtime for a given order id
                $id_hash{$line[0]} = $line[1];
            }
        }

        open (OUT, ">output.txt");
        foreach my $item (@lines) {
            my @line = @{$item};              # get the original line
            $line[4] = $id_hash{$line[0]};    # replace the fifth element with the calculated maximum
            print OUT join("\t", @line), "\n";    # print the adapted line (no trailing tab)
        }

        Greetings

        Martell
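The code above keeps the biggest mtime per order and holds every line in memory, while the original question asks for the oldest mtime. As a hedged alternative, here is a two-pass sketch that stores only one mtime per order. It assumes a tab-separated file with the columns order, mtime, no, size, id (the reformatted layout), treats any row whose second field does not start with a digit as a header or blank line, and overwrites the fifth field. The helper name rewrite_with_oldest and the file paths are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads $in_path twice and writes
# $out_path. Pass 1 records the smallest mtime per order; pass 2 copies each
# row with its 5th field replaced by that value.
sub rewrite_with_oldest {
    my ($in_path, $out_path) = @_;
    my %oldest;   # order => smallest mtime seen so far

    # Pass 1: find the smallest mtime for each order.
    open my $pass1, '<', $in_path or die "Cannot read $in_path: $!";
    while (my $line = <$pass1>) {
        chomp $line;
        my ($order, $mtime) = split /\t/, $line;
        # Assumption: header and blank lines have no numeric second field.
        next unless defined $mtime && $mtime =~ /^\d/;
        $oldest{$order} = $mtime
            if !exists $oldest{$order} || $mtime < $oldest{$order};
    }
    close $pass1;

    # Pass 2: re-read the file and write the adapted rows.
    open my $pass2, '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out,   '>', $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$pass2>) {
        chomp $line;
        my @fields = split /\t/, $line;
        # Rows whose order was never recorded (header, blanks) pass through unchanged.
        $fields[4] = $oldest{ $fields[0] }
            if @fields && exists $oldest{ $fields[0] };
        print {$out} join("\t", @fields), "\n";
    }
    close $pass2;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Usage (paths are placeholders):
#   rewrite_with_oldest('test.txt', 'output.txt');
```

Reading the file twice trades some I/O for memory: the hash grows with the number of distinct orders, not with the number of rows.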

        That's a bit hard to follow. I count seven captions (headers),
        order mtime no size id day date,
        but even hypothesizing that the dot in the second field,
        14098703993 154538.354300 200 1 101510,
        denotes a field break, I can only find six fields.

        7==6 does not compute.

Node Type: perlquestion [id://907474]
Approved by kennethk