PerlMonks  

Data managing problem

by manorhce (Beadle)
on Feb 22, 2013 at 07:15 UTC (#1020085=perlquestion)
manorhce has asked for the wisdom of the Perl Monks concerning the following question:

Hi, can you please help with this? I have data like the following:

1/2/2013 cgoo nreuiheru
1/4/2013 doow reiqrqueih
1/5/2013 hellio ruieqrhfuepqh
1/20/2013 cgoo 3rhquh4ureyh
1/30/2013 yetil jerqohgqrij
2/13/2013 hellio rueqipheruh
2/14/2013 cgoo wehrig4r74378

but I need the data to look like this:

1/4/2013 doow reiqrqueih
1/30/2013 yetil jerqohgqrij
2/13/2013 hellio rueqipheruh
2/14/2013 cgoo wehrig4r74378

Basically, I need to de-duplicate the data on the second field: if a value appears more than once, keep only the last record that has it.

Please let me know if anything is unclear.

Re: Data managing problem
by BrowserUk (Pope) on Feb 22, 2013 at 07:27 UTC

    That's a simple one-liner if you don't need to maintain the input ordering:

    C:\test>perl -anle"$h{ $F[1] } = $_; }{ print for values %h"
    1/2/2013 cgoo nreuiheru
    1/4/2013 doow reiqrqueih
    1/5/2013 hellio ruieqrhfuepqh
    1/20/2013 cgoo 3rhquh4ureyh
    1/30/2013 yetil jerqohgqrij
    2/13/2013 hellio rueqipheruh
    2/14/2013 cgoo wehrig4r74378
    ^Z
    1/30/2013 yetil jerqohgqrij
    1/4/2013 doow reiqrqueih
    2/13/2013 hellio rueqipheruh
    2/14/2013 cgoo wehrig4r74378

    If you need to retain the ordering it is a little more complicated:

    C:\test>perl -anle"$h{ $F[1] } = sprintf qq[%05u%s], $., $_; }{ print +substr $_, 5 for sort values %h"
    1/2/2013 cgoo nreuiheru
    1/4/2013 doow reiqrqueih
    1/5/2013 hellio ruieqrhfuepqh
    1/20/2013 cgoo 3rhquh4ureyh
    1/30/2013 yetil jerqohgqrij
    2/13/2013 hellio rueqipheruh
    2/14/2013 cgoo wehrig4r74378
    ^Z
    1/4/2013 doow reiqrqueih
    1/30/2013 yetil jerqohgqrij
    2/13/2013 hellio rueqipheruh
    2/14/2013 cgoo wehrig4r74378

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      >perl -anle"$h{ $F[1] } = $_; }{ print for values %h"

      I understood that one, including the implicit split and -l[octnum], after reading this (which is awesome, by the way).

      >perl -anle"$h{ $F[1] } = sprintf qq[%05u%s], $., $_; }{ print substr $_, 5 for sort values %h"

      I have been searching around for a bit and I simply don't get that one. Could you explain it, please?

        see perlrun for most recent version

        Sure

        First the switches: perl -anle"..."

        • -e"...": a small program to run immediately
        • -n: Feed the data (from the file(s) listed on the command line, or STDIN) to the -e snippet above, one line at a time, in $_

          This has the effect of placing the -e"..." within a while loop very similar to this:

          while( <> ) {
              ... # the -e snippet here
          }
        • -l: auto chomp those lines on input; add a newline to print statements on output.
        • -a: autosplit those lines on whitespace (or the argument to -F if used) and place the results in @F
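
        As a rough sketch, the loop that these switches set up can be written out as an ordinary script. The sub name second_fields and the in-memory filehandle are assumptions made for illustration only; with the real switches perl reads the named files or STDIN and splits into the global @F.

```perl
use strict;
use warnings;

# second_fields is a hypothetical name; its loop body stands in for "the -e snippet".
sub second_fields {
    my ($text) = @_;

    # Read from an in-memory string so the sketch is self-contained;
    # with the real switches perl reads the named files or STDIN.
    open my $in, '<', \$text or die "Cannot open in-memory input: $!";

    my @fields;
    while ( defined( my $line = <$in> ) ) {    # -n: one line at a time
        chomp $line;                           # -l: strip the newline on input
        my @F = split ' ', $line;              # -a: split on whitespace into @F
        push @fields, $F[1] if defined $F[1];  # the -e snippet would go here
    }
    close $in;
    return @fields;
}

{
    local $\ = "\n";                           # -l: print appends a newline on output
    print for second_fields("1/2/2013 cgoo nreuiheru\n1/4/2013 doow reiqrqueih\n");
}
```

        Run as written, this prints the second field of each sample line (cgoo, then doow), one per line.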

        Now the -e snippet code in three parts:

        1. "$h{ $F[1] } = sprintf qq[%05u%s], $., $_;

          This builds %h with the second field of each input line as the key, and the whole line ($_), prefixed with the line number ($.) zero-padded to five digits, as the value.

          Later lines with matching second fields will overwrite earlier ones.

        2. }{

          This has the effect of converting the while loop shown above into:

          while( <> ) {
              ... # the part of the -e snippet before the }{ here; runs for every line in the file.
          }{  # closes the while loop body and puts a bare block after it
              ... # the part of the -e snippet after the }{ goes here and is only executed after the input is exhausted.
          }

          It is a little like adding an END{} block containing the second half of the -e snippet.

        3.  print substr $_, 5 for sort values %h"

          Once we've read all the input, only the last line with each second field value will remain in the hash, prefixed with the line number it came from.

          So sort the values from %h to get them into line number order and then print them out having removed the line numbers.
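
        The same idea can be sketched as a standalone script. This is an illustrative variant, not the one-liner itself: instead of a zero-padded string prefix (which only sorts correctly while the line count fits in five digits), it stores the line number separately and sorts numerically. The name keep_last_by_field2 and the guard against short lines are assumptions added here.

```perl
use strict;
use warnings;

# keep_last_by_field2 is a hypothetical helper name, not from the thread.
sub keep_last_by_field2 {
    my (@lines) = @_;

    my %last;     # second field => [ line number, whole line ]
    my $n = 0;
    for my $line (@lines) {
        $n++;
        my @F = split ' ', $line;
        next unless defined $F[1];         # added guard: skip blank or one-field lines
        $last{ $F[1] } = [ $n, $line ];    # later duplicates overwrite earlier ones
    }

    # A numeric sort on the stored line number restores the input order.
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } values %last;
}

my @data = (
    '1/2/2013 cgoo nreuiheru',
    '1/4/2013 doow reiqrqueih',
    '1/5/2013 hellio ruieqrhfuepqh',
    '1/20/2013 cgoo 3rhquh4ureyh',
    '1/30/2013 yetil jerqohgqrij',
    '2/13/2013 hellio rueqipheruh',
    '2/14/2013 cgoo wehrig4r74378',
);
print "$_\n" for keep_last_by_field2(@data);
```

        On the sample data from the question this prints the doow, yetil, hellio and final cgoo records, in that order.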


      perl -anle"$h{ $F[1] } = $_; }{ print for values %h"

      Doesn't work for me!

      ~$ perl -anle"$h{ $F[1] } = $_; }{ print for values %h"
      syntax error at -e line 1, near "} ="
      Execution of -e aborted due to compilation errors.
      Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
         
        Platform:
          osname=linux, osvers=3.2.0-23-generic, archname=x86_64-linux-gnu-thread-multi
          uname='linux komainu 3.2.0-23-generic #36-ubuntu smp tue apr 10 20:39:51 utc 2012 x86_64 x86_64 x86_64 gnulinux '
      
        Doesn't work for me!
        perl -anle"..."
        Platform: osname=linux,

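        You need to use single quotes ('), not double quotes ("), on *nix command lines. Inside double quotes the shell expands $h, $F and $_ itself before perl ever sees the code, which is what produces the syntax error. For example: perl -anle '$h{ $F[1] } = $_; }{ print for values %h'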


Re: Data managing problem
by vinoth.ree (Parson) on Feb 22, 2013 at 07:29 UTC

    Use a hash and store each line with the second field of the record as the key. That way, whenever a key is duplicated, you end up with the latest record for it.

    use strict;
    use warnings;
    use Data::Dumper;
    use utf8;

    my %hash ;
    while(<DATA>) {
        my @arr = split(/\s+/, $_);
        $hash{$arr[1]}= $_;
    }
    print Dumper \%hash;

    __DATA__
    1/2/2013 cgoo nreuiheru
    1/4/2013 doow reiqrqueih
    1/5/2013 hellio ruieqrhfuepqh
    1/20/2013 cgoo 3rhquh4ureyh
    1/30/2013 yetil jerqohgqrij
    2/13/2013 hellio rueqipheruh
    2/14/2013 cgoo wehrig4r74378
Re: Data managing problem
by tmharish (Friar) on Feb 22, 2013 at 07:34 UTC

    This will also save it for you:

    use strict ;
    use warnings ;

    my @clean_data ;
    my %track_hash ;
    while( <DATA> ) {
        my $line = $_ ;
        chomp( $line ) ;
        my ( $date, $word_to_uniq, $other_word ) = split( /\s+/, $line ) ;
        next unless( $date and $word_to_uniq and $other_word ) ;

        if( defined( $track_hash{ $word_to_uniq } ) ) {
            delete( $clean_data[ $track_hash{ $word_to_uniq } ] ) ;
        }

        push @clean_data, $line ;
        $track_hash{ $word_to_uniq } = $#clean_data ;
    }

    @clean_data = map( { ( $_ ) ? $_ : () } @clean_data ) ;

    foreach ( @clean_data ) {
        print "$_\n";
    }

    __DATA__
    1/2/2013 cgoo nreuiheru
    1/4/2013 doow reiqrqueih
    1/5/2013 hellio ruieqrhfuepqh
    1/20/2013 cgoo 3rhquh4ureyh
    1/30/2013 yetil jerqohgqrij
    2/13/2013 hellio rueqipheruh
    2/14/2013 cgoo wehrig4r74378

      And just for the fun of it:

      use strict ;
      use warnings ;

      my @clean_data ;
      my %track_hash ;
      while( <DATA> ) {
          s/\s+$// ;
          my $to_track ;
          ( $to_track = ( split( /\s+/ ) )[1] ) ? push( @clean_data, $_ ) : next ;
          delete( $clean_data[ $track_hash{ $to_track } ] ) if( defined( $track_hash{ $to_track } ) ) ;
          $track_hash{ $to_track } = $#clean_data ;
      }

      @clean_data = map( { ( $_ ) ? $_ : () } @clean_data ) ;

      foreach ( @clean_data ) {
          print "$_\n";
      }

      __DATA__
      1/2/2013 cgoo nreuiheru
      1/4/2013 doow reiqrqueih
      1/5/2013 hellio ruieqrhfuepqh
      1/20/2013 cgoo 3rhquh4ureyh
      1/30/2013 yetil jerqohgqrij
      2/13/2013 hellio rueqipheruh
      2/14/2013 cgoo wehrig4r74378
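
      For comparison, a different way to keep the last duplicate while preserving input order is to scan the lines backwards and keep only the first occurrence of each second field. This is a minimal sketch of that swapped-in technique, not the code posted above; keep_last_reverse_scan is a hypothetical name, and it assumes the whole input fits in memory.

```perl
use strict;
use warnings;

# keep_last_reverse_scan is a hypothetical helper name, not from the thread.
# Call it in list context; it returns the surviving lines in input order.
sub keep_last_reverse_scan {
    my (@lines) = @_;
    my %seen;

    # Walking backwards, the first time a key is seen is its last
    # occurrence in the input; the outer reverse restores input order.
    return reverse grep {
        my $key = ( split ' ', $_ )[1];
        defined $key && !$seen{$key}++;
    } reverse @lines;
}

my @data = (
    '1/2/2013 cgoo nreuiheru',
    '1/4/2013 doow reiqrqueih',
    '1/5/2013 hellio ruieqrhfuepqh',
    '1/20/2013 cgoo 3rhquh4ureyh',
    '1/30/2013 yetil jerqohgqrij',
    '2/13/2013 hellio rueqipheruh',
    '2/14/2013 cgoo wehrig4r74378',
);
print "$_\n" for keep_last_reverse_scan(@data);
```

      This avoids deleting array elements and the extra pass that filters out the holes, at the cost of reading all lines before producing any output.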
Re: Data managing problem
by manorhce (Beadle) on Feb 22, 2013 at 07:37 UTC

    Thanks a lot, that worked for me. This is why I like PerlMonks more and more.
