http://www.perlmonks.org?node_id=1020090


in reply to Data managing problem

That's a simple one-liner if you don't need to maintain the input ordering:

C:\test>perl -anle"$h{ $F[1] } = $_; }{ print for values %h"
1/2/2013 cgoo nreuiheru
1/4/2013 doow reiqrqueih
1/5/2013 hellio ruieqrhfuepqh
1/20/2013 cgoo 3rhquh4ureyh
1/30/2013 yetil jerqohgqrij
2/13/2013 hellio rueqipheruh
2/14/2013 cgoo wehrig4r74378
^Z
1/30/2013 yetil jerqohgqrij
1/4/2013 doow reiqrqueih
2/13/2013 hellio rueqipheruh
2/14/2013 cgoo wehrig4r74378

If you need to retain the ordering it is a little more complicated:

C:\test>perl -anle"$h{ $F[1] } = sprintf qq[%05u%s], $., $_; }{ print +substr $_, 5 for sort values %h"
1/2/2013 cgoo nreuiheru
1/4/2013 doow reiqrqueih
1/5/2013 hellio ruieqrhfuepqh
1/20/2013 cgoo 3rhquh4ureyh
1/30/2013 yetil jerqohgqrij
2/13/2013 hellio rueqipheruh
2/14/2013 cgoo wehrig4r74378
^Z
1/4/2013 doow reiqrqueih
1/30/2013 yetil jerqohgqrij
2/13/2013 hellio rueqipheruh
2/14/2013 cgoo wehrig4r74378
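
If you would rather have this as a script, here is a rough sketch of the order-retaining idea. Note that it is not a transcription of the one-liner: instead of gluing a zero-padded line number onto the front of each line, it keeps the line number beside the line in an array ref and sorts numerically. The sub name last_per_key_in_order and the hard-coded @input are made up for illustration, and it assumes every line has at least two whitespace-separated fields.

```perl
use strict;
use warnings;

# Keep only the last line seen for each key (the second whitespace-separated
# field), then return the survivors in their original input order.
# Hypothetical helper; assumes every line has at least two fields.
sub last_per_key_in_order {
    my @lines = @_;
    my %last;                               # key => [ line number, line text ]
    my $n = 0;
    for my $line (@lines) {
        $n++;
        my @f = split ' ', $line;           # same splitting rule as -a
        $last{ $f[1] } = [ $n, $line ];     # later lines overwrite earlier ones
    }
    # Sort the survivors numerically by the line number they came from.
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } values %last;
}

my @input = (
    '1/2/2013 cgoo nreuiheru',
    '1/4/2013 doow reiqrqueih',
    '1/5/2013 hellio ruieqrhfuepqh',
    '1/20/2013 cgoo 3rhquh4ureyh',
    '1/30/2013 yetil jerqohgqrij',
    '2/13/2013 hellio rueqipheruh',
    '2/14/2013 cgoo wehrig4r74378',
);

print "$_\n" for last_per_key_in_order(@input);
```

Because the line number is compared as a number, this variant does not depend on a fixed prefix width.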

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Data managing problem
by tmharish (Friar) on Feb 22, 2013 at 12:25 UTC
    >perl -anle"$h{ $F[1] } = $_; }{ print for values %h"

    I understood the part about the implicit split and the -l[octnum] switch after reading this (which is awesome, by the way)

    >perl -anle"$h{ $F[1] } = sprintf qq[%05u%s], $., $_; }{ print substr $_, 5 for sort values %h"

    I have been searching around for a bit and I simply don't get that one. Could you explain it, please?

      Sure

      First the switches: perl -anle"..."

      • -e"...": a small program to run immediately
      • -n: Feed the data (from the file(s) listed on the command line, or STDIN) to the -e snippet above, one line at a time, in $_

        This has the effect of placing the -e"..." within a while loop very similar to this:

        while( <> ) {
            ... # the -e snippet here
        }
      • -l: auto chomp those lines on input; add a newline to print statements on output.
      • -a: autosplit those lines on whitespace (or the argument to -F if used) and place the results in @F
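
To make the switches concrete, here is a rough long-hand sketch of what -n, -l and -a do between them. It is an approximation, not what perl literally generates: the helper name each_line_anl, the callback style and the in-memory sample data are my own inventions for the example.

```perl
use strict;
use warnings;

# Rough long-hand equivalent of `perl -anl -e ...` for a single filehandle.
# $body is called once per line with the chomped line and the split fields.
# (each_line_anl is a hypothetical helper, not anything perl provides.)
sub each_line_anl {
    my ( $fh, $body ) = @_;
    while ( my $line = <$fh> ) {        # -n: read the input one line at a time
        chomp $line;                    # -l: strip the newline on input
        my @F = split ' ', $line;       # -a: split on whitespace into @F
        $body->( $line, @F );
    }
    return;
}

# Two sample lines read from an in-memory file instead of STDIN.
my $data = "1/2/2013 cgoo nreuiheru\n1/4/2013 doow reiqrqueih\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @second;                             # collects the second field of each line
{
    local $\ = "\n";                    # -l: print appends a newline on output
    each_line_anl( $fh, sub {
        my ( $line, @F ) = @_;
        push @second, $F[1];
        print $F[1];                    # no explicit "\n" needed, as with -l
    } );
}
close $fh;
```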

      Now the -e snippet code in three parts:

      1. "$h{ $F[1] } = sprintf qq[%05u%s], $., $_;

        This builds %h with the second field of each input line as the key; and the whole line ($_) prefixed with the line number ($.) as the value.

        Later lines with matching second fields will overwrite earlier ones.

      2. }{

        This has the effect of converting the while loop shown above into:

        while( <> ) {
            ... # the part of the -e snippet before the }{ here; runs for every line in the file.
        }{  # closes the while loop body and puts a bare block after it
            ... # the part of the -e snippet after the }{ goes here and is only executed after the input is exhausted.
        }

        It is a little like adding an END{} block containing the second half of the -e snippet.

      3.  print substr $_, 5 for sort values %h"

        Once we've read all the input, only the last line with each second field value will remain in the hash, prefixed with the line number it came from.

        So sort the values from %h to get them into line number order and then print them out having removed the line numbers.
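
Putting the three parts together, the one-liner behaves roughly like the script sketched here. The sub name dedupe_keep_order and the three-line in-memory sample are assumptions made for the example; the sprintf, sort and substr steps are the ones described in parts 1 and 3.

```perl
use strict;
use warnings;

# Long-hand sketch of the order-retaining one-liner for one filehandle.
# (dedupe_keep_order is a hypothetical name chosen for this example.)
sub dedupe_keep_order {
    my ($fh) = @_;
    my %h;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @F = split ' ', $line;
        # Part 1: key on the second field; the value is the line prefixed
        # with its 5-digit, zero-padded line number ($.).
        $h{ $F[1] } = sprintf '%05u%s', $., $line;
    }
    # Part 2: the }{ ends the loop, so the code below runs once, after the
    # input is exhausted.
    # Part 3: a plain string sort on the padded prefix restores line order;
    # substr then strips the first five characters again.
    return map { substr( $_, 5 ) } sort values %h;
}

my $data = join '',
    "1/2/2013 cgoo nreuiheru\n",
    "1/4/2013 doow reiqrqueih\n",
    "1/20/2013 cgoo 3rhquh4ureyh\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @kept = dedupe_keep_order($fh);
close $fh;

print "$_\n" for @kept;
```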



        Thank you very much for the detailed explanation.

        A couple of follow-up questions, if you don't mind, just to make sure I am understanding this right:

        1. The sprintf is used so that the line-number prefix keeps a fixed width even when the line numbers jump from one digit to two or more; %05u can deal with up to 5 digits, so we can substr accordingly. Is that accurate?
        2. Does the sort at the end assume that the input (and the required output after removing rows) is sorted, or am I missing some way in which the ordering of the original input data is maintained?
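
A small experiment may help with both questions. It is only a sketch with made-up line numbers: it shows how a plain string sort treats padded and unpadded numbers, and what substr leaves behind once a line number needs more than five digits.

```perl
use strict;
use warnings;

my @line_numbers = ( 2, 10, 9, 100 );   # made-up line numbers for the demo

# Unpadded: a string sort compares character by character,
# so "10" and "100" come out before "2".
my @unpadded = sort { $a cmp $b } @line_numbers;

# Padded with %05u: every prefix is exactly five characters wide, so string
# order matches numeric order, i.e. the order the lines were read in.
my @padded_strings = map { sprintf '%05u', $_ } @line_numbers;
my @padded         = sort { $a cmp $b } @padded_strings;

print "unpadded: @unpadded\n";          # unpadded: 10 100 2 9
print "padded:   @padded\n";            # padded:   00002 00009 00010 00100

# Past 99999 lines the prefix grows to six characters, so substr( ..., 5 )
# would leave a stray digit on the front of the line.
my $overflow = sprintf '%05u%s', 100000, 'text';
my $stripped = substr( $overflow, 5 );
print "overflow: $stripped\n";          # overflow: 0text
```

So the ordering comes from the line-number prefix rather than from the dates in the data, and the fixed width is what makes a string sort safe, up to 99999 lines.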

      see perlrun for most recent version
        Thank you!
Re^2: Data managing problem
by tmharish (Friar) on Feb 22, 2013 at 12:47 UTC
    perl -anle"$h{ $F[1] } = $_; }{ print for values %h"

    Doesn't work for me!

    ~$ perl -anle"$h{ $F[1] } = $_; }{ print for values %h"
    syntax error at -e line 1, near "} ="
    Execution of -e aborted due to compilation errors.
    Summary of my perl5 (revision 5 version 14 subversion 2) configuration:
       
      Platform:
        osname=linux, osvers=3.2.0-23-generic, archname=x86_64-linux-gnu-thread-multi
        uname='linux komainu 3.2.0-23-generic #36-ubuntu smp tue apr 10 20:39:51 utc 2012 x86_64 x86_64 x86_64 gnulinux '
    
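      Doesn't work for me! perl -anle"..." Platform: osname=linux,

      You need to use single quotes ('), not double quotes ("), on *nix command lines. Inside double quotes the shell expands $h, $F and $_ itself before perl ever sees them, which is where the syntax error comes from.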


        Works with single quotes - Thank you.