Re^2: Data managing problem

by tmharish (Friar)
on Feb 22, 2013 at 12:25 UTC (#1020148)


in reply to Re: Data managing problem
in thread Data managing problem

>perl -anle"$h{ $F[1] } = $_; }{ print for values %h"

I understood the part about the implicit split and -l[octnum] after reading this (which is awesome, by the way).

>perl -anle"$h{ $F[1] } = sprintf qq[%05u%s], $., $_; }{ print substr $_, 5 for sort values %h"

I have been searching around for a bit and I simply don't get that one. Could you explain it, please?


Re^3: Data managing problem
by Anonymous Monk on Feb 22, 2013 at 12:31 UTC
    see perlrun for most recent version
      Thank you!
Re^3: Data managing problem
by BrowserUk (Pope) on Feb 22, 2013 at 13:22 UTC

    Sure

    First the switches: perl -anle"..."

    • -e"...": a small program to run immediately
    • -n: Feed the data (from the file(s) listed on the command line, or STDIN) to the -e snippet above, one line at a time, in $_

      This has the effect of placing the -e"..." within a while loop very similar to this:

      while( <> ) {
          ... # the -e snippet here
      }
    • -l: auto chomp those lines on input; add a newline to print statements on output.
    • -a: autosplit those lines on whitespace (or the argument to -F if used) and place the results in @F
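
    Spelled out as an ordinary script, the four switches amount to roughly the following. This is only a sketch: the in-memory $input and the @second_fields array are made up for illustration, and perl's real -n wrapper reads from <> rather than from a named handle.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the file(s) or STDIN that -n reads.
my $input = "alpha  one x\nbeta two y\n";
open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";

my @second_fields;    # illustration only: records what the loop body saw

{
    local $\ = "\n";              # -l: print adds a newline on output
    while ( <$fh> ) {             # -n: read one line at a time into $_
        chomp;                    # -l: strip the newline on input
        my @F = split ' ', $_;    # -a: split on whitespace into @F
                                  #     (the real switch fills the global @F)

        # The -e snippet would go here; this stand-in just uses $F[1].
        push @second_fields, $F[1];
        print "second field: $F[1]";
    }
}
close $fh;
```

    Note that split ' ' skips leading whitespace and treats runs of spaces as one separator, which is why "alpha  one x" still yields "one" as the second field.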

    Now the -e snippet code in three parts:

    1. "$h{ $F[1] } = sprintf qq[%05u%s], $., $_;

      This builds %h with the second field of each input line as the key; and the whole line ($_) prefixed with the line number ($.) as the value.

      Later lines with matching second fields will overwrite earlier ones.

    2. }{

      This has the effect of converting the while loop shown above into:

      while( <> ) {
          ... # the part of the -e snippet before the }{ here; runs for every line in the file.
      }{  # closes the while loop body and puts a bare block after it
          ... # The part of the -e snippet after the }{ goes here and is only executed after the input is exhausted.
      }

      It is a little like adding an END{} block containing the second half of the -e snippet.

    3.  print substr $_, 5 for sort values %h"

      Once we've read all the input, only the last line with each second field value will remain in the hash, prefixed with the line number it came from.

      So sort the values from %h to get them into line number order and then print them out having removed the line numbers.
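
    Put together as a plain script, the three parts might look like the sketch below. The helper name keep_last_by_second_field and the sample lines are hypothetical, and $line_no stands in for $. because the lines come from a list rather than a filehandle.

```perl
use strict;
use warnings;

# Keep only the last line seen for each second field, in original input order.
# (Hypothetical helper name; not part of the one-liner.)
sub keep_last_by_second_field {
    my @lines = @_;
    my %h;
    my $line_no = 0;                      # plays the role of $.
    for my $line (@lines) {
        $line_no++;
        my @F = split ' ', $line;         # what -a does for each line
        # Part 1: later lines with the same key overwrite earlier ones.
        $h{ $F[1] } = sprintf( '%05u%s', $line_no, $line );
    }
    # Part 3: a string sort on the zero-padded prefix restores input order,
    # then substr drops the 5-character prefix again.
    return map { substr( $_, 5 ) } sort values %h;
}

# Made-up input: "cgoo" appears twice, so only its later line survives.
my @kept = keep_last_by_second_field(
    'a cgoo 1',
    'b doow 2',
    'c cgoo 3',
    'd hellio 4',
);
print "$_\n" for @kept;
```

    Here the hash ends up with three entries, and the sort is what puts "b doow 2" ahead of "c cgoo 3" again, since values %h returns them in no particular order.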


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thank you very much for the detailed explanation.

      A couple of follow-up questions, if you don't mind, just to make sure I am understanding this right:

      1. The printf is used to ensure that should the line numbers jump from single digit to two ... this has the ability to deal with up to 5 so we can substr accordingly - is that accurate?
      2. Does the sort at the end assume that the input ( and the required output after removing rows ) is sorted or am I missing some way in which the ordering of the original input data is maintained?

        The printf is used to ensure that should the line numbers jump from single digit to two

        The sprintf pads the line numbers with leading zeros so that they sort correctly, i.e. so that they do not sort as:

        1 10 11 ... 2 20 21 22 ... 3 30 31 32 33

        %05u means we can correctly sort files with up to 99999 lines. If you need more, change the number in the format (and the matching offset passed to substr).
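
        A small sketch of the difference, using made-up line numbers: Perl's default sort compares strings, so unpadded numbers come out in the order shown above, while fixed-width zero-padded ones come out in numeric order.

```perl
use strict;
use warnings;

my @line_numbers = ( 1, 2, 3, 10, 11, 20 );

# Default sort compares as strings, so "10" lands before "2".
my @unpadded = sort @line_numbers;

# Zero-padding to a fixed width makes string order match numeric order.
my @formatted = map { sprintf( '%05u', $_ ) } @line_numbers;
my @padded    = sort @formatted;

print "unpadded: @unpadded\n";    # unpadded: 1 10 11 2 20 3
print "padded:   @padded\n";      # padded:   00001 00002 00003 00010 00011 00020
```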

        Does the sort at the end assume that the input ( and the required output after removing rows ) is sorted

        Why would sort "assume its input was sorted"? We are sorting them because we know they will not be. That's why we added the line numbers: so that we can put them back into the input ordering. Perhaps the following, where I've left the line numbers in place, will clarify things?

        C:\test>perl -anle"$h{ $F[1] } = sprintf qq[%05u%s], $., $_; }{ print for sort values %h"
        1/2/2013 cgoo nreuiheru
        1/4/2013 doow reiqrqueih
        1/5/2013 hellio ruieqrhfuepqh
        1/20/2013 cgoo 3rhquh4ureyh
        1/30/2013 yetil jerqohgqrij
        2/13/2013 hellio rueqipheruh
        2/14/2013 cgoo wehrig4r74378
        ^Z
        000021/4/2013 doow reiqrqueih
        000051/30/2013 yetil jerqohgqrij
        000062/13/2013 hellio rueqipheruh
        000072/14/2013 cgoo wehrig4r74378

