http://www.perlmonks.org?node_id=504596

Tech77 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have just started using Perl at work to help out with tasks here and there so I am new to this. Recently, I created this script to grab a list of records from a database. It actually works! I'm pretty thrilled.
open (FILE, "userstab.txt") || die "Cannot open file.\n";
open (NEWFILE, ">results.txt") || die "Cannot find or open file for editing.\n";
while (<FILE>) {
    @zapschool = split (/\t/);
    if (m/zaps/i) {
        print NEWFILE "$zapschool[4]\n";
    }
}
close FILE;
close NEWFILE;

This gives me a list of all the entries in one column, but now I need to figure out how to use Perl to go through my new list, results.txt, and create another list that holds each name and a count of how many times that particular name appears.

For example, my list contains college and university names. Many of them appear multiple times. I'd like to create a list where each unique record appears only once but with a frequency count of how many times it appears.

Can you offer some guidance on how to proceed? I'm not necessarily looking for finished code, more a pointer in the right direction so I can do it myself. Thank you.

Re: Count and List Items in a List
by japhy (Canon) on Nov 01, 2005 at 15:01 UTC
    When you think "unique" and "frequency", think of a hash. Its keys are always unique, and you can use its values as a place to store the number of times each key shows up:
    my %frequency;
    for my $word (@list) {
        $frequency{$word}++;
    }
    Now the %frequency hash holds each word (but only once!) and $frequency{$some_word} is the number of times $some_word appeared in @list.
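    A minimal self-contained sketch of this idea, assuming the names are already in an array (the sample school names below are made up for illustration). It counts with a hash, then prints the keys in sorted order, because a hash returns its keys in no particular order.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of results.txt
my @list = ("Zaps College", "Zaps University", "Zaps College", "Zaps Tech");

my %frequency;
for my $word (@list) {
    $frequency{$word}++;    # a missing key starts as undef, which ++ treats as 0
}

# keys come back in no particular order, so sort them for stable output
for my $word (sort keys %frequency) {
    print "$word: $frequency{$word}\n";
}
```

    With the sample list this prints "Zaps College: 2", then "Zaps Tech: 1" and "Zaps University: 1".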

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Count and List Items in a List
by Perl Mouse (Chaplain) on Nov 01, 2005 at 15:24 UTC
    use strict;
    use warnings;
    my %count;
    open my $data,   "<", "userstab.txt" or die "open: $!";
    open my $result, ">", "results.txt"  or die "open: $!";
    while (<$data>) {
        chomp;
        next unless /zaps/i;                # keep only the lines the original script kept
        my $zapschool = (split /\t/)[4];    # fifth tab-separated column
        $count{$zapschool}++;
        print $result "$zapschool\n";
    }
    close $data   or die "close: $!";
    close $result or die "close: $!";
    while (my ($school, $count) = each %count) {
        printf "%s appears %d times\n", $school, $count;
    }
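    Looping with each hands back the pairs in an unspecified order. If you want the most frequent schools first, one option is to sort the keys by their counts. This is a sketch with made-up counts standing in for what the loop would leave in %count; ties fall back to alphabetical order.

```perl
use strict;
use warnings;

# Hypothetical counts, standing in for what the while loop leaves in %count
my %count = ("Zaps College" => 3, "Zaps Tech" => 1, "Zaps University" => 3);

# Most frequent first; names with equal counts are ordered alphabetically
my @ordered = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

for my $school (@ordered) {
    printf "%s appears %d times\n", $school, $count{$school};
}
```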
    Perl --((8:>*
Re: Count and List Items in a List
by ikegami (Patriarch) on Nov 01, 2005 at 16:09 UTC
    while (<FILE>) {
        @zapschool = split (/\t/);
        if (m/zaps/i) {
            print NEWFILE "$zapschool[4]\n";
        }
    }
    splits every line, including the ones it then throws away, so it is slower than
    while (<FILE>) {
        if (m/zaps/i) {
            @zapschool = split(/\t/);
            print NEWFILE "$zapschool[4]\n";
        }
    }
    which can be made more readable as
    while (<FILE>) {
        if (m/zaps/i) {
            my $zapschool = (split(/\t/))[4];
            print NEWFILE "$zapschool\n";
        }
    }
    And the solution is
    my %count;
    while (<FILE>) {
        if (m/zaps/i) {
            my $zapschool = (split(/\t/))[4];
            ++$count{$zapschool};
        }
    }
    foreach (keys(%count)) {
        print NEWFILE "$_: $count{$_}\n";
    }
      Hey, this is great! Thank you, and thanks to all the other folks who responded. Based on that code, I wrote this:
      open (FILE, "userstab.txt") || die "Cannot open file.\n";
      open (NEWFILE, ">results.txt") || die "Cannot find or open file for editing.\n";
      my %count;
      while (<FILE>) {
          if (m/zaps/i) {
              my $zapschool = (split(/\t/))[4];
              ++$count{$zapschool};
          }
      }
      foreach (keys(%count)) {
          print NEWFILE "$_: $count{$_}\n";
      }
      close NEWFILE;
      close FILE;
      print "Done!\n";
      WooHoo!
Re: Count and List Items in a List
by perlfan (Vicar) on Nov 01, 2005 at 15:36 UTC
    Assuming you get "results.txt" into an array, "@list":
    my %counts = (); map {$counts{$_}++} @list;
    And the hash "%counts" will contain the item as a key and its count as the key's value. This is really a short version of the first post, but map is a neat function :). pF
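    One way to get results.txt into @list is to read every line in list context and chomp them all at once. This sketch uses an in-memory filehandle with made-up lines so it runs anywhere; in real use you would open results.txt instead, as the first comment shows. It then counts with a plain for loop.

```perl
use strict;
use warnings;

# Stand-in for: open my $fh, '<', 'results.txt' or die "open: $!";
my $text = "Zaps College\nZaps Tech\nZaps College\n";
open my $fh, '<', \$text or die "open: $!";

chomp(my @list = <$fh>);    # read all lines, then strip each trailing newline
close $fh;

my %counts;
$counts{$_}++ for @list;    # same counting idea, written as a for loop instead of map
```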

      Except map in void context is painfully slow compared to foreach:

      This is perl, v5.6.1 built for MSWin32-x86-multi-thread
                Rate     map foreach
      map      953/s      --    -25%
      foreach 1278/s     34%      --

      This is perl, v5.8.0 built for MSWin32-x86-multi-thread
                Rate     map foreach
      map     1705/s      --    -25%
      foreach 2288/s     34%      --

      This is perl, v5.8.0 built for i386-freebsd
                Rate     map foreach
      map      847/s      --    -29%
      foreach 1190/s     40%      --

      This may have been fixed since.
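      The figures above look like output from cmpthese in the core Benchmark module. Here is a sketch of how you might run such a comparison yourself. The test data is made up, and the rates depend on your perl version and machine, so no expected numbers are given.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: three names repeated 300 times (900 items)
my @list = ("Zaps College", "Zaps Tech", "Zaps University") x 300;

sub count_with_map {
    my %counts;
    map { $counts{$_}++ } @list;    # map in void context; its result is discarded
    return \%counts;
}

sub count_with_foreach {
    my %counts;
    foreach (@list) { $counts{$_}++ }
    return \%counts;
}

# A negative count means "run each sub for at least that many CPU seconds"
cmpthese(-1, {
    'map'     => \&count_with_map,
    'foreach' => \&count_with_foreach,
});
```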

        This is perl, v5.8.6 built for i686-linux
                   Rate foreach     map
        foreach 47168/s      --     -2%
        map     48260/s      2%      --

        That said, I always prefer code that does what it says and says what it does. To me, for/foreach evaluates code for each element in a list, while map maps (or transforms) one list into another. If you want to iterate over a list, use for or foreach. If you want to transform one list into another, use map.
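        A small sketch of that distinction, using a made-up list of names. map builds a new list from the old one, while for is used purely for its side effect of updating the counts.

```perl
use strict;
use warnings;

# Hypothetical list of names
my @schools = ("Zaps College", "Zaps Tech", "Zaps College");

# map transforms one list into another: here, each name upper-cased
my @shouted = map { uc($_) } @schools;

# for/foreach iterates over a list for a side effect: here, counting
my %count;
$count{$_}++ for @schools;

print join(", ", @shouted), "\n";
print "$_: $count{$_}\n" for sort keys %count;
```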

Re: Count and List Items in a List
by holli (Abbot) on Nov 01, 2005 at 16:13 UTC
    You'd better move the array splitting into your conditional, like so:
    if (m/zaps/i) {
        @zapschool = split (/\t/);
        print NEWFILE "$zapschool[4]\n";
    }
    That will give you a performance boost, because your code splits every line, whereas this version splits only the lines it actually needs.

    Please note (good for impressing your boss :) that you can do this as a one-liner:
    C:\>perl -naF/\t/ -e "print qq($F[4]\n) if /zaps/" infile>outfile
    or the counting:
    C:\>perl -naF/\t/ -e "$hash{$F[4]}++ if /zaps/;END{print qq($_\t$hash{$_}\n) for sort keys %hash}" infile>outfile
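    For anyone new to these switches: according to perlrun, -n wraps the -e code in a while (<>) loop, and -a splits each input line into @F, using -F/\t/ as the pattern. The sketch below spells the counting one-liner out as an ordinary script. It reads from a filehandle argument rather than <>, so you can try it on in-memory sample data. The sub name count_column and the data are made up for illustration.

```perl
use strict;
use warnings;

# Roughly what the counting one-liner does, written out as a script
sub count_column {
    my ($fh) = @_;
    my %hash;
    while (<$fh>) {                  # -n: loop over every input line
        my @F = split /\t/;          # -a with -F/\t/: split the line on tabs into @F
        $hash{ $F[4] }++ if /zaps/;  # the -e code, run once per line
    }
    return \%hash;
}

# Hypothetical tab-separated input; field 4 (counting from 0) holds the school
my $data = "1\tjo\tx\tzaps\tZaps College\tend\n"
         . "2\tal\tx\tzaps\tZaps Tech\tend\n"
         . "3\tbo\tx\tzaps\tZaps College\tend\n"
         . "4\tcy\tx\tother\tElsewhere U\tend\n";

open my $fh, '<', \$data or die "open: $!";
my $hash = count_column($fh);
close $fh;

# What the END block does: print each key and its count, sorted by key
print "$_\t$hash->{$_}\n" for sort keys %$hash;
```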


    holli, /regexed monk/
Re: Count and List Items in a List
by Tech77 (Novice) on Nov 02, 2005 at 19:15 UTC
    Wow! Thanks everyone for all the responses. I'm going to try all these variations and I need to get more familiar with hashes.