PerlMonks  

Writing Program With Hashes and Loops For Files With Data in Columns

by prpheart (Initiate)
on Mar 05, 2013 at 06:21 UTC ( #1021758=perlquestion )
prpheart has asked for the wisdom of the Perl Monks concerning the following question:

I am desperately thankful for any help since I am new to Perl and Linux.

I am trying to write a program to read a file called milcar, which has the following columns:

<last name> <first name> <integer1> <word1> <integer2> <word2> <real number>

I would like to use “last_name first_name” as the hash key and integer2 as the value. If a name appears more than once, I must keep the larger integer2. I must then read a second file that has 2 columns, <last name> <first name>, and check each name pair in it against (I guess) the hash table. Any matches must be written to a user-specified outfile. I must also pull all entries out of the hash and sort them by first name, using the last name as the tie breaker. The sorted list, with 3 columns <first> <last> <integer2>, will be written to a second user-specified outfile.

Below is the program I have written so far, with comments; I appreciate any tips or suggestions. My main confusion is how to read a file into a hash and whether I am defining the keys properly (I am doing so in terms of a multidimensional array, which is probably wrong).

Thank you for any help. -prpheart

    #!/user/bin/perl

    if ($#ARGV != 3) {
        print "At command line, enter 2 files. First is full text and second is the name text only. Third and fourth are names of outfiles.\n";
        die "usage: entermname.pl <command line arg 1> <command line arg 2> <command line arg 3> <command line 4>\n";
    }

    $command_line_parameter1 = $ARGV[0];   # This is file with all categories/columns described earlier.
    $command_line_parameter2 = $ARGV[1];   # This is file with names to check against.
    $command_line_parameter3 = $ARGV[2];   # This is output file for matches.
    $command_line_parameter4 = $ARGV[3];   # This is output file with three columns.

    $milcar = @milliony;
    open(inf,"<$milcar");
    while(chomp($line = <inf>)){
        ...   # stuck on what to type here
    }

    my %assigned_million = ( "$milliony[$i][0] $milliony[$i][1]" => "$milliony[$i][4]");
    # Am I missing "and" between names in key?
    # Trying to define keys in terms of two-dimensional array

    $hash_value = $assigned_million{$key};
    if(!$hash_value or $value > $hash_value){
        $assigned_million{$key} = $value;   # changing integer2
    }

    $outfile_uno = @newnamecheck;
    my %newnam = ( "$newnamecheck[$i][0] $newnamecheck[$i][1]" => undef);
    # Am I missing "and" between names in key?

    open(inf,"<$milcar");
    while(chomp($line = <inf>)){
        ..........   # stuck here with loop
    }

    if($assigned_million{$key} eq $newnamecheck{$key}){
        open(FH,">>$outfile_dos") or die "Cannot open outfile $outfile_dos\n";
        print FH "$assigned_million{$key}\n";
        close(FH);
    }

    @sorted = sort{$a->[1] cmp $b->[1] || $a->[0] cmp $b->[0]} @milliony;
    open(FH, ">$outfile_dos" or die "Can't write to $outfile_dos.\n";
    {
        print FH "$sorted[$i][0], $sorted[$i][1], $sorted[$i][4]";
    }
    close(FH);

Re: Writing Program With Hashes and Loops For Files With Data in Columns
by arnaud99 (Beadle) on Mar 05, 2013 at 08:55 UTC

    Hi,

    I have looked at the code and I have written a program that could be used as a starting point for what you are trying to achieve.

    This program does not cover everything you are trying to do; it only goes as far as splitting the milcar file and loading a hash. I hope it gives you an idea of some of Perl's best practices (or what I would consider best practices) and helps you complete your task.

    Kind regards.

    Arnaud
    use strict;     # Please use these 2 pragmas
    use warnings;
    use autodie;    # this one comes in handy as well

    if (@ARGV != 4) {
        die "Ooops etc..";
    }

    # a more descriptive name for your vars, otherwise you may as well
    # use $ARGV[0], $ARGV[1] etc...
    my ($milcar_filename, $check_file, $matches_file, $three_column_file) = @ARGV;

    # use the 3 args version of open;
    # prg will 'die' if file error due to the autodie pragma
    open my $fh, '<', $milcar_filename;

    # store some values into a hash
    my %hash;

    while (my $line = <$fh>) {
        chomp $line;

        # I assume here that the milcar file is pipe separated
        # but replace the separator with the separator of your choice
        my ($last_name, $first_name, $integer1, $word1, $integer2, $word2, $real_number)
            = split /\|/, $line;

        # build a key, see if the key already exists, and
        # update with the latest int2 value, if larger.
        my $key = $last_name . '_' . $first_name;
        if (exists $hash{$key}) {
            if ($integer2 > $hash{$key}) {
                $hash{$key} = $integer2;
            }
        }
        else {
            $hash{$key} = $integer2;
        }
    }
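The program above stops once the hash is loaded. A minimal sketch of the two remaining steps from the question (checking name pairs against the hash, and producing the sorted three-column list) might look like the following. It assumes the same `Last_First` key convention; the helper names `find_matches` and `sorted_rows` and the sample data are made up for illustration, and writing the results to the two output files is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: given the loaded hash and a list of [last, first]
# pairs, return a "Last First integer2" line for every pair found in the hash.
sub find_matches {
    my ($best, $names) = @_;
    my @found;
    for my $pair (@$names) {
        my ($last, $first) = @$pair;
        my $key = $last . '_' . $first;
        push @found, "$last $first $best->{$key}" if exists $best->{$key};
    }
    return @found;
}

# Hypothetical helper: return "First Last integer2" lines sorted by
# first name, with the last name as the tie breaker.
sub sorted_rows {
    my ($best) = @_;
    my @rows;
    for my $key (keys %$best) {
        # Split on the first underscore only; assumes last names contain none.
        my ($last, $first) = split /_/, $key, 2;
        push @rows, [ $first, $last, $best->{$key} ];
    }
    my @sorted = sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] } @rows;
    return map { join ' ', @$_ } @sorted;
}

# Made-up sample data standing in for the loaded hash and the second file.
my %hash  = ('Smith_Ann' => 7, 'Jones_Ann' => 3, 'Brown_Bob' => 5);
my @names = (['Jones', 'Ann'], ['White', 'Cy']);

print "match: $_\n" for find_matches(\%hash, \@names);
print "$_\n"        for sorted_rows(\%hash);
```

Sorting an array of small array references keeps the comparison simple: each row carries its own first name, last name, and value, so the sort block never has to take the key apart again.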
Re: Writing Program With Hashes and Loops For Files With Data in Columns
by ww (Bishop) on Mar 05, 2013 at 15:19 UTC

    Best help I know how to give, for now: Give us a bit more crucial information.

    • Are the data files made up of fixed length fields, CSV or TSV ... or something else?
    • Is there a database involved? If so, it may be that using the DB's capabilities would help.
    That information would help us to help you.

    If you didn't program your executable by toggling in binary, it wasn't really programming!

      Thank you both for replying. I am not using a database (that I know of). The files with the names are text files, and the columns are separated by a space. (A space is the delimiter.) I appreciate your time.
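With a space as the delimiter, the loading step can be sketched as below. This is only an illustration under stated assumptions: the helper name `load_best` and the sample records are made up, the key is the "Last First" form described in the question, and the input is read from an in-memory string so the sketch runs without a real milcar file.

```perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-delimited records from a filehandle
# and keep the largest integer2 seen for each "Last First" key.
sub load_best {
    my ($fh) = @_;
    my %best;
    while (my $line = <$fh>) {
        # split ' ' splits on runs of whitespace and ignores leading whitespace.
        my ($last, $first, undef, undef, $integer2) = split ' ', $line;
        next unless defined $integer2;    # skip blank or short lines
        my $key = "$last $first";
        $best{$key} = $integer2
            if !exists $best{$key} || $integer2 > $best{$key};
    }
    return %best;
}

# Made-up sample records in the seven-column layout from the question.
my $sample = <<'END';
Smith Ann 1 red 10 cat 1.5
Jones Bob 2 blue 4 dog 2.5
Smith Ann 3 green 25 owl 0.5
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my %best = load_best($in);
close $in;

print "$_ => $best{$_}\n" for sort keys %best;
```

Using `exists` rather than testing the stored value for truth means a legitimate integer2 of 0 is not mistaken for a missing entry.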

        It may be worth considering dumping the milcar file into an SQLite database, using Perl to get your data from file2, and then using the modules DBI & DBD::SQLite along with SQL's native functions to find db-matches against file2's fname, lname.

        That's not to claim I'm sufficiently expert with dbs to guarantee that this is the best approach but -- based on limited experience with (mostly) toy-applications -- it looks pretty plausible to me and DBI and DBD::SQLite will make the learning curve much shallower while the SQL capabilities can handle most of your search, sort and write requirements.
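A rough sketch of that database approach, assuming the CPAN modules DBI and DBD::SQLite are installed. The table names (`milcar`, `wanted`), column names, and sample rows are invented for illustration, and an in-memory database stands in for a real file; loading the rows from the two text files is not shown.

```perl
use strict;
use warnings;
use DBI;    # requires DBI and DBD::SQLite from CPAN

# In-memory database for illustration; a real run would name a file instead.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: only the columns the task needs.
$dbh->do('CREATE TABLE milcar (last_name TEXT, first_name TEXT, integer2 INTEGER)');
$dbh->do('CREATE TABLE wanted (last_name TEXT, first_name TEXT)');

# Made-up rows standing in for the contents of the two files.
my $ins = $dbh->prepare('INSERT INTO milcar VALUES (?, ?, ?)');
$ins->execute(@$_) for ['Smith', 'Ann', 10], ['Jones', 'Bob', 4], ['Smith', 'Ann', 25];
$dbh->do('INSERT INTO wanted VALUES (?, ?)', undef, 'Smith', 'Ann');

# Largest integer2 per name, sorted by first name, then last name.
my $sorted = $dbh->selectall_arrayref(q{
    SELECT first_name, last_name, MAX(integer2)
    FROM milcar
    GROUP BY last_name, first_name
    ORDER BY first_name, last_name
});

# Names from the second file that also appear in milcar.
my $matches = $dbh->selectall_arrayref(q{
    SELECT DISTINCT w.last_name, w.first_name
    FROM wanted w
    JOIN milcar m
      ON m.last_name = w.last_name AND m.first_name = w.first_name
});

print join(' ', @$_), "\n" for @$sorted;
print "match: @$_\n"       for @$matches;

$dbh->disconnect;
```

Here `GROUP BY` with `MAX` does the duplicate handling and `ORDER BY` does the sorting, so the Perl side only has to load the rows and write out the results.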


        If you didn't program your executable by toggling in binary, it wasn't really programming!
