Grey Fox has asked for the wisdom of the Perl Monks concerning the following question:
Hello Fellow Monks
Is it possible to use the split function to load a data hash? I am trying to split a list of names and addresses, which are separated by commas, from a text file.
I am new to Perl and still trying to wrap my head around the split, hash, and map functions. I have seen the split and map functions used to load hashes. Originally I'm from a legacy COBOL environment, but have had some experience with client server before. Are there any good references explaining split, hash and map? Most of what I have found is pretty cryptic.
Re: Using Split to load a hash
by shmem (Chancellor) on Aug 19, 2006 at 14:22 UTC
Yes, it is possible. Did you try and if so, how? (see How do I post a question effectively?)
update after the OP's update
As you might know already, you get documentation for split and map by typing e.g. perldoc -f split on the command line of your shell; hashes are covered in perldoc perldata.
About perl being cryptic
The crypticness of perl expressions will vanish as you become familiar with its main concepts, two of them being data types (scalars, arrays, hashes and references, which are, well, scalars) and context. The latter is perl's curse and blessing and the origin of much of its perceived crypticness, because many functions behave differently depending on the context they are invoked in. Even simple assignments do different things to the right-hand-side before the assignment is done to the left-hand-side of =, depending on what type the lvalue is.
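A minimal sketch of what context means for an assignment; the variable names and the sample line are made up for illustration. The same array yields its elements or its element count depending on what sits on the left of =.

```perl
use strict;
use warnings;

my $line = 'Bilbo Baggins, Under The Hill';

# List context: split returns the fields themselves.
my @parts = split /\s*,\s*/, $line;

# Scalar context: an array evaluates to its number of elements.
my $count = @parts;

# The parens make this a list assignment, so $first gets the first element.
my ($first) = @parts;

print "count=$count first=$first\n";
```

Dropping the parens around $first would turn the last assignment into a scalar one and store the count instead of the name.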
The text file contains just one record per line, right? Given records such as
Bilbo Baggins, Under The Hill
Sam Gamgee, Bagshot Row
in, say, a file named addrfile.txt, I would write e.g. (TIMTOWTDI - There's More Than One Way To Do It)
1 #!/usr/bin/perl
3 my $file = 'addrfile.txt';
5 open I, '<', $file
6 or die "Can't open '$file' for reading: $!\n";
8 chop (%hash = map { split /\s*,\s*/,$_,2 } grep (!/^$/,<I>));
10 # print out that hash
12 print "$_ => $hash{$_}\n" for keys %hash;
which results in
Bilbo Baggins => Under The Hill
Sam Gamgee => Bagshot Row
and looks pretty cryptic.
Explanation, per line (except empty ones :-)
Line 1 tells the OS which interpreter to use.
Line 3 assigns the file to be processed to the variable $file.
Line 5 tries to open the file for reading, associating it with the filehandle I, which on
line 6 leads to a program abort (die) on failure to do so.
Line 8 is where things get interesting. It "just" contains an invocation of chop (LIST).
chop operates on what is inside the outer round parens, which is the result of an assignment - the %hash; so chop removes the last character (here, the line ending) from each value of the hash %hash. chop is context sensitive.
The right hand side (rhs) of the assignment inside the round parens for chop is a map statement: map BLOCK LIST. The LIST is returned by grep which operates on a LIST. So, the second argument to the grep function is evaluated in list context, which forces the <> operator (which is a funny way to say readline(FILEHANDLE)) to return a list containing the lines of addrfile.txt.
The first argument to grep (!/^$/) says "gimme all that isn't an empty line" - see perlre. This list is passed to map.
In map each element is processed by what is contained in BLOCK ({ }), and the results of that processing (the results from the last evaluated statement) are returned as a list - which in this case are key/value pairs resulting from the split operation inside the block.
Now, split (split /\s*,\s*/,$_,2) just splits each line as returned from grep (and assigned to $_ inside map) into two elements via the regular expression /\s*,\s*/, meaning "zero or more whitespace chars, a comma, and zero or more whitespace chars" are taken as boundary between elements.
There. The result of map is a sequence of key/value pairs, which are assigned to a hash, which is then chopped.
Line 12 just prints out the hash as key => value.
The above could be written more verbosely like this:
my (@lines, %hash);
while (my $line = <I>) {
    chop $line;                 # remove the line ending
    push @lines, $line;
}
@lines = grep (!/^$/, @lines);  # drop empty lines
foreach my $line (@lines) {
    my ($name, $addr) = split /\s*,\s*/, $line, 2;
    $hash{$name} = $addr;
}
but in order to address most of the points in your post I showed you the "cryptic" one first :-)
Also, if you examine this code, you will find that both my examples allocate memory for the contents of the hash twice - first building up a list, then assigning that list to a hash. For small files that's ok, but for larger files you'd rather say
while (my $line = <I>) {
    chop $line;
    next if $line eq '';        # skip empty lines
    my ($name, $addr) = split /\s*,\s*/, $line, 2;
    $hash{$name} = $addr;
}
as shown by other replies to your post.
update: small fixes (grammar and such)
Re: Using Split to load a hash
by rodion (Chaplain) on Aug 19, 2006 at 15:06 UTC
I'm not sure what you have in mind. This may be the kind of example you want, of a split into a hash. If so, you can then change the "for" loop to a "while(<FILEHANDLE>)".
my $txt=<<'TEXT_END';
nameone, addr1 info (without commas in it)
nametwo, address two, city (with a comma)
namethree, address three fields
TEXT_END
my %info;
my ($name,$addr);
for (split "\n",$txt) {
    ($name,$addr) = split ',',$_,2;
    $info{$name} = $addr;
}
my ($key,$val);
while (($key,$val)= each(%info)) {
    print "$key->$val\n";
}
Update: Added ",$_,2" to the split, to handle addresses with a comma, and modified test cases accordingly
my %info = split /,|\n/, $txt;
This will possibly fail if the second field has a comma. Maybe we can work it out to prevent that :-)
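One possible way around that, as a sketch: split the text into lines first, then split each line into at most two fields, so a comma in the address stays in the value. The sample $txt is made up, and the /,\s*/ separator (which also eats the space after the comma) is my choice rather than anything from the posts above.

```perl
use strict;
use warnings;

my $txt = "nameone, addr1 info\nnametwo, address two, city\n";

# Outer split yields the lines; inner split cuts each line at the first comma only.
my %info = map { split /,\s*/, $_, 2 } split /\n/, $txt;

print "$_ -> $info{$_}\n" for sort keys %info;
```

A name containing a comma would still break this, since the first comma always ends the key.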
Re: Using Split to load a hash
by davido (Cardinal) on Aug 19, 2006 at 15:43 UTC
Yes, you can use split for that. First, open the file. Next, iterate over its content. Chomp the line. If the remaining line is empty, next to the next line (this is just a precaution). Assuming the line isn't blank, split it on comma, assigning the results to two lexical variables. Then use one variable as the key, and one as the value in adding an element to your hash. Finally, close the file.
Let us know which part of that task has you stumped.
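The steps above could be sketched roughly like this. To keep the example self-contained it reads from an in-memory filehandle rather than a real file; the sample data and the names $data and %address are assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over sample data.
my $data = "Bilbo Baggins, Under The Hill\n\nSam Gamgee, Bagshot Row\n";
open my $fh, '<', \$data or die "Can't open sample data: $!\n";

my %address;
while (my $line = <$fh>) {
    chomp $line;                      # remove the line ending
    next if $line =~ /^\s*$/;         # skip blank lines (just a precaution)
    my ($name, $addr) = split /\s*,\s*/, $line, 2;
    $address{$name} = $addr;          # name is the key, address the value
}
close $fh;

print "$_ => $address{$_}\n" for sort keys %address;
```

For a real file, replace \$data with the file name in the open call.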
Re: Using Split to load a hash
by BrowserUk (Patriarch) on Aug 19, 2006 at 16:24 UTC
Provided the name doesn't contain commas, it won't matter if the address does, as long as you use the third parameter to split. This is a number that specifies the maximum number of fields to split the input into. By setting this to 2, everything before the first comma will be treated as the first field--the name. And everything after that first comma, including more commas, will become the second field:
#! perl -slw
use strict;
use Data::Dumper;
##.............................V third parameter
my %hash = map{ split ',', $_, 2 } <DATA>;
print Dumper \%hash;
__DATA__
a name, an address, with commas,
another name, and another address, also with commas
and a third name, and address
fourth name, fourth address
Produces:
C:\test>junk4
$VAR1 = {
          'and a third name' => ' and address
',
          'a name' => ' an address, with commas,
',
          'fourth name' => ' fourth address
',
          'another name' => ' and another address, also with commas
'
        };
Note that the addresses are unchomped.
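If the trailing newlines and leading spaces are unwanted, one option (a sketch, not part of the code above) is to chomp a copy of each line inside the map block and split on a pattern that swallows the whitespace around the comma. The @lines array is sample data standing in for the lines read from DATA.

```perl
use strict;
use warnings;

my @lines = (
    "a name, an address, with commas,\n",
    "fourth name, fourth address\n",
);

my %hash = map {
    chomp(my $line = $_);          # chomp a copy so @lines itself is left untouched
    split /\s*,\s*/, $line, 2;     # at most two fields: name, then the rest
} @lines;

print "'$_' => '$hash{$_}'\n" for sort keys %hash;
```

The limit of 2 still keeps every later comma, including a trailing one, inside the address.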
Re: Using Split to load a hash
by graff (Chancellor) on Aug 19, 2006 at 16:11 UTC
If any given name or address in the file is liable to contain a comma, and the file has quotation marks around such values, then you'll want to use one of the modules for parsing CSV (comma-separated-value) files, instead of using split: Text::CSV_XS or Text::xSV. (There's Text::CSV as well, but it's older and more limited than the others.)
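To illustrate why quoted fields need more than a plain split, here is a small sketch. Note that it does not use the CSV modules named above: it swaps in parse_line from the core module Text::ParseWords, purely so the example runs without installing anything. The sample records are made up, and for real CSV files a dedicated CSV module remains the better choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my @records = (
    '"Baggins, Bilbo","Bag End, Under The Hill"',
    'Sam Gamgee,"3 Bagshot Row, Hobbiton"',
);

my %hash;
for my $record (@records) {
    # Delimiter ',', keep = 0 (strip the quotes); commas inside quotes are not split.
    my ($name, $addr) = parse_line(',', 0, $record);
    $hash{$name} = $addr;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

parse_line also treats single quotes as quoting characters, so data containing apostrophes would need different handling.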
Re: Using Split to load a hash
by ysth (Canon) on Aug 20, 2006 at 17:11 UTC
Re: Using Split to load a hash
by tanyeun (Sexton) on Aug 20, 2006 at 13:12 UTC
I often use regex and arrays to manage this kind of data
#!/usr/bin/perl -w
open(FH, "file.csv") or die "Can't open file.csv: $!\n";
# assume you have three columns of your file
my (@col1, @col2, @col3);
my $i = 0;
while (<FH>) {
    chomp;
    # assumed pattern: exactly three comma-separated fields per line
    next unless /^([^,]*),([^,]*),([^,]*)$/;
    $col1[$i] = $1;
    $col2[$i] = $2;
    $col3[$i] = $3;
    $i++;
}
close FH;
print "columns:$i\n";
print "subject:\t";
for(0..2) {print "$col1[$_]\t"};
print "\n";
print "grade:\t\t";
for(0..2) {print "$col2[$_]\t"};
print "\n";
print "rank:\t\t";
for(0..2) {print "$col3[$_]\t"};
print "\n";
If it's useful it's useful! :-)
Perhaps you could consider loading your data into one data structure instead of three arrays.
You might also want a more general purpose approach where it would be easier to change the number and name of the columns.
Putting aside the question of commas inside the fields and any other error checking, one approach might be like this.
It creates an array of hashes. If you were thinking of using something like HTML::Template for your output this would be very handy! :-)
use strict;
use warnings;
use Data::Dumper;
my @AoH;
my @fields = qw(subject grade rank);
while (my $record = <DATA>){
    chomp $record;
    # every run of non-comma characters is one value
    my @values = $record =~ /([^,]+)/g;
    push @AoH, {
        map {$fields[$_] => $values[$_]} (0..$#fields)
    };
}
print Dumper \@AoH;
__DATA__
english,1,1
history,2,2
science,3,3
biology,4,4
$VAR1 = [
          {
            'subject' => 'english',
            'grade' => '1',
            'rank' => '1'
          },
          {
            'subject' => 'history',
            'grade' => '2',
            'rank' => '2'
          },
          {
            'subject' => 'science',
            'grade' => '3',
            'rank' => '3'
          },
          {
            'subject' => 'biology',
            'grade' => '4',
            'rank' => '4'
          }
        ];
Re: Using Split to load a hash
by Anonymous Monk on Apr 27, 2010 at 12:23 UTC
I am wondering if we can load a HOA in a similar way. I have always used the following:
use strict;
use Data::Dumper;
my %hash;
# the fields in the DATA section are tab-separated
while (<DATA>) {
    chomp;
    my $line = $_;
    my $key = (split/\t/, $line)[0];
    push @{ $hash{$key} }, $line;
}
print Dumper(\%hash);
__DATA__
1	a	101
1	b	110
2	c	201
3	d	301
3	e	310
3	f	320
4	g	401
use strict;
use Data::Dumper;
my ($hash, @array);
# the fields in the DATA section are tab-separated
@array = <DATA>;
chomp @array;
$hash = { map
    { my $key = (split(/\t/) )[0];
      $key => [
          # compare with eq: a pattern match on $key would let 1 match 13
          grep { (split(/\t/) )[0] eq $key } @array
      ]
    } @array
};
print Dumper($hash);
__DATA__
1	2	a
1	13	w
1	20	c
2	1	b
2	40	n
3	30	a
