PerlMonks  

Using Split to load a hash

by Grey Fox (Chaplain)
on Aug 19, 2006 at 13:59 UTC (#568355=perlquestion)

Grey Fox has asked for the wisdom of the Perl Monks concerning the following question:

Hello Fellow Monks
Is it possible to use the split function to load a hash? I am trying to split a list of names and addresses, separated by commas, from a text file.
Thanks;
-- Grey Fox
Perl - Hours to learn lifetime to master.

UPDATE: I am new to perl and still trying to wrap my head around the split, hash, and map functions. I have seen split and map used to load hashes. Originally I'm from a legacy COBOL environment, but I have had some experience with client/server before. Are there any good references explaining split, hashes and map? Most of what I have found is pretty cryptic.
Thanks

Re: Using Split to load a hash
by shmem (Chancellor) on Aug 19, 2006 at 14:22 UTC
    Yes, it is possible. Did you try, and if so, how? (see How do I post a question effectively?)

    update after the OP's update

    Documentation

    As you might know already, you can get the documentation for split and map by typing e.g. perldoc -f split on the command line of your shell; hashes are described in perldata (perldoc perldata).

    About perl being cryptic

    The crypticness of perl expressions will vanish as you become familiar with its main concepts, two of them being data types (scalars, arrays, hashes and references, which are, well, scalars) and context. Context is perl's curse and blessing and the origin of much of its perceived crypticness, because many functions behave differently depending on the context they are invoked in. Even simple assignments do different things to the right-hand side before the assignment is done to the left-hand side of =, depending on what type the lvalue is.
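    A small illustration of the context point (my own example, with made-up names, using only core perl): the same array or the same built-in gives different results depending on whether it is used in scalar or list context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = ('Bilbo Baggins', 'Sam Gamgee', 'Frodo Baggins');

my $count   = @names;    # array in scalar context: number of elements
my ($first) = @names;    # list assignment: only the first element is kept
my @copy    = @names;    # list context: all elements

my $backwards = reverse 'abc';   # scalar context: reversed string
my @reversed  = reverse 'abc';   # list context: reverses a one-element list

print "$count | $first | @copy\n";
print "$backwards | @reversed\n";
```

    Here $backwards ends up as 'cba', while @reversed still holds the single element 'abc', because in list context reverse reverses the order of the list, not the characters of a string.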

    The text file contains just one record per line, right?

    Bilbo Baggins, Under The Hill
    Sam Gamgee, Bagshot Row

    in, say, a file named addrfile.txt; then I would say e.g. (TIMTOWTDI - There's More Than One Way To Do It)

     1  #!/usr/bin/perl
     2
     3  my $file = 'addrfile.txt';
     4
     5  open I, '<', $file
     6      or die "Can't open '$file' for reading: $!\n";
     7
     8  chop (%hash = map { split /\s*,\s*/,$_,2 } grep (!/^$/,<I>));
     9
    10  # print out that hash
    11
    12  print "$_ => $hash{$_}\n" for keys %hash;
    which results in
    Bilbo Baggins => Under The Hill
    Sam Gamgee => Bagshot Row
    and looks pretty cryptic.

    Explanation, per line (except empty ones :-)

    Line 1 tells the OS which interpreter to use.
    Line 3 assigns the file to be processed to the variable $file.
    Line 5 tries to open the file for reading, associating it with the filehandle I; on
    line 6, failure to do so leads to a program abort (die).

    Line 8 is where things get interesting. It "just" contains an invocation of chop (LIST)
    chop operates on what is inside the outer round parens, which is the result of an assignment: the %hash. So chop removes the line endings from the values of the hash %hash (chopping a hash chops its values, not its keys). chop is context sensitive.
    The right hand side (rhs) of the assignment inside the round parens for chop is a map statement: map BLOCK LIST. The LIST is returned by grep which operates on a LIST. So, the second argument to the grep function is evaluated in list context, which forces the <> operator (which is a funny way to say readline(FILEHANDLE)) to return a list containing the lines of addrfile.txt.
    The first argument to grep (!/^$/) says "gimme all that isn't an empty line" (see perlre). This list is passed to map.
    In map each element is processed by what is contained in BLOCK ({ }), and the results of that processing (the results from the last evaluated statement) are returned as a list - which in this case are key/value pairs resulting from the split operation inside the block.
    Now, split (split /\s*,\s*/,$_,2) just splits each line as returned from grep (and assigned to $_ inside map) into at most two elements (the third argument, 2, is the limit) via the regular expression /\s*,\s*/, meaning "zero or more whitespace chars, a comma, and zero or more whitespace chars" is taken as the boundary between elements.
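    To see what the limit of 2 buys you, here is a small sketch (the sample line is made up) that splits the same line with and without it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "Bilbo Baggins , Bag End, Under The Hill";

# No limit: every comma (with its surrounding whitespace) is a boundary.
my @all = split /\s*,\s*/, $line;

# Limit 2: split at the first boundary only; the rest stays in one piece.
my ($name, $addr) = split /\s*,\s*/, $line, 2;

print scalar(@all), " fields without a limit\n";
print "name='$name' addr='$addr'\n";
```

    Without the limit you get three fields; with it, $name is 'Bilbo Baggins' and $addr keeps its inner comma as 'Bag End, Under The Hill'.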

    There. The result of map is a sequence of key/value pairs, which are assigned to a hash, which is then chopped.
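    A sketch that keeps the intermediate list of key/value pairs in an array before it is assigned to the hash, so you can look at each stage. Two substitutions of my own: the lines come from an array instead of a file, and chomp is used instead of chop (chomp only removes a trailing newline, so a last line without one is not damaged).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("Bilbo Baggins, Under The Hill\n", "\n", "Sam Gamgee, Bagshot Row");

chomp(my @clean = @lines);                 # strip trailing newlines, if present
my @pairs = map { split /\s*,\s*/, $_, 2 } # each line yields (name, address)
            grep { length } @clean;        # drop empty lines
my %hash = @pairs;                         # consumed two at a time: key, value

print scalar(@pairs), " list elements, ", scalar(keys %hash), " hash entries\n";
print "$_ => $hash{$_}\n" for sort keys %hash;
```

    The flat list has four elements (name, address, name, address), and the hash assignment pairs them up into two entries.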

    Line 12 just prints out the hash as key => value.

    The above could be written more verbosely like this:

    my (@lines, %hash);
    while (my $line = <I>) {
        chop $line;                    # remove the last character (the newline)
        push(@lines, $line);
    }
    @lines = grep (!/^$/, @lines);     # drop empty lines
    foreach my $line (@lines) {
        my ($name, $addr) = split /\s*,\s*/, $line, 2;
        $hash{$name} = $addr;
    }
    but in order to address most of the points in your post I showed you the "cryptic" one first :-)

    Also, if you examine this code, you will find that with both of my examples the data is held in memory twice: first a list is built up, then that list is assigned to a hash. For small files that's OK, but for larger files you'd rather write

    my %hash;
    while (my $line = <I>) {
        chop $line;
        next unless length $line;      # skip empty lines
        my ($name, $addr) = split /\s*,\s*/, $line, 2;
        $hash{$name} = $addr;
    }
    as shown by other replies to your post.

    --shmem

    update: small fixes (grammar and such)

    _($_=" "x(1<<5)."?\n".q/)Oo.  G\        /
                                  /\_/(q    /
    ----------------------------  \__(m.====.(_("always off the crowd"))."
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Using Split to load a hash
by rodion (Chaplain) on Aug 19, 2006 at 15:06 UTC
    I'm not sure what you have in mind. This may be the kind of example you want, of a split into a hash. If so, you can then change the "for" loop to a "while(<FILEHANDLE>)".
    my $txt=<<'TEXT_END';
    nameone, addr1 info (without commas in it)
    nametwo, address two, city (with a comma)
    namethree, address three fields
    TEXT_END

    my %info;
    my ($name,$addr);
    for (split "\n",$txt) {
        ($name,$addr) = split ',',$_,2;
        $info{$name} = $addr;
    }

    my ($key,$val);
    while (($key,$val)= each(%info)) {
        print "$key->$val\n";
    }
    Update: Added ",$_,2" to the split, to handle addresses with a comma, and modified the test cases accordingly

      Another possible method (borrowed from "Perl for System Administrators"):

      my %info = split /,|\n/, $txt;

      This will fail if the second field contains a comma, because the extra piece throws the key/value pairing out of step. Maybe we can work it out to prevent that :-)
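      One way to prevent it, sketched here with an abbreviated copy of rodion's sample text: split into lines first, then split each line only once (limit 2), so any further commas stay inside the value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $txt = <<'TEXT_END';
nameone, addr1 info (without commas in it)
nametwo, address two, city (with a comma)
TEXT_END

# Split into lines, then split each line at its first comma only.
my %info = map { split /\s*,\s*/, $_, 2 } split /\n/, $txt;

print "$_ => $info{$_}\n" for sort keys %info;
```

      Using /\s*,\s*/ instead of a bare comma also drops the space after the first comma, so the value for nametwo is 'address two, city (with a comma)'.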

      Igor 'izut' Sutton
      your code, your rules.

Re: Using Split to load a hash
by davido (Cardinal) on Aug 19, 2006 at 15:43 UTC

    Yes, you can use split for that. First, open the file. Next, iterate over its contents. Chomp the line. If the remaining line is empty, skip to the next line with next (this is just a precaution). Assuming the line isn't blank, split it on the comma, assigning the results to two lexical variables. Then use one variable as the key and the other as the value when adding an element to your hash. Finally, close the file.
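    Those steps, followed one by one, might look like the sketch below. To keep it self-contained it reads from a string through an in-memory filehandle (open on a scalar reference) instead of a real file, and the names are made up; with your own data you would open the file by name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over a string.
my $text = "Bilbo Baggins, Under The Hill\n\nSam Gamgee, Bagshot Row\n";

my %address_of;
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;                         # skip blank lines
    my ($name, $addr) = split /\s*,\s*/, $line, 2;    # name, then the rest
    $address_of{$name} = $addr;
}
close $fh;

print "$_ => $address_of{$_}\n" for sort keys %address_of;
```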

    Let us know which part of that task has you stumped.


    Dave

Re: Using Split to load a hash
by BrowserUk (Patriarch) on Aug 19, 2006 at 16:24 UTC

    Provided the name doesn't contain commas, it won't matter if the address does, as long as you use the third parameter to split. This is a number that specifies the maximum number of fields to split the input into. By setting it to 2, everything before the first comma will be treated as the first field (the name), and everything after that first comma, including any further commas, will become the second field:

    #! perl -slw
    use strict;
    use Data::Dumper;

    ##.............................V third parameter
    my %hash = map{ split ',', $_, 2 } <DATA>;

    print Dumper \%hash;

    __DATA__
    a name, an address, with commas,
    another name, and another address, also with commas
    and a third name, and address
    fourth name, fourth address

    Produces:

    C:\test>junk4
    $VAR1 = {
              'and a third name' => ' and address
    ',
              'a name' => ' an address, with commas,
    ',
              'fourth name' => ' fourth address
    ',
              'another name' => ' and another address, also with commas
    '
            };

    Note that the addresses are unchomped.
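    If you want them chomped, one way (a sketch using two of the sample lines above, held in an array rather than __DATA__) is to chomp inside the map block before splitting, and to let the split pattern swallow the whitespace around the first comma:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @input = (
    "a name, an address, with commas,\n",
    "fourth name, fourth address\n",
);

# Inside map, $_ is aliased to each element of @input, so chomp
# strips the newline from @input itself before the split.
my %hash = map { chomp; split /\s*,\s*/, $_, 2 } @input;

print "'$_' => '$hash{$_}'\n" for sort keys %hash;
```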


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Using Split to load a hash
by graff (Chancellor) on Aug 19, 2006 at 16:11 UTC
    If any given name or address in the file is liable to contain a comma, and the file has quotation marks around such values, then you'll want to use one of the modules for parsing CSV (comma-separated values) files instead of split: Text::CSV_XS or Text::xSV. (There's Text::CSV as well, but it's older and more limited than the others.)

    (In fact, even if your input data is really simple and has no quoted fields containing commas, you might still want to use one of those modules -- or at least look at their docs.)
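    For a quick look at what quote-aware parsing does, here is a sketch that swaps in Text::ParseWords (a core module, so it runs without installing anything) instead of the CSV modules named above; the records are made up. parse_line takes a delimiter regex, a keep flag (0 strips the quotes) and the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Sample records: the first address is quoted because it contains a comma.
my @records = (
    'Bilbo Baggins, "Bag End, Under The Hill"',
    'Sam Gamgee, Bagshot Row',
);

my %address_of;
for my $record (@records) {
    # Delimiter is a regex; 0 means "strip the quotes from quoted fields".
    my ($name, $addr) = parse_line('\s*,\s*', 0, $record);
    $address_of{$name} = $addr;
}

print "$_ => $address_of{$_}\n" for sort keys %address_of;
```

    Note that parse_line also treats single quotes as quote characters, so a name like O'Brien would trip it up; for real CSV files, Text::CSV_XS remains the safer choice.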

Re: Using Split to load a hash
by ysth (Canon) on Aug 20, 2006 at 17:11 UTC
    Just an observation: if there's any chance you may have different people with the same name in your data (and there usually is a chance), just a simple name => address hash isn't going to be enough.
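    A sketch of one way to keep such records instead of silently overwriting them, assuming every entry should be kept: store an array of addresses per name (a hash of arrays). The data is made up, and this still cannot tell you which entries belong to the same person.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'Frodo Baggins, Bag End',
    'Sam Gamgee, Bagshot Row',
    'Frodo Baggins, Crickhollow',
);

my %addresses_of;
for my $line (@lines) {
    my ($name, $addr) = split /\s*,\s*/, $line, 2;
    push @{ $addresses_of{$name} }, $addr;   # autovivifies the array ref
}

for my $name (sort keys %addresses_of) {
    print "$name => ", join('; ', @{ $addresses_of{$name} }), "\n";
}
```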
Re: Using Split to load a hash
by tanyeun (Sexton) on Aug 20, 2006 at 13:12 UTC
    I often use regex and arrays to manage this kind of data
    #!/usr/bin/perl
    use strict;
    use warnings;

    open(my $fh, '<', 'file.csv') or die "Can't open 'file.csv': $!\n";
    # assume you have three columns in your file
    my (@col1, @col2, @col3);
    my $i = 0;
    while (<$fh>) {
        if (/(\S+),(\S+),(\S+)/) {
            $col1[$i] = $1;
            $col2[$i] = $2;
            $col3[$i] = $3;
            $i++;
        }
    }
    close $fh;
    print "rows:$i\n";    # $i counts the matched rows
    print "subject:\t"; for (0 .. $i - 1) { print "$col1[$_]\t" } print "\n";
    print "grade:\t\t"; for (0 .. $i - 1) { print "$col2[$_]\t" } print "\n";
    print "rank:\t\t";  for (0 .. $i - 1) { print "$col3[$_]\t" } print "\n";
    maybe it's not a concise way
    but I find it useful
    any comments are welcome ^_^
      If it's useful it's useful! :-)

      Perhaps you could consider loading your data into one data structure instead of three arrays.

      You might also want a more general purpose approach where it would be easier to change the number and name of the columns.

      Putting aside the question of commas inside the fields and any other error checking, one approach might be like this.

      It creates an array of hashes. If you were thinking of using something like HTML::Template for your output this would be very handy! :-)

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my @AoH;
      my @fields = qw(subject grade rank);

      while (my $record = <DATA>){
          chomp $record;
          my @values = $record =~ /([^,]+)/g;
          push @AoH, { map {$fields[$_] => $values[$_]} (0..$#fields) };
      }

      print Dumper \@AoH;

      __DATA__
      english,1,1
      history,2,2
      science,3,3
      biology,4,4
      output:
      ---------- Capture Output ----------
      > "C:\Perl\bin\perl.exe" _new.pl
      $VAR1 = [
                {
                  'subject' => 'english',
                  'grade' => '1',
                  'rank' => '1'
                },
                {
                  'subject' => 'history',
                  'grade' => '2',
                  'rank' => '2'
                },
                {
                  'subject' => 'science',
                  'grade' => '3',
                  'rank' => '3'
                },
                {
                  'subject' => 'biology',
                  'grade' => '4',
                  'rank' => '4'
                }
              ];

      > Terminated with exit code 0.

      Hope that helps.

Re: Using Split to load a hash
by Anonymous Monk on Apr 27, 2010 at 12:23 UTC
    I am wondering if we can load a hash of arrays (HoA) in a similar way. I have always used the following:
    use strict;
    use Data::Dumper;

    my %hash;
    while(<DATA>) {
        chomp;
        my $line = $_;
        my $key = (split/\t/, $line)[0];
        push @{ $hash{$key} }, $line;
    }
    print Dumper(\%hash);

    __DATA__
    1 a 101
    1 b 110
    2 c 201
    3 d 301
    3 e 310
    3 f 320
    4 g 401
      Following the hint in http://www.perlmonks.com/?node_id=564943,
      I tried the following with map and grep:
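      use strict;
      use Data::Dumper;

      my @array = <DATA>;
      chomp @array;
      # Key each line by its first tab-separated field and collect every line
      # whose first field is exactly equal to that key (eq rather than a regex
      # match, so that a key such as 1 does not also pick up lines keyed 11).
      my $hash = {
          map {
              my $key = (split /\t/)[0];
              $key => [ grep { (split /\t/)[0] eq $key } @array ]
          } @array
      };
      print Dumper($hash);

      __DATA__
      1 2 a
      1 13 w
      1 20 c
      2 1 b
      2 40 n
      3 30 a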
      Note: the __DATA__ lines must be tab-delimited for this to work.

Node Type: perlquestion [id://568355]
Approved by prasadbabu