PerlMonks  

newbie hasher

by prodevel (Scribe)
on Nov 20, 2003 at 00:07 UTC (id://308499)


prodevel has asked for the wisdom of the Perl Monks concerning the following question:

I've been hacking/man-ing/searching for a couple of hours now and I'm getting a bit tired. I was curious as to an elegant way to read a file into a hash.

Take a hosts file for example...

host1 1.1.1.1
host2 1.2.3.5

I just want to assign this to a hash for further processing, e.g.

while (($host,$ip) = each(%hosts))

I normally figure this stuff out in different ways, but I'd like to use a hash for this.

Thanks!
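A minimal sketch of what the question asks for. It assumes each line holds a host name, whitespace, and an IP address. An in-memory string (a made-up stand-in) replaces the real file so the example is self-contained; in practice you would open the file path instead.

```perl
use strict;
use warnings;

# Stand-in for the hosts file (assumption: "name ip" on each line).
my $text = "host1 1.1.1.1\nhost2 1.2.3.5\n";
open my $fh, '<', \$text or die "open failed: $!";

my %hosts;
while (my $line = <$fh>) {
    chomp $line;
    my ($host, $ip) = split ' ', $line;
    next unless defined $ip;          # skip blank or one-field lines
    $hosts{$host} = $ip;
}
close $fh;

# Walk the pairs the way the question describes.
while (my ($host, $ip) = each %hosts) {
    print "$host => $ip\n";
}
```

Hash order is arbitrary, so the two lines may print in either order.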

Re: newbie hasher
by Zaxo (Archbishop) on Nov 20, 2003 at 00:18 UTC

    Probably the easiest way is to split each line and explicitly add the pair to the hash,

    my %hosts;
    {
        local $_;
        open my $fh, '<', '/path/to/data.file' or die $!;
        while (<$fh>) {
            my @pair = split;
            $hosts{ $pair[0] } = $pair[1];
        }
        close $fh or die $!;
    }

    After Compline,
    Zaxo

      Though I'm responding to Zaxo, my comment applies to most of the answers I see. With code doing a split, check that $pair[1] aka $ip is actually getting set to a defined value. Otherwise, your hash value may be undef on bad input but you won't get a warning about it until you actually try to use the hash value later on. Best to validate as you go:
      while (<$fh>) {
          chomp;
          my ($host, $ip) = split;
          warn("bad hosts line: $_"), next if !defined $ip;
          $hosts{$host} = $ip;
      }
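      Note that if you check defined($ip), the chomp is necessary; otherwise $ip is set to the empty string (taken from the empty string following the newline character). That happens because assigning split's result to a list of two variables gives split an implicit LIMIT of 3, so the trailing empty field is kept rather than stripped. If you use a regex to split up the line instead, make sure it actually requires both fields and check whether the match succeeds (which gets a little funky if you are using map):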
      %hosts = map {
          if (/^(\w+)\s(.+)/) { ($1 => $2) }
          # (LIST)[1..0] is an empty slice: warn, then contribute nothing
          else { (warn "bad hosts line: $_")[1..0]; }
      } <FILE>;
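A small sketch of the chomp point. The input line is made up and holds a host name with no IP. The same split call runs on it with and without chomp.

```perl
use strict;
use warnings;

my $line = "host1\n";    # a bad line: host name only, newline still attached

# Without chomp: assigning to two variables gives split an implicit
# LIMIT of 3, so the empty field after "\n" is kept and $ip_raw is "".
my ($host_raw, $ip_raw) = split ' ', $line;

# With chomp: nothing follows "host1", so $ip_clean stays undef.
chomp(my $clean = $line);
my ($host_clean, $ip_clean) = split ' ', $clean;

for ([ 'without chomp', $ip_raw ], [ 'with chomp', $ip_clean ]) {
    my ($label, $ip) = @$_;
    printf "%-13s ip is %s\n", $label, defined $ip ? "'$ip'" : 'undef';
}
```

Only the chomped version leaves $ip undefined, so only there does the defined($ip) test catch the bad line.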
Re: newbie hasher
by Anonymous Monk on Nov 20, 2003 at 00:19 UTC
    If your lines are simply host and IP separated by a space, then it's pretty simple:
    open my $fh, "<input.dat" or die "Couldn't open file: $!";
    my %hosts;
    while (<$fh>) {
        chomp;
        my ($host, $ip) = split;
        $hosts{$host} = $ip;
    }
    use Data::Dumper;
    print Dumper \%hosts;
Re: newbie hasher
by etcshadow (Priest) on Nov 20, 2003 at 00:24 UTC
    # assuming you've already opened your file...
    while (my $line = <FILE>) {
        chomp $line;
        my ($host,$ip) = split(/\s+/,$line,2);
        $hosts{$host} = $ip;
    }

    ------------
    :Wq
    Not an editor command: Wq
      Close to perfect, IMO, except that I'd add a test for empty lines, like this:
      my ($host,$ip) = split(/\s+/,$line,2) or next;
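Why the "or next" works: a list assignment used in boolean context evaluates to the number of elements on its right-hand side. Splitting an empty string returns an empty list, so a blank line gives 0 and the loop moves on. A sketch with made-up input:

```perl
use strict;
use warnings;

my @lines = ("host1 1.1.1.1\n", "\n", "host2 1.2.3.5\n");
my %hosts;
for my $line (@lines) {
    chomp $line;
    # Blank line: split returns (), the assignment counts 0, "next" fires.
    my ($host, $ip) = split(/\s+/, $line, 2) or next;
    $hosts{$host} = $ip;
}
print join(', ', map { "$_=$hosts{$_}" } sort keys %hosts), "\n";
```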
Re: newbie hasher
by Roger (Parson) on Nov 20, 2003 at 00:32 UTC
    Or you could write a one-liner to transform the input file into a hash.

    Method 1 - (map with 'short-circuit', now handles empty lines)
    use strict;
    use Data::Dumper;

    # Uh, well spotted, typo fixed with the extra +
    # which would have no effect anyway. :-)
    # ysth suggested that putting () in map short-circuits
    # empty lines. It worked. Thanks. :-)
    # Wow, even better, dropped that testing bit.
    #my %hostlist = map { /^(\w+)\s(.*)/?($1,$2):() } (<DATA>);

    # A failed match in list context returns an empty list,
    # so lines that don't match contribute nothing.
    my %hostlist = map { /^(\w+)\s(.*)/ } (<DATA>);
    print Dumper(\%hostlist);

    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5
    Method 2 - Better approach
    use strict;
    use Data::Dumper;
    my %hostlist;
    {
        local $/;
        %hostlist = <DATA> =~ /^(\w+)\s(.*)/gm;
    }
    print Dumper(\%hostlist);

    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5
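The host-name part of these regexes is ^(\w+). Here \w matches only letters, digits and underscore, so a name containing "-" or "." never matches, and that line is silently dropped. A sketch comparing it with \S+, using made-up host names:

```perl
use strict;
use warnings;

my $text = "host1 1.1.1.1\nweb-01.example.com 10.0.0.7\n";

# \w+ stops at "-" and ".", so the second line is skipped.
my %word_only = $text =~ /^(\w+)\s(.*)/gm;

# \S+ accepts any run of non-whitespace as the host name.
my %non_space = $text =~ /^(\S+)\s+(.*)/gm;

printf "\\w+ version: %d entries, \\S+ version: %d entries\n",
    scalar(keys %word_only), scalar(keys %non_space);
```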
    Updated: added the 3rd method after seeing jonadab's suggestion. Here's my solution for parsing a real hosts file.

    Method 3 - Capture from the /etc/hosts file
    use strict;
    use Data::Dumper;
    my %hosts;
    while (<DATA>) {
        next if /^\s*(?:#|$)/;                # ignore comments and empty lines
        next unless /^([^\s#]+)\s+(.*)/;      # capture ip address and names
        my ($ip, $names) = ($1, $2);          # copy the captures into lexicals
        $hosts{$_} = $ip foreach split /\s+/, $names;
    }
    print Dumper(\%hosts);

    __DATA__
    # IP Masq gateway:
    192.168.0.80 pedestrian
    # Primary desktop:
    192.168.0.82 raptor1
    # Family PC upstairs:
    192.168.0.84 trex tyrannosaur family
    # Domain servers:
    205.212.123.10 dns1 brutus
    208.140.2.15 dns2
    156.63.130.100 dns3 cherokee

      Look at your first regex again. /^(\w+)+/ looks quite funny :) aww he decided to remove the double '+'ing :(

      Anyhow... just another way of writing the regex:

      my %hostlist = map { m#\A(\S+)\s+(.*)\Z# ? ($1 => $2) : () } <DATA>;
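The choice of end anchor matters when lines still carry their newline, as lines from <DATA> do. \z is the absolute end of the string. \Z also matches just before a final newline. A sketch on one made-up line:

```perl
use strict;
use warnings;

my $line = "host1 1.1.1.1\n";    # as read from a file, newline included

# (.*) cannot cross the newline, so \z is never reached: no match.
my $with_z = $line =~ m#\A(\S+)\s+(.*)\z# ? 'match' : 'no match';

# \Z is allowed to sit before the trailing newline: match.
my $with_Z = $line =~ m#\A(\S+)\s+(.*)\Z# ? 'match' : 'no match';

print "\\z: $with_z, \\Z: $with_Z\n";
```

If you prefer \z, chomp each line before matching.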

      By no means am I discounting the other methods, but I really like this one-liner due to my haphazard knowledge of regex:

      %hostlist = <DATA> =~ /^(\w+)\s(.*)/gm;

      This is exactly what I was looking for, even though I was nowhere near explicit about what I wanted.

      I'm already applying this method to a few more scripts. The map was cool too.

      I am somewhat curious as to the extensive use of Data::Dumper instead of print-ing %hash?

      Thanks all!
        I am somewhat curious as to the extensive use of Data::Dumper instead of print-ing %hash?
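        It just shows the structure better, and makes it easier to spot any undefs, tabs, newlines, wide characters, unprintable characters, etc. that sneak into your data. Setting $Data::Dumper::Useqq = 1 goes further and prints such characters as escapes like \n and \t.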
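A sketch of that difference. The values are made up: one keeps a stray newline and one a stray tab, as unchomped or sloppy input might. $Data::Dumper::Useqq switches to double-quoted output with escapes, and $Data::Dumper::Sortkeys makes the key order stable.

```perl
use strict;
use warnings;
use Data::Dumper;

my %hosts = (host1 => "1.1.1.1\n", host2 => "1.2.3.5\t");

local $Data::Dumper::Useqq    = 1;   # show \n, \t, etc. as escapes
local $Data::Dumper::Sortkeys = 1;   # stable key order
my $dump = Dumper(\%hosts);
print $dump;
```

A plain print "$k $v\n" would show nothing unusual here. The dump makes the trailing \n and \t obvious.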
      my %hostlist = map { /^(\w+)\s(.*)/; $1 => $2 } (<DATA>);
      should be....
      my %hostlist = map { /^(\w+)\s+(.*)/; $1 => $2 } (<DATA>);
      (extra + after the whitespace metachar)

      davis
      It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
Re: newbie hasher
by davido (Cardinal) on Nov 20, 2003 at 00:35 UTC
    Here's another way:

    use strict;
    use warnings;
    my %hash = map { chomp; split /\s+/, $_, 2 } <DATA>;
    print "$_, $hash{$_}\n" foreach keys %hash;

    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5

    Fun huh? ;)


    Dave


    "If I had my life to live over again, I'd be a plumber." -- Albert Einstein
Re: newbie hasher
by Coruscate (Sexton) on Nov 20, 2003 at 00:37 UTC

    Since nobody else has used map() yet:

    open my $hosts, '<', 'hosts.txt' or die "open failed: $!";
    my %hosts = map { chomp; split /\s+/ } <$hosts>;
    close $hosts or die "close failed: $!";

    Update: Dope! Two people got maps in before me (somehow) lol

Re: newbie hasher (No loops!)
by BrowserUk (Patriarch) on Nov 20, 2003 at 00:49 UTC

    Look Ma. No loops :)

    #! perl -slw
    use strict;
    use Data::Dumper;
    my %h = split ' ', do{ local $/; <DATA> };
    print Dumper \%h;

    __DATA__
    host1 1.1.1.1
    host2 2.2.2.2
    host3 3.3.3.3
    host4 4.4.4.4
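The slurp-and-split trick pairs tokens purely by position, so it relies on every line holding exactly two fields. A sketch with made-up input where one line carries an extra alias shows what happens otherwise. It also shows a cheap guard: an odd token count proves something is wrong, although an even count does not prove the input is clean.

```perl
use strict;
use warnings;

# Hypothetical input: the second line carries an extra alias.
my $text = "host1 1.1.1.1\nhost2 2.2.2.2 alias2\nhost3 3.3.3.3\n";

my @tokens = split ' ', $text;   # newlines are just whitespace here

warn "odd number of tokens; pairs will be misaligned\n" if @tokens % 2;

my %h;
{
    no warnings 'misc';          # silence "Odd number of elements in hash assignment"
    %h = @tokens;
}
# Everything after the bad line shifts by one: "alias2" becomes a key
# whose value is "host3", and "3.3.3.3" ends up with no value.
print "$_ => ", ($h{$_} // 'undef'), "\n" for sort keys %h;
```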

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

Re: newbie hasher
by DrHyde (Prior) on Nov 20, 2003 at 04:06 UTC
    While I don't doubt that your file looks like that, it's not a regular hosts file. In /etc/hosts, each line has an IP followed by a name (and optionally more names), whereas you have a name followed by an IP. /etc/hosts can also have comments in it. To deal with a traditional /etc/hosts file ...
    open(HOSTS, '/etc/hosts') || die("Out of cucumber error\n");
    my %hosts;
    while(<HOSTS>) {
        s/#.*//;                                  # strip comments
        next unless /(\S+)\s+(.*)/;               # an IP, then one or more names
        my $ip = $1;
        push @{$hosts{$ip}}, split(/\s+/, $2);    # IP => [names]
    }
    use Data::Dumper;
    print Dumper(\%hosts);
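That loop builds a hash from each IP address to a list of names. The question wants the reverse, a name-to-IP lookup, and inverting the structure gives you that. A sketch with made-up data shaped like the loop's output:

```perl
use strict;
use warnings;

# IP => [names], the shape the loop above produces.
my %by_ip = (
    '127.0.0.1'    => ['localhost'],
    '192.168.0.84' => ['trex', 'tyrannosaur', 'family'],
);

# Invert to name => IP. If a name were listed under two IPs, the one
# visited last would win; sorting the IPs keeps that repeatable.
my %ip_for;
for my $ip (sort keys %by_ip) {
    $ip_for{$_} = $ip for @{ $by_ip{$ip} };
}
print "$_ => $ip_for{$_}\n" for sort keys %ip_for;
```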
Re: newbie hasher
by jonadab (Parson) on Nov 20, 2003 at 07:12 UTC
    # Or substitute your favourite filehandle for DATA here.
    # chomp, or the limit of 2 leaves the newline on each IP.
    my %hosts = map { chomp; split /\s+/, $_, 2 } <DATA>;

    __DATA__
    host1 1.1.1.1
    host2 1.2.3.5

    Note, however, that the format you give is not the standard format for hosts files. The usual format is more like...

    # IP Masq gateway:
    192.168.0.80 pedestrian
    # Primary desktop:
    192.168.0.82 raptor1
    # Family PC upstairs:
    192.168.0.84 trex tyrannosaur family
    # Domain servers:
    205.212.123.10 dns1 brutus
    208.140.2.15 dns2
    156.63.130.100 dns3 cherokee

    This is easy enough to read too...

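    open HOSTS, "</etc/hosts" or die "Can't open hosts file: $!";  # Or "<C:\\WINDOWS\\hosts"
    my %hosts = map {
        my ($ip, @hn);                    # fresh variables for every line
        if (not /^\s*#/) {
            chomp;
            s/\s*#.*$//;                  # drop trailing comments
            ($ip, @hn) = split ' ', $_;   # ' ' also skips leading whitespace
        }
        map { ($_ => $ip) } @hn;          # one name => IP pair per name
    } <HOSTS>;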

    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
Re: newbie hasher
by Art_XIV (Hermit) on Nov 20, 2003 at 08:58 UTC

    If you expect your input to be 'dirty', then regular expressions might serve you better:

    use strict;
    use Data::Dumper;
    my %hosts;
    while (<DATA>) {
        chomp;
        next if /^#/;
        if (/^(\w+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
            $hosts{$1} = $2;
        }
        else {
            # test the match itself; $1 and $2 may be stale after a failed match
            warn "Garbage at line $. of DATA: $_\n";
        }
    }
    print Dumper(\%hosts);
    1;

    __DATA__
    host1 1.13.10.116
    #comment
    host2 21.90.30.31
    #blah blah blah
    host3 56.87.1.10
    host4 13.16.17
    #The following hosts are on network B
    host5 57.98.18.10
    #yadda yadda yadda
    host6 106.10.3.12

    Splits would be more efficient if you expect the input to be clean, though.
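The \d{1,3} pattern checks the shape of an address but not its range, so something like 999.1.1.1 still passes. If dirty input is the concern, a sketch of a stricter check follows. The helper name is_ipv4 is hypothetical, and it only covers dotted-quad IPv4.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only dotted quads whose octets are 0..255.
sub is_ipv4 {
    my ($s) = @_;
    my @octets = $s =~ /\A([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\z/
        or return 0;                      # wrong shape
    return !grep { $_ > 255 } @octets;    # every octet in range
}

for my $candidate ('106.10.3.12', '999.1.1.1', '13.16.17') {
    printf "%-12s %s\n", $candidate, is_ipv4($candidate) ? 'ok' : 'rejected';
}
```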

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"

Node Type: perlquestion [id://308499]
Approved by Roger
Front-paged by Roger