How to add values of hash by reading from different text files

faozhi has asked for the wisdom of the Perl Monks concerning the following question:

Re: How to add values of hash by reading from different text files by CountZero (Bishop) on Apr 27, 2009 at 06:10 UTC
By using a `foreach` loop over the list of file names, together with lexical variables, it need be no bigger than this:

`#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dump qw/dump/;
my %data;
foreach my $filename (qw/one.txt two.txt three.txt/) {
    open( my $file, '<', $filename ) or die "Unable to open $filename because $!\n";
    while (<$file>) {
        chomp;
        my ( $chrX, $chrpos, $value1, $value2 ) = split(/\s+/);
        $data{$chrX}->{$chrpos}->{'value1'} += $value1;
        $data{$chrX}->{$chrpos}->{'value2'} += $value2;
    } ## end while (<$file>)
} ## end foreach my $filename (qw/one.txt two.txt three.txt/)
print dump( \%data );`

Output:

`{
  chromosome1 => { 50000 => { value1 => 64, value2 => 73 } },
  chromosome2 => { 20000 => { value1 => 4, value2 => 63 } },
  chromosome3 => { 41444 => { value1 => 13, value2 => 28 } },
  chromosome4 => { 21414 => { value1 => 4, value2 => 1 } },
  chromosome6 => { 12141 => { value1 => 12, value2 => 22 } },
}`

CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^2: How to add values of hash by reading from different text files by faozhi (Acolyte) on Apr 27, 2009 at 10:24 UTC
Hi, can I know why `use Data::Dump qw/dump/` was used at the top? Cheers
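For readers with the same question: in the script above, `use Data::Dump qw/dump/;` loads the Data::Dump module and imports its `dump` function, which the final `print dump( \%data );` line uses to print the nested hash in readable form. Data::Dump is installed from CPAN. A minimal sketch of the same idea with the core module Data::Dumper follows; the contents of `%data` here are sample values chosen for illustration, not read from any file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the printed structure is the same on every run.
$Data::Dumper::Sortkeys = 1;
# Use a compact, fixed indentation style.
$Data::Dumper::Indent = 1;

# Sample data (an assumption for illustration only).
my %data = (
    chromosome1 => { 50000 => { value1 => 64, value2 => 73 } },
    chromosome4 => { 21414 => { value1 => 4,  value2 => 1 } },
);

# Dumper takes a reference and returns the structure as text.
my $text = Dumper( \%data );
print $text;
```

Either module only affects how the result is displayed; the summing logic is independent of it.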
Re^2: How to add values of hash by reading from different text files by Anonymous Monk on Apr 27, 2009 at 06:46 UTC
Hi CountZero, your programming skills are lovely. But I haven't reached the `my` operator that you are using, and I do not want my groupmate to think I got this from somewhere else. Starting from my original code, what changes should I make? And in case some people might want to use arrays: the text files one.txt, two.txt and three.txt have a large number of lines, an estimated 10000. Any tips? I appreciate it a lot. I only started Perl two weeks ago. Cheers.
Re^3: How to add values of hash by reading from different text files by faozhi (Acolyte) on Apr 27, 2009 at 06:47 UTC
The previous post was me. I forgot to sign in when I posted that. Sorry.
Re^4: How to add values of hash by reading from different text files by CountZero (Bishop) on Apr 27, 2009 at 09:56 UTC
Re: How to add values of hash by reading from different text files by citromatik (Curate) on Apr 27, 2009 at 07:09 UTC
Unless you have a good reason for not doing so, always `use strict` in your code.

There are several errors in your code:
- While processing file2 and file3, you are using the literals `key2` and `key3` as hash keys instead of the variables `$key2` and `$key3`.
- While processing file2 and file3, you are incrementing the hash values (`$hash{key3}++`); I don't know why.
- When processing the files, you are assigning different values to the same hash key:
`$hash{key2} = $value11 + $value1;
$hash{key2} = $value22 + $value2;`
The second statement overwrites the first. You should be using a different (sub)hash entry for each value: `$hash{$chrX}{value1} += $value11 ...`

citromatik
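The overwrite problem described above can be sketched in a few lines. This is an illustration only: the names `%flat` and `%nested` and the sample numbers are assumptions, not taken from the original script.

```perl
use strict;
use warnings;

# Sample values standing in for one parsed line (assumed for illustration).
my ( $chrX, $value1, $value2 ) = ( 'chromosome1', 12, 20 );

# Two assignments to the same key: the second replaces the first.
my %flat;
$flat{$chrX} = $value1;
$flat{$chrX} = $value2;
print "flat: $flat{$chrX}\n";    # only $value2 is left

# One sub-hash per key keeps both values apart, and += accumulates
# when the same key is seen again (simulated here by looping twice).
my %nested;
for my $pass ( 1 .. 2 ) {
    $nested{$chrX}{value1} += $value1;
    $nested{$chrX}{value2} += $value2;
}
print "nested: $nested{$chrX}{value1} $nested{$chrX}{value2}\n";
```

Note that `+=` on a key that does not exist yet starts from zero and does not raise an "uninitialized" warning.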
Re: How to add values of hash by reading from different text files by ELISHEVA (Prior) on Apr 27, 2009 at 07:22 UTC
First, your question is well written. That is one reason you got such a quick response from CountZero.

I am frankly surprised that you have not been taught about `my` yet. Best practice in Perl programming expects you to use strictures and `my`/`our`. Given that this advice is in all of the well-known recent Perl training books, your professor is likely to assume that you simply did your homework. If you are concerned, look up a citation and include it in an explanatory note in your code. You can find an appropriate citation in the Camel book or any of the books listed here. "Here" is the Perl page of Larry Wall, the inventor of Perl. You can probably also find a citation in your own textbook. Surely you are allowed to read ahead in your own textbooks?

`my` declares variables. Strictures are the two lines at the top of CountZero's script: `use strict; use warnings;` Among other things, strictures require you to declare variables (with either `my` or `our`) and warn you when you are using variables in ways that you probably shouldn't. Unless you have a very specific (and expert) reason, you should always use these two lines at the top of every script.

Now for how to fix your own code. Your code isn't working because you and your group mate need to use and understand the concepts of autovivification and hashes of hashes. Specifically, relating to your script:

- `$hash{$key1}++` doesn't store your data in the hash. It adds one to whatever value is stored under the key `$key1`, starting from zero if the key is new.
- If you need to assign two separate values to a single hash key, use a hash of hashes. To assign a value to an element in a hash of hashes, use `$hash{$key1}{value1}=$value1; $hash{$key1}{value2}=$value2;`.
- If you need to add a value to the current value, use `$hash{$key1}{value1}+=$value1; $hash{$key1}{value2}+=$value2;`.
- If you make an assignment (via =, +=, -=, *=, etc.) to a hash key that does not exist yet, Perl creates the key automatically, including any missing inner hash in a hash of hashes. That automatic creation of the inner hash is called autovivification. Thus there is no need to explicitly create hash keys.

Best, beth
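A short sketch of autovivification in a hash of hashes, under the assumption of made-up keys and numbers (`chromosome1`, `chromosome9`, 12 and 41 are illustrative only):

```perl
use strict;
use warnings;

my %hash;

# Assigning to a nested key creates the inner hash and the inner key
# in one step; nothing has to be set up beforehand.
$hash{chromosome1}{value1} += 12;
$hash{chromosome1}{value1} += 41;

print ref( $hash{chromosome1} ), "\n";     # the inner value is a HASH reference
print $hash{chromosome1}{value1}, "\n";    # running total of the two additions

# Side effect worth knowing: merely testing a nested key also creates
# the outer key, even though the inner key itself stays absent.
my $seen = exists $hash{chromosome9}{value1};
print exists $hash{chromosome9} ? "outer key created\n" : "outer key absent\n";
```

If that side effect matters, testing `exists $hash{chromosome9}` first avoids creating the outer key.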
Re^2: How to add values of hash by reading from different text files by faozhi (Acolyte) on Apr 27, 2009 at 07:34 UTC
Hi Beth, firstly, thank you so much for your really helpful reply. I am using O'Reilly's Learning Perl as my guide and reference book. However, hashes of hashes and autovivification aren't in the book, which is why I got stuck. And honestly, I am not an IT student; I am teaching myself Perl. I need to use this for some of my research work related to genetics. Cheers
Re^3: How to add values of hash by reading from different text files by CountZero (Bishop) on Apr 27, 2009 at 10:10 UTC
A very good (and free) book to (self) learn Perl is "Beginning Perl", which can be found here. `use strict;` and the use of `my` are explained in Chapter two, page 66. I use this book in the Perl programming course I teach in our local computer club.

CountZero
Re: How to add values of hash by reading from different text files by roboticus (Chancellor) on Apr 27, 2009 at 12:26 UTC
faozhi:

Others are assisting you with the question you asked. I'm going to fly off on a couple of tangents, instead, about basic coding practices.

Commenting

Generally, if you write your code clearly, the need for comments is greatly reduced. For example, in this section of your code, the variable names are clearly file names, so the comment is redundant.

`#declare all filenames
$filename1 = 'one.txt';
$filename2 = 'two.txt';
$filename3 = 'three.txt';`

In this section, the comment is pretty much a duplication of what the code says, so it's not helpful.

`#open text file 1
open (FILE1, $filename1) or die "Unable to open $filename1 because $!\n";`

If I felt a comment necessary, I would instead have stated in my comment the effect of what I was doing, like this:

`# Store the contents of the first file into $hash{col1_col2}
open (FILE1, $filename1) or die "Unable to open $filename1 because $!\n";
while ($line = <FILE1>) {
    chomp ($line);
    ($chrX, $chrpos, $value1, $value2) = split (/\t/, $line);
    $key1 = join ("_", $chrX, $chrpos);
    $hash{$key1}++;
};
close FILE1;`

Indentation

The use of indentation is supposed to clarify the structure of the code, so you can see which statements are bundled together, and to make it simple to tell which code is associated with which control-flow structure. By having all your¹ control-flow statements aligned to the left margin, you make it more difficult to see the logical structure of the program.
You should change from this:

`while ($line = <FILE2>) {
chomp ($line);
($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
$key2 = join ("_", $chrX, $chrpos);
if (exists $hash{$key2} > 0) {
$hash{key2} = $value11 + $value1;
$hash{key2} = $value22 + $value2;
$hash{key2}++
}
};`

to this:

`while ($line = <FILE2>) {
    chomp ($line);
    ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
    $key2 = join ("_", $chrX, $chrpos);
    if (exists $hash{$key2} > 0) {
        $hash{key2} = $value11 + $value1;
        $hash{key2} = $value22 + $value2;
        $hash{key2}++
    }
};`

Obviously, it's not a problem in this particular program, as you don't have anything complicated going on. But when you have a page full of code with a lot of flow control going on, you're going to find it difficult to maintain your code.

Semicolons

While not harmful, you're putting extra semicolons in your code (specifically at the end of your `while` loops). It doesn't hurt anything in this case, but since they're unexpected, it does make the code slightly harder to read.

Subroutines

When you start writing the same code repeatedly, you should start thinking about how you can use subroutines to simplify your task. For example, this code:

`open (FILE2, $filename2) or die "Unable to open $filename2 because $!\n";
while ($line = <FILE2>) {
    chomp ($line);
    ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
    $key2 = join ("_", $chrX, $chrpos);
    if (exists $hash{$key2} > 0) {
        $hash{key2} = $value11 + $value1;
        $hash{key2} = $value22 + $value2;
        $hash{key2}++
    }
};
close FILE2;`

is nearly identical to the code you use to process file 3. So you should think about using a subroutine to process the files. For example, you could create a subroutine like this:

`sub process_file {
    my $filename = shift or die "Missing filename!";
    open (FILE, $filename) or die "Unable to open $filename because $!\n";
    while ($line = <FILE>) {
        chomp ($line);
        ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
        $key2 = join ("_", $chrX, $chrpos);
        if (exists $hash{$key2} > 0) {
            $hash{key2} = $value11 + $value1;
            $hash{key2} = $value22 + $value2;
            $hash{key2}++
        }
    }
    close FILE;
}`

Then, in your code, you can process your second and third files like²:

`process_file($filename2);
process_file($filename3);`

Due to correctness issues in your code, I can't tell whether it's possible or not, but frequently in programs like this you can use the same subroutine for your first file as well: the `if` statements will degenerate to a single case, and only be used for the successive files. Once you clean up the other bits of your code, you might be able to take advantage of it.

use strict; use warnings;

I haven't checked to see whether the strict or warnings modules would help in this case or not, but it would be to your advantage to put them into your program before anything else. They will catch many programming errors for you. You may even find "use diagnostics" helpful. (I generally only put in "use diagnostics" when I don't understand what the error message is trying to tell me.)

I hope you find some of this useful.

...roboticus

Updates: (marked by superscripts in the above text)
¹ Changed 'you' to 'your'
² In the next code snippet, I corrected the second line, changing 'process-file' to 'process_file'
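The subroutine idea above can be taken one step further so that the same routine handles every file, including the first. The following is a sketch under stated assumptions, not the poster's code: it uses a lexical filehandle, passes the totals hash by reference (the parameter name `$totals` is hypothetical), and writes made-up sample rows to temporary files via the core File::Temp module so that it runs without one.txt, two.txt or three.txt being present.

```perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

# Add the two value columns of every line in $filename to %$totals,
# keyed by "chromosome_position" as in the original script.
sub process_file {
    my ( $filename, $totals ) = @_;
    open( my $fh, '<', $filename )
        or die "Unable to open $filename because $!\n";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $chrX, $chrpos, $value1, $value2 ) = split /\t/, $line;
        my $key = join '_', $chrX, $chrpos;
        # += creates missing entries on first sight, so no exists() test
        # and no special case for the first file is needed.
        $totals->{$key}{value1} += $value1;
        $totals->{$key}{value2} += $value2;
    }
    close $fh;
    return;
}

# Hypothetical sample input standing in for two of the real files.
my @samples = (
    "chromosome1\t50000\t12\t20\nchromosome4\t21414\t4\t1\n",
    "chromosome1\t50000\t41\t51\n",
);

my %hash;
for my $content (@samples) {
    # tempfile returns an open handle and the file's name.
    my ( $out, $filename ) = tempfile( UNLINK => 1 );
    print {$out} $content;
    close $out;
    process_file( $filename, \%hash );
}

# Print one tab-separated line per key with both totals.
for my $key ( sort keys %hash ) {
    print "$key\t$hash{$key}{value1}\t$hash{$key}{value2}\n";
}
```

With real data, the loop over `@samples` would be replaced by a loop over the actual file names, calling `process_file( $name, \%hash )` for each.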
Re: How to add values of hash by reading from different text files by bichonfrise74 (Vicar) on Apr 27, 2009 at 19:46 UTC
Another possible solution... In this case, I just combined all your data into a single input for easier manipulation.

`#!/usr/bin/perl
use strict;
use Data::Dumper;
my %chromosome;
while( <DATA> ) {
    my ($name, $pos, $val1, $val2) = split;
    if ( defined( $chromosome{$name}{$pos} )) {
        $chromosome{$name}{$pos}[0] = $chromosome{$name}{$pos}[0] + $val1;
        $chromosome{$name}{$pos}[1] = $chromosome{$name}{$pos}[1] + $val2;
    }
    else {
        $chromosome{$name}{$pos} = [$val1, $val2];
    }
}
print Dumper(\%chromosome);
__DATA__
chromosome1 50000 12 20
chromosome2 20000 0 21
chromosome3 41444 9 2
chromosome4 21414 4 1
chromosome1 50000 41 51
chromosome2 20000 1 20
chromosome3 41444 2 11
chromosome6 12141 12 22
chromosome1 50000 11 2
chromosome2 20000 3 22
chromosome3 41444 2 15`

