PerlMonks  

How to add values of hash by reading from different text files

by faozhi (Acolyte)
on Apr 27, 2009 at 05:50 UTC ( [id://760271]=perlquestion )

faozhi has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I need a small favour from you gurus out there. I have 3 text files below.
This is one.txt
chromosome1 50000 12 20
chromosome2 20000 0 21
chromosome3 41444 9 2
chromosome4 21414 4 1

This is two.txt
chromosome1 50000 41 51
chromosome2 20000 1 20
chromosome3 41444 2 11
chromosome6 12141 12 22

This is three.txt
chromosome1 50000 11 2
chromosome2 20000 3 22
chromosome3 41444 2 15

The first column is the chromosome number, the 2nd column is the position, the 3rd column is value1 and the 4th column is value2.
What I am trying to do is read the first file, one.txt, add its lines to a hash, then open the 2nd file, two.txt, and see if each line's key already exists. If it exists, add value1 from the 2nd file to value1 from the 1st file, and value2 from the 2nd file to value2 from the 1st file.
I need to loop to the last line of each file and then print all the lines with the final totals for value1 and value2. Value1 and value2 should not be summed together.
This is the code I have so far, but it doesn't give the output I want.
    #!/usr/bin/perl -w
    #declare all filenames
    $filename1 = 'one.txt';
    $filename2 = 'two.txt';
    $filename3 = 'three.txt';
    #open text file 1
    open (FILE1, $filename1) or die "Unable to open $filename1 because $!\n";
    while ($line = <FILE1>) {
    chomp ($line);
    ($chrX, $chrpos, $value1, $value2) = split (/\t/, $line);
    $key1 = join ("_", $chrX, $chrpos);
    $hash{$key1}++;
    };
    close FILE1;
    open (FILE2, $filename2) or die "Unable to open $filename2 because $!\n";
    while ($line = <FILE2>) {
    chomp ($line);
    ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
    $key2 = join ("_", $chrX, $chrpos);
    if (exists $hash{$key2} > 0) {
    $hash{key2} = $value11 + $value1;
    $hash{key2} = $value22 + $value2;
    $hash{key2}++
    }
    };
    close FILE2;
    open (FILE3, $filename3) or die "Unable to open $filename3 because $!\n";
    while ($line = <FILE3>) {
    chomp ($line);
    ($chrX, $chrpos, $value111, $value222) = split (/\t/, $line);
    $key3 = join ("_", $chrX, $chrpos);
    if (exists $hash{$key3} > 0) {
    $hash{key3} = $value111 + $value11;
    $hash{key3} = $value222 + $value22;
    $hash{key3}++
    }
    };
    print "$hash{key3} $value111 $value222";
    close FILE2;
Sorry, it's my first time posting, I hope I did it right. And thanks in advance for the help. :)

The final output I am looking for is something like this:
chromosome1_50000 64 73
chromosome2_20000 4 63
chromosome3_41444 13 28
chromosome4_21414 4 1
chromosome6_12141 12 22

Replies are listed 'Best First'.
Re: How to add values of hash by reading from different text files
by CountZero (Bishop) on Apr 27, 2009 at 06:10 UTC
    By using a foreach-loop on the list of file names and using lexical variables, it should be no bigger than this:
        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use Data::Dump qw/dump/;

        my %data;
        foreach my $filename (qw/one.txt two.txt three.txt/) {
            open( my $file, $filename )
              or die "Unable to open $filename because $!\n";
            while (<$file>) {
                chomp;
                my ( $chrX, $chrpos, $value1, $value2 ) = split(/\s+/);
                $data{$chrX}->{$chrpos}->{'value1'} += $value1;
                $data{$chrX}->{$chrpos}->{'value2'} += $value2;
            } ## end while (<$file>)
        } ## end foreach my $filename (qw/one.txt two.txt three.txt/)
        print dump( \%data );
    Output:
        {
          chromosome1 => { 50000 => { value1 => 64, value2 => 73 } },
          chromosome2 => { 20000 => { value1 => 4, value2 => 63 } },
          chromosome3 => { 41444 => { value1 => 13, value2 => 28 } },
          chromosome4 => { 21414 => { value1 => 4, value2 => 1 } },
          chromosome6 => { 12141 => { value1 => 12, value2 => 22 } },
        }
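
The dump is nested by chromosome and then by position, whereas the question asked for one line per chromosome_position pair. A minimal sketch of flattening such a structure follows; the helper name format_totals and the hard-coded %data are illustrative assumptions, not part of CountZero's reply.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the nested totals into "chr_pos value1 value2" lines.
sub format_totals {
    my ($data) = @_;
    my @lines;
    for my $chr ( sort keys %$data ) {
        for my $pos ( sort { $a <=> $b } keys %{ $data->{$chr} } ) {
            my $totals = $data->{$chr}{$pos};
            push @lines, "${chr}_$pos $totals->{value1} $totals->{value2}";
        }
    }
    return @lines;
}

# Two entries copied from the dump, for illustration only.
my %data = (
    chromosome1 => { 50000 => { value1 => 64, value2 => 73 } },
    chromosome3 => { 41444 => { value1 => 13, value2 => 28 } },
);

print "$_\n" for format_totals( \%data );
```

Sorting the keys is a choice made here so the output order is stable; Perl hashes have no inherent order.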

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      hi,

      Can I know why use Data::Dump qw/dump/ was used at the top?

      Cheers

      Hi CountZero,

      Your programming skills are lovely. But I haven't reached the "my" operator that you are using, and I do not want my groupmate to think I got this from somewhere else.

      If you started from my original code, what changes should I make? And in case some people might want to use an array: the text files one.txt, two.txt and three.txt have a large number of lines, an estimated 10000. Any tips?
      Appreciate it a lot.

      I actually just started Perl 2 weeks ago.

      Cheers.
        The previous post was me. I forgot to sign in when I posted that. Sorry.
Re: How to add values of hash by reading from different text files
by citromatik (Curate) on Apr 27, 2009 at 07:09 UTC

    Unless you have a good reason for not doing so, always use strict in your code

    There are several errors in your code:

    • While processing file2 and file3, you are using the literals key2 and key3 as hash keys, instead of the variables $key2 and $key3
    • While processing file2 and file3 you are incrementing the hash values ($hash{key2}++ and $hash{key3}++); I don't know why
    • Also, when processing the files you are assigning different values to the same hash key:
          $hash{key2} = $value11 + $value1;
          $hash{key2} = $value22 + $value2;
      The second statement overwrites the first. You should be using different (sub)hashes for each value: $hash{$chrX}{value1} += $value11 ...
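
Taken together, the three points suggest a loop along these lines. This is only a sketch: the sub name add_line and the two sample lines are assumptions made for illustration, and it splits on whitespace rather than on tabs.

```perl
use strict;
use warnings;

my %hash;

# Hypothetical helper: add one input line's values to the running totals.
sub add_line {
    my ($line) = @_;
    my ( $chrX, $chrpos, $value1, $value2 ) = split ' ', $line;
    my $key = join '_', $chrX, $chrpos;    # a variable key, not the literal 'key2'
    $hash{$key}{value1} += $value1;        # separate sub-hash entries, so the
    $hash{$key}{value2} += $value2;        # second sum no longer overwrites the first
    return;
}

# Sample lines taken from one.txt and two.txt in the question.
add_line($_) for 'chromosome1 50000 12 20', 'chromosome1 50000 41 51';

print "chromosome1_50000 $hash{chromosome1_50000}{value1} $hash{chromosome1_50000}{value2}\n";
```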

    citromatik

Re: How to add values of hash by reading from different text files
by ELISHEVA (Prior) on Apr 27, 2009 at 07:22 UTC

    First, your question is well written. That is one reason you got such a quick response from CountZero.

    I am frankly surprised that you have not been taught about my yet. Best practice in Perl programming expects you to use strictures and my/our. Given that this advice is in all of the well known recent Perl training books, your professor is likely to assume that you simply did your homework. If you are concerned, look up a citation and include it in an explanatory note in your code. You can find an appropriate citation in the Camel book or any of the books listed here. "Here" is the Perl page of Larry Wall, the inventor of Perl. You can probably also find a citation in your own textbook. Surely you are allowed to read ahead in your own textbooks?

    my declares variables. Strictures are the two lines at the top of CountZero's script: use strict; use warnings; Among other things strictures require you to declare variables (with either my or our) and warn you when you are using variables in ways that you probably shouldn't. Unless you have a very specific (and expert) reason, you should always use these two lines at the top of every script.

    Now for how to fix your own code. Your code isn't working because you and your group mate need to use and understand the concepts of Autovivification and Hashes of Hashes. Specifically, relating to your script:

    • $hash{$key1}++ doesn't store the line's values in the hash. It adds one to whatever hash value is assigned to the key $key1 (creating the key with a value of 1 if it did not exist yet).
    • If you need to assign two separate values to a single hash key, use a Hash of Hashes. To assign a value to an element in a hash of hashes use $hash{$key1}{value1}=$value1;  $hash{$key1}{value2}=$value2;. If you need to add a value to the current value use: $hash{$key1}{value1}+=$value1;  $hash{$key1}{value2}+=$value2;
    • If you make an assignment (via =, +=, -=, *=, etc) to a hash key, it automatically creates the key. This is called autovivification. Thus there is no need to explicitly create hash keys.
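
The behaviour described in these points can be checked with a few lines. A minimal sketch, assuming a made-up key chr1_100 and made-up values:

```perl
use strict;
use warnings;

my %hash;
my $before = exists $hash{chr1_100} ? 'yes' : 'no';

# No key exists yet; += creates $hash{chr1_100} and its inner hash on first use.
$hash{chr1_100}{value1} += 12;
$hash{chr1_100}{value2} += 20;

# A later line for the same key adds to the totals without any exists() check.
$hash{chr1_100}{value1} += 41;
$hash{chr1_100}{value2} += 51;

my $after = exists $hash{chr1_100} ? 'yes' : 'no';

print "existed before: $before, after: $after\n";
print "chr1_100 $hash{chr1_100}{value1} $hash{chr1_100}{value2}\n";
```

Because the two values live under separate inner keys, they are accumulated independently and never summed together.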

    Best, beth

      Hi Beth,

      Firstly, thank you so much for your really helpful reply.
      I am using O'REILLY Learning Perl as my guide and reference book. However, hashes of hashes and autovivification aren't in the book, which was why I got stuck.
      And honestly, I am not an IT student and I am teaching myself Perl. I need to use this for some of my research work related to genetics.

      Cheers
        A very good (and free) book to (self) learn Perl is "Beginning Perl" which can be found here. use strict; and the use of my are explained in Chapter two, page 66.

        I use this book in the Perl programming course I teach in our local computer club.

        CountZero


Re: How to add values of hash by reading from different text files
by roboticus (Chancellor) on Apr 27, 2009 at 12:26 UTC
    faozhi:

    Others are assisting you with the question you asked. I'm going to fly off on a couple of tangents, instead, about basic coding practices.

    Commenting

    Generally, if you write your code clearly, the need for comments is greatly reduced. For example, in this section of your code, the variable names are clearly file names, so the comment is redundant.

        #declare all filenames
        $filename1 = 'one.txt';
        $filename2 = 'two.txt';
        $filename3 = 'three.txt';

    In this section, the comment is pretty much a duplication of what the code says, so it's not helpful.

        #open text file 1
        open (FILE1, $filename1) or die "Unable to open $filename1 because $!\n";

    If I felt a comment necessary, I would instead have stated in my comment the effect of what I was doing, like this:

        # Store the contents of the first file into $hash{col1_col2}
        open (FILE1, $filename1) or die "Unable to open $filename1 because $!\n";
        while ($line = <FILE1>) {
        chomp ($line);
        ($chrX, $chrpos, $value1, $value2) = split (/\t/, $line);
        $key1 = join ("_", $chrX, $chrpos);
        $hash{$key1}++;
        };
        close FILE1;

    Indentation

    The use of indentation is supposed to clarify the structure of the code, so you can see which statements are bundled together, and to make it simple to tell which code is associated with which control-flow structure. By having all your¹ control-flow statements aligned to the left margin, you make it more difficult to see the logical structure of the program. You should change from this:

        while ($line = <FILE2>) {
        chomp ($line);
        ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
        $key2 = join ("_", $chrX, $chrpos);
        if (exists $hash{$key2} > 0) {
        $hash{key2} = $value11 + $value1;
        $hash{key2} = $value22 + $value2;
        $hash{key2}++
        }
        };

    to this:

        while ($line = <FILE2>) {
            chomp ($line);
            ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
            $key2 = join ("_", $chrX, $chrpos);
            if (exists $hash{$key2} > 0) {
                $hash{key2} = $value11 + $value1;
                $hash{key2} = $value22 + $value2;
                $hash{key2}++
            }
        };

    Obviously, it's not a problem in this particular program, as you don't have anything complicated going on. But when you have a page full of code with a lot of flow-control going on, you're going to find it difficult to maintain your code.

    Semicolons

    While not harmful, you're putting extra semicolons in your code (specifically at the end of your while loops). It doesn't hurt anything in this case, but since they're unexpected, it *does* make the code slightly harder to read.

    Subroutines

    When you start writing the same code repeatedly, you should start thinking about how you can use subroutines to simplify your task. For example, this code:

        open (FILE2, $filename2) or die "Unable to open $filename2 because $!\n";
        while ($line = <FILE2>) {
        chomp ($line);
        ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
        $key2 = join ("_", $chrX, $chrpos);
        if (exists $hash{$key2} > 0) {
        $hash{key2} = $value11 + $value1;
        $hash{key2} = $value22 + $value2;
        $hash{key2}++
        }
        };
        close FILE2;

    is nearly identical to the code you use to process file 3. So you should think about using a subroutine to process the files. For example, you could create a subroutine like this:

        sub process_file {
            my $filename = shift or die "Missing filename!";
            open (FILE, $filename) or die "Unable to open $filename because $!\n";
            while ($line = <FILE>) {
                chomp ($line);
                ($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
                $key2 = join ("_", $chrX, $chrpos);
                if (exists $hash{$key2} > 0) {
                    $hash{key2} = $value11 + $value1;
                    $hash{key2} = $value22 + $value2;
                    $hash{key2}++
                }
            }
            close FILE;
        }

    then, in your code, you can process your second and third files like this²:

        process_file($filename2);
        process_file($filename3);

    Due to correctness issues in your code, I can't tell whether it's possible or not, but frequently in programs like this, you can use the same subroutine for your first file as well--the if statements will degenerate to a single case, and only be used for the successive files. Once you clean up the other bits of your code, you might be able to take advantage of it.
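
One way the single-subroutine idea might look once the hash logic is cleaned up is sketched below. It is not roboticus's code: passing the totals hash as an argument, the lexical filehandle, and the in-memory sample data are all assumptions made so the example runs on its own. Since `+=` also creates missing keys, no `if` is needed and the first file goes through the same sub.

```perl
use strict;
use warnings;

# Hypothetical variant of process_file: the totals hash is passed in, and the
# same sub serves every file because += also creates missing keys.
sub process_file {
    my ( $totals, $source ) = @_;
    # $source is a filename, or a reference to a string for in-memory data.
    open( my $fh, '<', $source ) or die "Unable to open $source because $!\n";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $chrX, $chrpos, $value1, $value2 ) = split ' ', $line;
        next unless defined $value2;    # skip blank or short lines
        my $key = join '_', $chrX, $chrpos;
        $totals->{$key}{value1} += $value1;
        $totals->{$key}{value2} += $value2;
    }
    close $fh;
    return;
}

my %totals;
# In-memory stand-ins for parts of one.txt and two.txt.
my $first  = "chromosome1 50000 12 20\nchromosome4 21414 4 1\n";
my $second = "chromosome1 50000 41 51\n";
process_file( \%totals, $_ ) for \$first, \$second;

print "$_ $totals{$_}{value1} $totals{$_}{value2}\n" for sort keys %totals;
```

With real files the call would be `process_file( \%totals, 'one.txt' )` and so on; a loop like this reads one line at a time, so files of roughly 10000 lines should not need any special handling.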

    use strict; use warnings;

    I haven't checked to see whether the strict or warnings modules would help in this case or not, but it would be to your advantage to put them into your program before anything else. They will catch many programming errors for you. You may even find "use diagnostics" helpful. (I generally only put in "use diagnostics" when I don't understand what the error message is trying to tell me.)
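
As a rough illustration of the kind of error strict catches, the sketch below compiles a snippet with a deliberately misspelled variable and captures the message instead of dying. The variable names are made up. Note that a literal key such as $hash{key2} is valid Perl and passes under strict, so that particular mistake still has to be found by reading the code.

```perl
use strict;
use warnings;

# Compile a small snippet under strict and capture the error instead of dying.
# The misspelled variable $vaule1 is deliberate.
my $snippet = q{
    use strict;
    my $value1 = 12;
    my $total  = $vaule1 + 1;
};
my $ok    = eval "$snippet; 1";
my $error = $@;

print $ok ? "compiled cleanly\n" : "strict refused it: $error";
```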

    I hope you find some of this useful.

    ...roboticus

    Updates: (marked by superscripts in the above text)

    1. Changed 'you' to 'your'
    2. In the next code snippet, I corrected the second line, changing 'process-file' to 'process_file'
Re: How to add values of hash by reading from different text files
by bichonfrise74 (Vicar) on Apr 27, 2009 at 19:46 UTC
    Another possible solution... In this case, I just combined all your data into a single input for easier manipulation.
        #!/usr/bin/perl
        use strict;
        use Data::Dumper;

        my %chromosome;
        while( <DATA> ) {
            my ($name, $pos, $val1, $val2) = split;
            if ( defined( $chromosome{$name}{$pos} )) {
                $chromosome{$name}{$pos}[0] = $chromosome{$name}{$pos}[0] + $val1;
                $chromosome{$name}{$pos}[1] = $chromosome{$name}{$pos}[1] + $val2;
            }
            else {
                $chromosome{$name}{$pos} = [$val1, $val2];
            }
        }
        print Dumper(\%chromosome);

        __DATA__
        chromosome1 50000 12 20
        chromosome2 20000 0 21
        chromosome3 41444 9 2
        chromosome4 21414 4 1
        chromosome1 50000 41 51
        chromosome2 20000 1 20
        chromosome3 41444 2 11
        chromosome6 12141 12 22
        chromosome1 50000 11 2
        chromosome2 20000 3 22
        chromosome3 41444 2 15

Node Type: perlquestion [id://760271]
Approved by ikegami