http://www.perlmonks.org?node_id=732430

ashnator has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Hi, I have written a program that uses a hash-of-hashes type of data structure. I am using the keys in File 1 to parse File 2.
I am unable to get the correct output. A single key in my File 1 can have multiple values.
I am getting these errors / warnings from my program:
    Use of uninitialized value in hash element at TEST.pl line 17, <$fh> line 4.
    Use of uninitialized value in hash element at TEST.pl line 18, <$fh> line 4.
    Use of uninitialized value in hash element at TEST.pl line 17, <$fh> line 5.
    Use of uninitialized value in hash element at TEST.pl line 20, <$fh> line 5.
    KEY in File1 not found in File2
    KEY in File1 not found in File2
    KEY 155369268: File2 string length too short for File1 position value
    KEY 155369268: File2 string length too short for File1 position value
    KEY 269212605: File2 string length too short for File1 position value
Here is my code
    #!/usr/bin/perl
    use strict;
    use warnings;
    my $qfn1 = "File1.txt";
    my $qfn2 = "File2.txt";
    my %positions;
    {
        open(my $fh, '<', $qfn1) or die("Cannot open file \"$qfn1\": $!\n");
        while (<$fh>) {
            chomp;
            my ($key, $pos) = split /\s+/;
            if (!$positions{$key}) {
                $positions{$key} = [$pos];
            }
            else {
                my $ref = $positions{$key};
                push @$ref,$pos;
            }
        }
    }
    my %sequences;
    {
        open(my $fh, '<', $qfn2) or die("Cannot open file \"$qfn2\": $!\n");
        my $key;
        while (<$fh>) {
            if ( s/^>// ) {
                $key = ( split /\|/ )[1];
            }
            else {
                chomp;
                $sequences{$key} .= $_;
            }
        }
    }
    for my $key ( sort keys %positions ) {
        my $ref = $positions{$key};
        foreach my $value (@$ref) {
            if ( ! exists( $sequences{$key} )) {
                warn "KEY $key in File1 not found in File2\n";
                next;
            }
            if ( length( $sequences{$key} ) < $positions{$key} ) {
                warn "KEY $key: File2 string length too short for File1 position value\n";
                next;
            }
            my $index = rindex( $sequences{$key}, "ATG", $positions{$key} );
            if ( $index < 0 ) {
                warn sprintf( "KEY %s: No ATG in File2 string prior to position %d\n", $key, $positions{$key} );
                next;
            }
            $index += 3 while ( ($index + 3) < $positions{$key} );
            print "$key $positions{$key} " . substr($sequences{$key}, $index, 3) . "\n";
        }
    }
My Input file looks like this:-
    File 1:-
    155369268 17 A
    155369268 70 G
    269212605 75 T

    File 2:-
    >gi|155369268|ref|NM_001100917.1| some text
    AAACAATGTCGATTCTATGATGCGAACGCAGCATTTCAGGGACTGGAATGGGAGCTTACGGTTTTTTACG
    ACAGAATCATCAATATCTTGGAAGAAAAAGAATGTTAAGAAATAACAAAACAATAATTATTAAGTACTTT
    >gi|269212605|ref|XM_401716884.1| some text
    AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCTATGAC
    CTAATTGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT
I want my Output should be like this:-
    155369268 17 CTA
    155369268 70 CGA
    269212605 120 TTG
good bye

20081224 Janitored by Corion: Restored content

Replies are listed 'Best First'.
Re: Hashs of hash (multiple value) key problem.
by graff (Chancellor) on Dec 24, 2008 at 11:32 UTC
    Dude, you are really coming across as being profoundly clueless. This thread is the seventh one you've started on this same basic programming task. (For those keeping score, the previous six, in chronological order, are: Regex problem, Extracting Locations in 3 window size, Debug help, Character replacement, Missing to catch 70th character in 3 window size, and Multiple Key Problem help.)

    A common attribute of all these threads is that you have never given a clear description of what you are really trying to accomplish. What exactly are the specs for this job? When you say:

    I want my Output should be like this:-
        155369268 17 CTA
        155369268 70 CGA
        269212605 120 TTG

    HOW DO YOU KNOW that this is what the output should be? What are the principles for determining what the program is supposed to locate in these files? If you can form a clear, unambiguous answer to that question (in the form of a "cookbook recipe"), it will be a lot easier to get the program to do what is needed.

    I'm actually somewhat skeptical about the particular values you are citing here as the "desired" output. Where is the "120" supposed to come from, given that File1.txt does not contain this number?

    As for the particular version of noise you've added in this OP to code that others have suggested to you, I don't understand why you have added a "$ref" variable in this while loop:

        while (<$fh>) {
            chomp;
            my ($key, $pos) = split /\s+/;
            if (!$positions{$key}) {
                $positions{$key} = [$pos];
            }
            else {
                my $ref = $positions{$key};
                push @$ref,$pos;
            }
        }
    (I have undone your randomization of the white-space and indentation.) So, you apparently do not realize that the "else" block there does nothing at all except to push an element onto an array ref, which then goes out of scope (disappears completely, ceases to exist) when you step out of that "else" block. I also don't understand why you are using array refs at all in this while loop.

    I think it's time for you to confess that you really don't understand what you are trying to do, let alone what you are actually doing. If it's important for this job to get done, someone else is going to have to do it. Go to your supervisor or advisor or other knowledgeable person who can speak to you face-to-face, and admit to them that you are lost.

    It looks like the kind of help you need is not the kind of help you can get from perlmonks.

      The $ref was added at my suggestion as the keys in file 1 are not unique, so an array is needed to store the varying $pos values that $key can have. The suggested code produced exactly the result requested in the "Multiple Key Problem Help" thread.

      I agree with the rest of your response. Way too many threads for some very basic problems. A bit of googling would answer the problem raised this time.
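
      On the `$ref` question discussed above, a minimal sketch may help. It assumes the File1 lines are held in a hypothetical in-memory list (`@lines`) instead of being read from disk. Copying `$positions{$key}` into `$ref` copies the reference, not the array, so `push @$ref, $pos` appends to the array that the hash still holds after `$ref` goes out of scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of File1.txt.
my @lines = ( "155369268 17 A", "155369268 70 G", "269212605 75 T" );

my %positions;
for my $line (@lines) {
    my ( $key, $pos ) = split /\s+/, $line;
    if ( !$positions{$key} ) {
        $positions{$key} = [$pos];     # first position seen for this key
    }
    else {
        my $ref = $positions{$key};    # copies the reference, not the array
        push @$ref, $pos;              # appends to the array stored in the hash
    }    # $ref goes out of scope here; the array itself lives on in %positions
}

for my $key ( sort keys %positions ) {
    print "$key: @{ $positions{$key} }\n";
}
# prints:
# 155369268: 17 70
# 269212605: 75
```

      The single statement `push @{ $positions{$key} }, $pos;` used elsewhere in this thread builds the same structure, because Perl creates the array reference on first use.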
        Hi, I modified the program but it is still giving some errors :(
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;
        my $qfn1 = "File1.txt";
        my $qfn2 = "File2.txt";
        my %positions;
        {
            open(my $fh, '<', $qfn1) or die("Cannot open file \"$qfn1\": $!\n");
            while ( <$fh> ) {
                chomp;
                my ( $key, $pos ) = split /\s+/;
                push @{ $positions{$key} }, $pos;
            }
        }
        print Data::Dumper->Dumpxs( [ \ %positions ], [ qw{ *positions } ] );
        my %sequences;
        {
            open(my $fh, '<', $qfn2) or die("Cannot open file \"$qfn2\": $!\n");
            my $key;
            while (<$fh>) {
                if ( s/^>// ) {
                    $key = ( split /\|/ )[1];
                }
                else {
                    chomp;
                    $sequences{$key} .= $_;
                }
            }
        }
        for my $key (sort keys %positions) {
            for my $key (keys %{$positions{$key}}) {
                foreach my $value (@) {
                    if (! exists( $sequences{$key} )) {
                        warn "KEY $key in File1 not found in File2\n";
                        next;
                    }
                    if ( length( $sequences{$key} ) < $positions{$key} ) {
                        warn "KEY $key: File2 string length too short for File1 position value\n";
                        next;
                    }
                    my $index = rindex( $sequences{$key}, "ATG", $positions{$key} );
                    if ( $index < 0 ) {
                        warn sprintf( "KEY %s: No ATG in File2 string prior to position %d\n", $key, $positions{$key} );
                        next;
                    }
                    $index += 3 while ( ($index + 3) < $positions{$key} );
                    print "$key $positions{$key} " . substr($sequences{$key}, $index, 3) . "\n";
                }
            }
        }
Re: Hashs of hash (multiple value) key problem.
by planetscape (Chancellor) on Dec 24, 2008 at 09:47 UTC
Re: Hashs of hash (multiple value) key problem.
by Seqi (Acolyte) on Dec 24, 2008 at 09:34 UTC
Re: Hashs of hash (multiple value) key problem.
by tilly (Archbishop) on Dec 24, 2008 at 18:57 UTC
    Go read references quick reference and learn how to do complex data structures. Then write toy examples and print them out with Data::Dumper until you are sure you have absorbed that. Then take your problem and break it into small pieces. Code each piece. Test that it does what you expect and has the values you expect. Put it together and test it again.

    That is a likely path to success.

    Continuing to try to solve the whole problem at once while you are unsure of basic concepts is not going to work. Trust me. I once had the fun of working with a novice in training. After she had been flailing for a month while getting nowhere I sat down with her and made her break the problem up. Every day I asked her what her plan was for the day and made sure she was working in small pieces. The project was done a week later. I helped her start the same thing with the next. After that she did well.

    Divide and conquer. It isn't just useful for Julius Caesar.
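
    As one possible starting point for the toy examples suggested above, here is a small sketch that builds a hash of arrays and prints it with Data::Dumper. The keys `seqA` and `seqB` and their values are made up for illustration; only `Data::Dumper`, `Dumper`, and `$Data::Dumper::Sortkeys` come from the module itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order makes dumps easier to compare

# Toy hash of arrays: each (made-up) key maps to a list of positions.
my %positions = (
    seqA => [ 17, 70 ],
    seqB => [75],
);

# Dump the whole structure to see exactly what was built.
print Dumper( \%positions );

# Then check one small piece at a time.
my $count = scalar @{ $positions{seqA} };
print "seqA has $count positions\n";
print "first seqA position: $positions{seqA}[0]\n";
```

    Once the dump matches what you expect, the same check can be repeated on the real data after each step of the program.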

      Yes, thanks, I will do exactly what you said. I am a bit sad, though, because I need to deliver a solution to the above problem :(
Re: Hashs of hash (multiple value) key problem.
by linuxer (Curate) on Dec 24, 2008 at 09:55 UTC

    Just some of my thoughts on this. They may not solve your question, but these came to mind:

    • I would reformat the script, especially the indentation
    • I think the line numbers in the warnings don't match the posted code and its line numbers
    • Make the code for reading file1 shorter:
          while ( <$fh> ) {
              chomp;
              my ( $key, $pos ) = split /\s+/;
              # fill HoA automagically
              push @{ $positions{$key} }, $pos;
          }
    • Make clear that $value contains the position; rename $value to $position
    • In line 99 of your code, you compare the length of $sequences{$key} with the array reference in $positions{$key}; shouldn't you compare with $value (or rather, the suggested $position)?
      >>>>make the code for reading file1 shorter
      But only do that if you only want to keep the last value you find matching a duplicate key !
        Ooops sorry, missed the @ at the start of the statement. That'll work as suggested.
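
    The points in linuxer's list about `$position` can be sketched as follows. This is only an illustration under stated assumptions: `%positions` and `%sequences` are filled with made-up in-memory data (keys `k1`, `k2`) rather than parsed from the files, messages are collected in a hypothetical `@report` array, and the ATG/codon lookup is left out because the thread never settles what it should compute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-ins for the data parsed from File1 and File2.
my %positions = ( k1 => [ 3, 50 ], k2 => [2] );
my %sequences = ( k1 => "ACGTACGT" );

my @report;
for my $key ( sort keys %positions ) {
    if ( !exists $sequences{$key} ) {
        push @report, "KEY $key in File1 not found in File2";
        next;
    }
    # Walk every position stored for this key.
    for my $position ( @{ $positions{$key} } ) {
        # Compare against the scalar position, not the array reference.
        if ( length( $sequences{$key} ) < $position ) {
            push @report, "KEY $key: sequence too short for position $position";
            next;
        }
        push @report, "$key $position ok";
    }
}
print "$_\n" for @report;
# prints:
# k1 3 ok
# KEY k1: sequence too short for position 50
# KEY k2 in File1 not found in File2
```

    Using `$positions{$key}` in the numeric comparison, as the posted code does, compares the length against the reference's address rather than the position, which would explain the repeated "too short" warnings.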
Re: Hashs of hash (multiple value) key problem.
by apl (Monsignor) on Dec 24, 2008 at 13:34 UTC
    Given what others have said ... If you work at a University, contact the Computer Science department; doubtless they'll have a student who will be happy to work on what you need in return for credit, or Work/Study pay, or whatever.
Re: Hashs of hash (multiple value) key problem.
by dragonchild (Archbishop) on Dec 24, 2008 at 18:43 UTC
    Before doing any more coding, can you write out in your native language exactly what it is you expect this program to do?

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?