Dumb problem with dumper.

oxydeepu has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I am here with a new query and hope to get a quick reply.
I am trying to count the occurrences of 10 hexamer words in a big file of FASTA sequences.
I want the number of occurrences of each hexamer at each position up to 5000, which is the length of the longest sequence.
The code I wrote is given below:

#!/usr/bin/perl

use Data::Dumper;

open $f,$ARGV[0];

while(<$f>)
{
        chomp;
        $id = $_; #eat up the header line
        chomp($s = <$f>);
        if($s !~ /^>/)
        {
            push @seq,$s;    
        }
}

open $fa, "top_10_hexamers.txt";

while(<$fa>)
{
        chomp;
        $hx{$_}++; #hash of 10 hexamer
}

@words = sort keys %hx;

for($i = 1;$i < 5000;$i++)
{
        for($j = 0;$j <= $#words;$j++)
        {
                $result[$i][$j] = 0; # array of 5000 columns of position and words as columns
        }
}

foreach $x(@seq)
{
                for($j = 0; $j < (length($x) - 6);$j++)
                {
                        $wrd = substr $x,$j,6;  #getting the hexamer combinations
                        foreach $w(@words)
                        {
                                if($w eq $wrd)  # comparing with the word
                                {
                                        $result->{$j}->{$wrd}++; # trying to get position, word and frequency
                                }
                        }
                }
                $wrd = "";
}
print  Dumper $result;

It worked for me when my sequences were 50 bases long: I used to get a hash named $VAR1. But now I am working with sequences of up to 5000 bases, and it prints $VAR1 = undef;
Can anyone help me with this problem?
Thank you in advance,

Best regards,
Deepak


Replies are listed 'Best First'.
Re: Dumb problem with dumper. by toolic (Bishop) on Nov 26, 2012 at 19:14 UTC
Tip #2 from the Basic debugging checklist: start printing array and hash values as soon as you populate them:

print Dumper(\@seq);
open $fa, "top_10_hexamers.txt";
while(<$fa>)
{
        chomp;
        $hx{$_}++; #hash of 10 hexamer
}
print Dumper(\%hx);
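Note that the tip passes references (`\@seq`, `\%hx`) to Dumper rather than the bare array or hash. A minimal sketch of the difference, using made-up values in place of the real sequences:

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 0;    # compact output for this demo

my @seq = ('ACGT', 'TTGA');   # made-up values, not from the original data

my $as_list = Dumper(@seq);   # each element is dumped as its own $VARn
my $as_ref  = Dumper(\@seq);  # one $VAR1 holding the whole array

print "$as_list\n";
print "$as_ref\n";
```

Dumping a reference keeps the whole structure together under a single $VAR1, which is usually easier to read when checking what a loop has populated.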
Re^2: Dumb problem with dumper. by oxydeepu (Novice) on Nov 26, 2012 at 19:37 UTC
Thank you guys. Sorry for spamming. I figured it out. It works fine. Thank you, Deepak
Re: Dumb problem with dumper. by aitap (Curate) on Nov 26, 2012 at 19:40 UTC
"I am trying to count the occurrences of 10 hexamer words in a big file of FASTA sequences."

Did you try using BioPerl (http://bioperl.org)?

for($i = 1;$i < 5000;$i++)
{
        for($j = 0;$j <= $#words;$j++)
        {
                $result[$i][$j] = 0; # array of 5000 columns of position and words as columns
        }
}

If you want to initialise your array for better handling of large data, see Re^3: how apply large memory with perl?. By the way, this initialisation is lost, because you initialise an array (@result) and later work with a hash reference ($result). strict would never allow it. Sorry if my advice was wrong.
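The point about the lost initialisation can be reproduced in a few lines. This is only a sketch with a made-up position and hexamer; it shows that filling `@result` leaves the scalar `$result` undefined, and that `$result` only becomes a hash reference once a match autovivifies it:

```perl
#!/usr/bin/perl
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;   # stable key order in the output

# Deliberately no "use strict", to mirror the script in the question.
our (@result, $result);        # two unrelated package variables

$result[1][0] = 0;                   # fills the ARRAY @result
my $array_only = Dumper($result);    # the SCALAR $result is still undef

$result->{3}{ACGTAC}++;              # autovivifies a hash ref in $result
my $after_match = Dumper($result);

print $array_only;     # $VAR1 = undef;
print $after_match;
```

So if no hexamer ever matches, nothing is stored in `$result`, and `Dumper $result` reports undef regardless of how `@result` was initialised.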
Re: Dumb problem with dumper. by tobyink (Canon) on Nov 26, 2012 at 19:20 UTC
You appear to be using the `$result` variable for two separate purposes; here as an array of arrays: `$result[$i][$j]` and here as a hash of hashes: `$result->{$j}->{$wrd}++`. Make up your mind! Short of some freaky overload magic, that's not likely to work. Meh. The first is, of course, `@result`. Either way, the rest of my advice still stands...

I'd strongly recommend you `use strict;` near the top of your script. (In particular, it's the `vars` feature of strict that you want, but you might as well enable the whole strict pragma.) This will force you to declare the variables you use, and thus encourage you to get them straight in your own head.

`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
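As a rough illustration of that advice, here is one way the counting step might look with `use strict` and a single, declared hash of hashes. This is a sketch, not the poster's final fix: `count_hexamers` is a hypothetical helper, the sequences and hexamers are made-up toy data standing in for the FASTA file and top_10_hexamers.txt, and the loop bound uses `length - 6` inclusive so that the last 6-base window is also checked (the original `$j < (length($x) - 6)` stops one window short).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the thread): count how often each wanted
# hexamer starts at each 0-based position across all sequences.
sub count_hexamers {
    my ($seqs, $words) = @_;
    my %wanted = map { $_ => 1 } @$words;   # hash lookup replaces the inner loop
    my %count;                              # $count{position}{hexamer} = frequency
    for my $seq (@$seqs) {
        # inclusive upper bound, so the final 6-base window is included
        for my $pos (0 .. length($seq) - 6) {
            my $hex = substr $seq, $pos, 6;
            $count{$pos}{$hex}++ if $wanted{$hex};
        }
    }
    return \%count;
}

# Made-up toy data, an assumption for illustration only
my @seq   = ('ACGTACGTAC', 'TTACGTACAA');
my @words = ('ACGTAC', 'CGTACG');

my $result = count_hexamers(\@seq, \@words);

$Data::Dumper::Sortkeys = 1;
print Dumper($result);
```

With every variable declared via `my`, mixing up `@result` and `$result` becomes a compile-time error instead of a silent undef.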

