PerlMonks  

Hash will not print if a value begins with 2 under certain conditions

by pimperator (Acolyte)
on Jul 27, 2014 at 00:31 UTC ( [id://1095188]=perlquestion )

pimperator has asked for the wisdom of the Perl Monks concerning the following question:

This is really frustrating me. The script I'm writing is indexing coordinates in a hash and then using those index numbers to pull out values from an array.

The weird thing is that if the value begins with 2 or 22 it will not print. Any other number works. I'll show you two variations and output of the script.

First variation. This is what I want the script to do. Print chromosome, position, value.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use Scalar::Util qw(looks_like_number);

    open IN, "/home/big/scratch/affy_map.txt" or die "Cannot open reference\n";
    my %ref;
    my $head = <IN>;
    my $index = 0;
    while(<IN>){
        chomp $_;
        my @row = split /\t/, $_;
        my $value = join "\t", $row[1],$row[2];
        if($row[1] == 2 && $row[2] <= 50000 && $row[2] <= 51113178) { $ref{$index}=$value; print $index."\t".$value."\n";}
        if($row[1] == 22 && $row[2] <= 16300001 && $row[2] <= 20500000) { $ref{$index}=$value; print $index."\t".$value."\n"; }
        $index++;
    }
    close(IN);

    my @files;
    my $masterDirect = "/nfs/archive02/big/Norm/norm_gcc/";
    find(\&file_names, $masterDirect);
    sub file_names {
        if( -f && $File::Find::name=~/\.nzd$/) {
            push @files, $File::Find::name;
        }
    }

    my $count=0;
    foreach(@files){
        $count++;
        if($count % 100 == 0 ){ print "\n","-" x 10, " $count ", "-" x 10, "\n";}
        undef my @probes;
        open IN, $_;
        #file name handling
        my @inDir = split "\/", $_;
        my $id = pop(@inDir);
        $id =~ s/\.gcc.nzd$//;
        #header test
        $head =<IN>;
        if(looks_like_number($head)) { push @probes, $head; }
        #open output
        open OUT, ">/home/big/scratch/phase1_affy/".$id."_select_probeset.txt";
        #load probe array
        @probes = <IN>;
        close(IN);
        foreach my $key (sort keys %ref){
            #intended function
            print OUT $ref{$key}."\t".$probes[$key];
            #testing
            my @temp = split "\t", $ref{$key};
            foreach(@temp){if($temp[0] == 2){print $key."\t".$ref{$key}."\t".$probes[$key];}}
        }
        close(OUT);
    }

Here's the output for the test. The printing from the reference file is flawless. The first number is the $key or index number. The second is from $probes[$key]. Why is the $ref{$key} missing?

    146529 0.777314368326637
    146529 0.777314368326637
    146530 0.116241153901913
    146530 0.116241153901913
    146531 0.940593233609167
    146531 0.940593233609167
    146532 -0.150051720847835
    146532 -0.150051720847835
    146533 0.037500790454267
    146533 0.03750079045426

Variation 2.

    ...
    foreach my $key (sort keys %ref){
        print OUT $ref{$key}."\t".$probes[$key];
        my @temp = split "\t", $ref{$key};
        foreach(@temp){if($temp[0] == 2){print $key."\t".$ref{$key}."\n";}}
    }

And its output. See now it's printing correctly. $key and $ref{$key}

    146542 2 31852
    146542 2 31852
    146543 2 37693
    146543 2 37693
    146544 2 40415
    146544 2 40415
    146545 2 40814
    146545 2 40814
    146546 2 41256
    146546 2 41256
    146547 2 43652
    146547 2 43652

I'm not as experienced as a lot of you monks out there but I've never experienced this error. I thought it might be a DOS->UNIX file problem but I performed perl -pi -e 's/\R/\n/g' input_files.txt for all the input the script sees. It prints the same value twice because there are two elements in the @temp array. I'm really at a loss right now.

Replies are listed 'Best First'.
Re: Hash will not print if a value begins with 2 under certain conditions
by Anonymous Monk on Jul 27, 2014 at 04:00 UTC

    Too much code, too many file system operations.

    Where is the problem?

    Is the problem with the data structure, or with the subroutine doing the printing?

    When you Data::Dump::dd()umper up the data structure and read it with your eyes, is everything there?

    If you really want more help, you should whittle down your program to this

    Subroutines save the day. It's very easy to debug subroutines, and you don't need files on the hard disk and other stuff we don't have (and don't really want :)

    Here is how you might start writing refLogic and fillRef

    Eliminate file system operations for files we don't have. Instead, include three-line files that replicate the problem, and include them as strings.
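One way to follow that advice is to read the sample data from a string through an in-memory filehandle. The sketch below is only an illustration: the three data rows, the column layout, and the helper name `build_ref` are all made up for the example and are not taken from the original files or from the monk's own code.

```perl
use strict;
use warnings;

# Made-up stand-in for the reference file: a header line plus three rows
# (probe id, chromosome, position), tab separated.
my $map_text = join '',
    "probe\tchr\tpos\n",
    "P1\t2\t31852\n",
    "P2\t2\t37693\n",
    "P3\t7\t12345\n";

# Hypothetical helper: index data rows by line number and keep the
# chromosome-2 rows as "chr\tpos", roughly as the posted script does.
sub build_ref {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $header = <$fh>;                 # discard the header line
    my %ref;
    my $index = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my @row = split /\t/, $line;
        $ref{$index} = join "\t", @row[1, 2] if $row[1] == 2;
        $index++;
    }
    close $fh;
    return %ref;
}

my %ref = build_ref($map_text);
print "$_\t$ref{$_}\n" for sort { $a <=> $b } keys %ref;
```

Because the data lives in the script itself, anyone can run it unchanged, and a suspicious line from the real file can be pasted into `$map_text` to see whether the symptom follows it.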

    update: finally I read the whole thing you posted, just dumper up two variables and we can start debugging these two loops

        my %ref = ... ;    ## Data::Dump::dd(%ref) output here
        my @probes = ... ; ## Data::Dump::dd(@probes) output here

        ##
        foreach my $key ( sort keys %ref ) {
            #intended function
            print OUT $ref{$key} . "\t" . $probes[$key];
            #testing
            my @temp = split "\t", $ref{$key};
            foreach (@temp) {
                if ( $temp[0] == 2 ) {
                    print $key. "\t" . $ref{$key} . "\t" . $probes[$key];
                }
            }
        }

        ##
        foreach my $key ( sort keys %ref ) {
            print OUT $ref{$key} . "\t" . $probes[$key];
            my @temp = split "\t", $ref{$key};
            foreach (@temp) {
                if ( $temp[0] == 2 ) {
                    print $key. "\t" . $ref{$key} . "\n";
                }
            }
        }

    update: now that I've looked at these two loops, well, the inner foreach is completely unneeded ... and aside from the different things you print (one has probes, the other doesn't), they're pretty much the same thing
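A minimal sketch of the loop without the inner foreach, assuming the goal is one output line per chromosome-2 key. The sample `%ref` and `@probes` values are invented for illustration. The numeric sort is an extra assumption on top of the monk's remark: a plain `sort keys` compares keys as strings, which orders index numbers differently from their numeric order.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape the post describes:
# %ref maps a line index to "chromosome\tposition".
my %ref = (
    10 => "2\t40415",
    9  => "2\t37693",
    11 => "22\t16300001",
);
my @probes = map { "probe_$_\n" } 0 .. 11;    # made-up probe lines

my @lines;
for my $key (sort { $a <=> $b } keys %ref) {  # numeric, not string, order
    my ($chrom) = split /\t/, $ref{$key};
    next unless $chrom == 2;                   # test once per key
    push @lines, join("\t", $key, $ref{$key}, $probes[$key]);
}
print @lines;
```

Testing the chromosome once per key also removes the duplicated output lines that the original poster noticed.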

    Hope this helps. If it doesn't, well, take a break, post short code, and we'll figure it out

Re: Hash will not print if a value begins with 2 under certain conditions
by frozenwithjoy (Priest) on Jul 27, 2014 at 04:44 UTC

    It may not be related to your problem, but do you really mean for your if conditionals to be the following?

    if($row[1] == 2 && $row[2] <= 50000 && $row[2] <= 51113178) {...}
    if($row[1] == 22 && $row[2] <= 16300001 && $row[2] <= 20500000) {...}

    Did you mean for the second comparison in each to actually be $row[2] >= ####?
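If an inclusive range was intended, one possible shape is sketched below. The helper name `in_window` and the `%window` table are hypothetical, and reading the posted numbers as lower and upper bounds is only a guess at the intent. Note that the chromosome 2 positions shown in the original post's output (for example 31852) would fall outside such a range, so the poster may have meant something else.

```perl
use strict;
use warnings;

# Assumed windows (lower bound, upper bound) per chromosome; the bounds are
# the numbers from the posted conditionals, read as an inclusive range.
my %window = (
    2  => [ 50000,    51113178 ],
    22 => [ 16300001, 20500000 ],
);

# Hypothetical helper: returns 1 if $pos lies inside the window for $chr,
# and 0 otherwise (including chromosomes with no window defined).
sub in_window {
    my ($chr, $pos) = @_;
    my $range = $window{$chr} or return 0;
    my ($low, $high) = @$range;
    return ($pos >= $low && $pos <= $high) ? 1 : 0;
}

print in_window(2, 60000) ? "keep\n" : "skip\n";
```

Keeping the bounds in a table means both bounds of each window sit side by side, which makes a doubled `<=` easier to spot than it is in two long conditionals.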

Re: Hash will not print if a value begins with 2 under certain conditions
by Athanasius (Archbishop) on Jul 27, 2014 at 04:29 UTC

    Just a supplement to the advice given by Anonymous Monk:

    Data::Dumper provides a Useqq configuration variable which, when set, causes Dumper to print any newline, tab, or carriage return characters as \n, \t, and \r, respectively. This makes it easier to see exactly what is present in each variable. Here is how I would deploy this feature to begin the debugging process:

        ...
        use Data::Dumper;
        $Data::Dumper::Indent = 0;
        $Data::Dumper::Terse  = 1;
        $Data::Dumper::Useqq  = 1;
        ...
        foreach my $key (sort keys %ref)
        {
            ...
            foreach(@temp)
            {
                if ($temp[0] == 2)
                {
                    printf "\$key = %s\n \$ref{%s} = %s\n \$probes[%s] = %s\n",
                            Dumper($key), Dumper($key), Dumper($ref{$key}),
                            Dumper($key), Dumper($probes[$key]);
                    print $key."\t".$ref{$key}."\t".$probes[$key];
                }
            }
        }
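To illustrate what those settings can reveal, here is a small self-contained sketch. The two values are hypothetical: they assume, purely for the sake of the example, that a reference entry kept a DOS-style carriage return. Whether the real data contains one is not established in this thread.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 0;
$Data::Dumper::Terse  = 1;
$Data::Dumper::Useqq  = 1;

# Hypothetical values: a reference entry that kept a DOS "\r",
# and a probe line as read from a file (trailing "\n" intact).
my $ref_value   = "2\t31852\r";
my $probe_value = "0.777314368326637\n";

my $dumped_ref   = Dumper($ref_value);
my $dumped_probe = Dumper($probe_value);

print "ref:   $dumped_ref\n";    # shows "2\t31852\r" with the escapes visible
print "probe: $dumped_probe\n";

# If a stray "\r" does turn up, one option is to strip it after chomp.
(my $clean = $ref_value) =~ s/\r\z//;
```

A carriage return printed to a terminal moves the cursor back to the start of the line, so text printed after it can overwrite what came before. With `Useqq` set, the character appears as a literal `\r` in the dump instead of acting on the display.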

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I know that this is somewhat off-topic. Thanks for motivating me to read the documentation of Data::Dumper. Those configuration variables resolved a long standing problem.
      Bill

Node Type: perlquestion [id://1095188]
Approved by farang