http://www.perlmonks.org?node_id=1020745

rufessor has asked for the wisdom of the Perl Monks concerning the following question:

Hi All-

This error is confusing me (a Perl beginner, but with some experience) as well as someone I consider to be very far from a beginner, hence this post. I have a tab-delimited file that is the output of another bit of code. It is opened and read line by line in a presumably simple operation to populate a complex data structure (Hash{key}{value}->(Array)), where the key and value are fields pulled from the line and the array holds the entire line, stored for later use. The issue is that sometimes, when I split the line on tabs, it *misses* one of the fields for whatever reason, leaving it as uninitialized. The very, very strange thing is that I can see this in the debugger, but if I execute the split command from within the debugger... it's just fine. So any help would be great. A code snippet and a debugger view of the error are below. FYI, perl -v tells me: This is perl 5, version 12, subversion 3 (v5.12.3) built for darwin-thread-multi-2level (with 2 registered patches, see perl -V for more detail)

43  my $count = 0;
44  my $priorScaffold;
45  my $priorChr;
46  my @finalArray;
47  open (IN, '<', $fileName) or die "Cannot open $fileName\n";
48  LINE: while (my $line = <IN>){
49      chomp $line;
50      if ($line =~ m/^>\w+/){
51          next LINE;
52      }
53      my @dataLine = split /\t/, $line;
54      my $chr = $dataLine[5];
55      next LINE if ($chr !~ /^\d{1,2}?$/);
56      my $scaffold = $dataLine[0];
57      if ($count != 0){
58          if ($scaffold ne $priorScaffold){    #If we found a new scaffold
59              push @{$global{$priorChr}{$priorScaffold}}, @finalArray;    #write to hash @finalArray using prior accumulated data for last scaffold
60              $count = 0 ;
61              $allScaffolds{$priorScaffold}=0;    #Save priorScaffold in allScaffolds Hash for list use later- value is meaningless
62              @finalArray = ();
63          }
64          elsif ($chr != $priorChr){    #If we just switched Chromosomes but are still on the same scaffold
65              push @{$global{$priorChr}{$priorScaffold}}, @finalArray;    #write to global the @finalArray accumulated to the point it switches chromosomes
66              $count = 0;
67              @finalArray = ();
68          }
69      }
70      push @{$finalArray[$count]}, @dataLine;
71      $priorScaffold = $scaffold;
72      $priorChr = $chr;
73      $count ++;
74      print "$line\n";
75      print join ("\t", @dataLine)."\n";
76  }

The debugger transcript showing this very odd (to me) behavior is below. NOTE that when I execute the split on the line from the debugger command line it works perfectly, but somehow the actual executed code is failing.

  DB<1> b 54 $dataLine[2] = uninitialized
  DB<2> c
This program requires 4 command line arguments
Input File from as MizBee output from SyntenyFinder as first
Input File listing all scaffolds and total lengths from the query genome
Output FILE name to be used as basis for creation of MizBee input file and tmp file you can deleteA NUMERIC integer value specifying the cut off value for scaffold EXCLUSION from visualization
This value is used to eliminate scaffolds with less then VAL hits on any given chromosome
THIS ONLY ELIMINATES THE SCAFFOLD VISUALIZATION FOR THAT CHROMOSOME
main::syntenyLoad(MizBEE_parseOUT.pl:54):
54:         my $chr = $dataLine[5];
  DB<2> x $line
0  "GL429767\cI42604\cI226589\cI0\cI1\cI7\cI100487615\cI100493753\cI550\cI-1\cI0.8748\cIENSMLUG00000029214\cIENSG00000087085\cI"
  DB<3> x split /\t/, $line
0  'GL429767'
1  42604
2  226589
3  0
4  1
5  7
6  100487615
7  100493753
8  550
9  '-1'
10  0.8748
11  'ENSMLUG00000029214'
12  'ENSG00000087085'
  DB<4> x @dataLine
0  'GL429767'
1  42604
2  'uninitialized'
3  0
4  1
5  7
6  100487615
7  100493753
8  550
9  '-1'
10  0.8748
11  'ENSMLUG00000029214'
12  'ENSG00000087085'
  DB<5>

So I went ahead and wrote a test script that does only the split, using the same file, and prints out both the line and the split line. That script works just fine, in that the print statements show the line and the split line to be identical overall (well... I admit I didn't look at all 80,000 lines, but I cannot find any errors). So the input file is OK; it's just somehow this code that's causing issues... but it's so simple! HAHAH... it's always the simple stuff that kills....
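For reference, a split-only check of the kind described above could be sketched roughly as follows. This is not the actual test script: the variable names are made up for illustration, and an in-memory filehandle holding the one record from the debugger output stands in for the real 80,000-line file. Because that record ends in a tab and split drops trailing empty fields, the sketch strips trailing tabs before comparing. It also counts mismatches instead of relying on eyeballing the printed lines.

```perl
use strict;
use warnings;

# Sample record copied from the debugger output (tab-delimited, trailing tab).
my $sample = join("\t",
    qw(GL429767 42604 226589 0 1 7 100487615 100493753 550 -1 0.8748
       ENSMLUG00000029214 ENSG00000087085)) . "\t\n";

# Assumption: an in-memory filehandle stands in for the real input file.
open(my $in, '<', \$sample) or die "Cannot open in-memory sample: $!\n";

my $bad = 0;    # number of lines whose split fields do not rejoin to the line
while (my $line = <$in>) {
    chomp $line;
    my @fields = split /\t/, $line;

    # split discards trailing empty fields, so drop trailing tabs first.
    (my $expected = $line) =~ s/\t+\z//;
    my $rejoined = join("\t", @fields);

    if ($rejoined ne $expected) {
        $bad++;
        print "Mismatch on line $.:\n$line\n$rejoined\n";
    }
}
close $in;

print "Lines that did not round-trip: $bad\n";
```

To run this against the real data, the in-memory handle would be replaced by a normal open on the input file name.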

Update: Hold.... working on this for a minute... had an idea to test before bothering everyone.