Re^2: split a file into records and process it

Oh, I have some of it, I was gonna get into normalization and polishing my code later... that note appears when there is more than one hit, but I can clean it up since it would be obvious that there's actually more than one hit, hence it is needless ...

Here's my so-messed-up code which I would tend to after having figured with the wiser monks a way addressing my original query..

#!/usr/local/bin/perl
use strict;
use warnings;

my %RNACounts;
my %hash;        #Gene Info
my @snoRNA;
my @exonNumbers;
my @geneID;
my @productID;
my @geneNames;
my @references;
my(@queries, @subjects);

open (FH,'<',"F:/Bioinformatics_NCBI/20MARCH_10/PERL Analysis/test.txt
+") or die("$!\n");
open(FO, '>',"F:/Bioinformatics_NCBI/20MARCH_10/PERL Analysis/testOut.
+txt") or die ("$!\n");   #TESTING
while(<FH>){
        chomp;
        if(/(?=^\d+$)/../(?=http:.*)\n/){
               # s/\W+\n+!\W+//;
               next unless /(\w+ |\| | \n+)/x;  #except for words | pi
+pes | \n
                print FO $_, "\n" ;
        }
        if(/snoRNA(\s+|\d+)[\s|-|\d]/){     #snoRNA
        my $name = $_;
        push @snoRNA, $name;
                }
         if(/^\d+$/){         #exon Numbers
                my $number = $_;
                push @exonNumbers, $number;
                }
                if(/^GI:\d+[\.\d+]/){     #gene Names
                my $name = $_;
                push @geneID , $name;
                }
        if(/^NM_\d+[\.\d+]/){                #gene product ID
                my $name = $_;
                $name =~ s/\s+$//; #substitute the trailing blanks..
                push @productID, $name;
                }
        if(/homo sapiens[\w+\W+]/i){      #gene name, Need MultiLine s
+upport..
                my $name = $_;
                push @geneNames, $name;
                }
        if(/http:.*/){                  #web refs, need multiline supp
+ort..
                my $name = $_;
                push @references, $name;
                }
       # if(/^(?=snoRNA).*(\n^|Query|Sbjct)(?=homo sapiens)/i){}

        if(/^Query(\s+)\d+\s+[agtc]/i){       #Prepare the query and s
+ubject arrays
                my $queryName = $_;                #compare, measure, 
+span, note gaps
                $queryName =~ s/$1//;
                push @queries, $queryName;
                }
        if(/^sbjct(\s+)\d+\s+[agtc]/i){
                        my $sbjctName =  $_;
                        $sbjctName =~ s/$1//;
                        push @subjects, $sbjctName;
                        }
        ##my @array = split /^\d+$/;
       # #print "@array\n";

        }

                                ####GENERATING THE HASHES#####
                                #CREATE A HASH WITH THE snoRNAs AS THE
+ keys
                                

foreach my $element (@snoRNA){
        #print "$element\n";
        my $i =0;
        $hash{$element}="VAL";    #TEST
        }





use Data::Dumper;
print Dumper(\%hash);
#print Dumper(\@exonNumbers),$/;
#print Dumper(\@geneID),$/;
#print Dumper(\@productID),$/;
#print Dumper(\@snoRNA),$/;
#print Dumper(\@geneNames),$/;
#print Dumper(\@references),$/;
#print Dumper(\@queries),$/;
#print Dumper(\@subjects),$/;
[download]

Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.

Comment on Re^2: split a file into records and process it Select or Download Code


Syntactic Confectionery Delight
	PerlMonks