PerlMonks  

Re^12: Help with pushing into a hash

by jemswira (Novice)
on Sep 02, 2012 at 07:09 UTC (#991239)


in reply to Re^11: Help with pushing into a hash
in thread Help with pushing into a hash

Hey there Kenosis. Sorry to bother you again. So the program is no longer giving me error messages, but the pushing to the arrays here:

push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
push @antioxline, "$1 | $2 | $antiox{$1} \n" if $antiox{$1};
push @toxinline, "$1 | $2 | $toxin{$1} \n" if $toxin{$1};

isn't working properly. For example, I selected a specific number from $activin that was also in $uniprot, but it wasn't in the final output. So @activline, @antioxline and @toxinline are much smaller than they should be, and a lot of the data that is supposed to be in the output is missing. I tried testing with this:

if ($1 eq "E7F888"){print $2;}

and it gave me the correct $2. I tried this too:

next unless /(E7F888)\s+.+=([^\s]+)/;
push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
print @activline;

and it printed the correct thing. But if I remove the tests I put in and run the normal code, E7F888 is not in @activline. Is there anything I could do to figure out what's wrong with my code? Here it is again.

#!/usr/bin/perl
use Modern::Perl;
use File::Slurp qw/read_file write_file/;

my $uniprot = 'uniprot-ACN';
my $activin = 'Activator-PFAM.txt';
my $antioxin = 'AntiOxidant-PFAM.txt';
my $toxinin= 'Toxin-PFAM.txt';
my $activout = 'ActivACNPF.txt';
my $antioxout= 'AntioxACNPF.txt';
my $toxinout= 'ToxinACNPF.txt';

my @activline;
my @antioxline;
my @toxinline;

my %activ = map {s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1 => $2 } grep / +\|\s+\S+/, read_file $activin;
my %antiox = map { s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1=>$2; } grep/\ +|\s+\S+/,read_file $antioxin;
my %toxin = map { s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1=>$2; } grep/\ +|\s+\S+/,read_file $toxinin;

#if (exists $activ{"E7F888"}) {
#print $activ{'E7F888'};}
#else {print "LOL";}

for ( read_file $uniprot ) {
    next unless /(.{6})\s+.+=([^\s]+)/;
    # if ($1 eq "E7F888"){print $2;}
    push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
    push @antioxline, "$1 | $2 | $antiox{$1} \n" if $antiox{$1};
    push @toxinline, "$1 | $2 | $toxin{$1} \n" if $toxin{$1};
}

print @activline;
write_file $activout, @activline;
write_file $antioxout, @antioxline;
write_file $toxinout, @toxinline;

Thanks so much.


Re^13: Help with pushing into a hash
by Kenosis (Priest) on Sep 02, 2012 at 22:17 UTC

    Hi, jemswira!

    This is no bother at all!

    You mentioned that the following captured a key that wasn't in the final set:

    next unless /(E7F888)\s+.+=([^\s]+)/;
    push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
    print @activline;

    Try the following:

    next unless /(E7F888)\s+.+=([^\s]+)/;
    say;
    exit;

    say will print the scalar $_, which the regex operates on. By doing this, you can examine the line to verify that it matches the following regex:

    /(.{6})\s+.+=([^\s]+)/

    Also, it may be that the 'key' is not being captured from $activin, since:

    push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};

    will work only if the 'key' is found in both files and the regexes that process both files work as they should.
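
    One way to test that condition is to list which captured keys are, and are not, present in the hash. The following is only a minimal sketch, not the script from this thread: build_lookup and check_keys are hypothetical helper names, the input is a few sample lines copied from this thread instead of the real files, and the sample lookup deliberately omits Q157S1 so that one key is reported as missing.

    ```perl
    use strict;
    use warnings;

    # Build the lookup hash from "KEY | VALUE" lines; lines that do not parse are skipped.
    sub build_lookup {
        my @lines = @_;
        my %lookup;
        for my $line (@lines) {
            (my $clean = $line) =~ s/\.\d+//g;          # drop version suffixes such as ".13"
            next unless $clean =~ /^(\S+)\s+\|\s+(.*\S)/;
            $lookup{$1} = $2;
        }
        return %lookup;
    }

    # Sort the keys captured from the second file into found and missing.
    sub check_keys {
        my ($lookup, @lines) = @_;
        my (@found, @missing);
        for my $line (@lines) {
            next unless $line =~ /(.{6})\s+.+=([^\s]+)/; # same regex as in the thread
            if (exists $lookup->{$1}) { push @found, $1 }
            else                      { push @missing, $1 }
        }
        return (\@found, \@missing);
    }

    # Sample data only; the real script would pass the output of read_file instead.
    my %activ = build_lookup(
        "Q801F8 | PF00751.13 PF12374.3 PF12374.3\n",
        "B7ZS42 | PF00751.13 PF12374.3 PF12374.3\n",
    );
    my ($found, $missing) = check_keys(
        \%activ,
        "Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1\n",
        "Q157S1 Name=Crtc1 Synonyms=Mect1,\n",
    );
    print "found:   @$found\n";
    print "missing: @$missing\n";
    ```

    If a key you expect shows up under "missing", the problem is on the hash-building side; if it shows up in neither list, the regex on the second file never captured it.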

    Let me know what you find...

      Well, I ran the code and it returned this:

      E7F888 Name=arid5b ;PF01388

      which is correct. I selected that number because it was in both files (I manually Ctrl-F'ed the Notepad file).

      A lot of lines are missing from the final output. The $activ file is 512 KB and the output is 3 KB. @activline was also missing a lot when I did print @activline;. I randomly selected E7F888 from $activin and checked to make sure it was in $uniprot, so it may be another offending line of data. If I did this:

      print $activ{E7F888},"\n";
      for ( read_file $uniprot ) {
          next unless /(E7F888)\s+.+=([^\s]+)/;
          say;
          push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
      }
      print @activline;

      I got this:

      PF01388
      E7F888 Name=arid5b ;PF01388
      E7F888 | arid5b | PF01388

      which showed that it was in %activ and that it was found in the file. But when I run the normal code, it is not in @activline anymore.

      UPDATE:

      I ran the following code to check something, and not all the data is printed, i.e. a lot of lines are not pushed into the array.

      #!/usr/bin/perl
      use Modern::Perl;
      use File::Slurp qw/read_file write_file/;

      my $uniprot = 'uniprot-sfinal.txt';
      my $activin = 'Activator-PFAM.txt';

      my %activ = map { s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1=>$2; } grep/\| +\s+\S+/,read_file $activin;

      for ( read_file $uniprot ) {
          next unless /(.{6})\s+.+=([^\s]+)/;
          print $1,"\n" if $activ{$1};
      }
      print "done";

      I am sure that there are more lines that are supposed to be printed, e.g.:

      $uniprot:
      Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1
      B7ZS42 A4PBN7; B7ZS44;Name=dmrt1-b ;PF00751;PF12374
      Q157S1 Name=Crtc1 Synonyms=Mect1,

      $activ:
      Q801F8 | PF00751.13 PF12374.3 PF12374.3
      B7ZS42 | PF00751.13 PF12374.3 PF12374.3
      Q157S1 | PF12886.2 PF12885.2 PF12884.2

      I shall try to reverse the order, i.e. make a hash out of $uniprot instead of $activ, and run it again.

        Perhaps your data is more complex than first thought. You listed the following:

        $uniprot:
        Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1
        B7ZS42 A4PBN7; B7ZS44;Name=dmrt1-b ;PF00751;PF12374
        Q157S1 Name=Crtc1 Synonyms=Mect1,

        $activ:
        Q801F8 | PF00751.13 PF12374.3 PF12374.3
        B7ZS42 | PF00751.13 PF12374.3 PF12374.3
        Q157S1 | PF12886.2 PF12885.2 PF12884.2

        Given the above listing, are you (potentially) expecting the following keys/values?

        Q801F8 -> PF00751 PF12374 PF12374
        B7ZS42 -> PF00751 PF12374 PF12374
        Q157S1 -> PF12886 PF12885 PF12884
        ...
        Q801F8 -> dmrt1 PF00751 PF12374 PF12374
        Q90XZ5 -> dmrt1 PF00751 PF12374 PF12374
        Q90XZ8 -> dmrt1 PF00751 PF12374 PF12374
        B7ZS42 -> dmrt1-b PF00751 PF12374 PF12374
        A4PBN7 -> dmrt1-b PF00751 PF12374 PF12374
        B7ZS44 -> dmrt1-b PF00751 PF12374 PF12374
        Q157S1 -> Crtc1 PF12886 PF12885 PF12884

        It looks like there are multiple keys on a single $uniprot line:

        Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1
        ^^^^^^ ^^^^^^  ^^^^^^
          |       |       |
          +-------+-------+--- keys to be captured?

        If this is the case, the current regex operating on the $uniprot lines would fail.

        Let me know if the script needs to capture potentially multiple keys on a single $uniprot line...
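
        If multiple keys per line do need to be captured, something along these lines might work. This is only a rough sketch under assumptions the thread has not confirmed: that every key is a six-character run of uppercase letters and digits, and that all keys appear before the "Name=" field. parse_uniprot_line is a hypothetical helper name, and the sample hash holds a single entry taken from the listing above.

        ```perl
        use strict;
        use warnings;

        # Return the gene name followed by every key found before "Name=".
        # Returns an empty list when the line has no "Name=" field.
        sub parse_uniprot_line {
            my ($line) = @_;
            return unless $line =~ /^(.*?)Name=(\S+)/;
            my ($prefix, $name) = ($1, $2);
            # Assumption: a key is exactly six uppercase letters or digits.
            my @keys = $prefix =~ /\b([A-Z0-9]{6})\b/g;
            return ($name, @keys);
        }

        # Sample data only, copied from the listing in this thread.
        my %activ = ( Q801F8 => 'PF00751 PF12374 PF12374' );
        my @activline;

        for my $line ("Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1\n") {
            my ($name, @keys) = parse_uniprot_line($line);
            next unless defined $name;
            for my $key (@keys) {
                # Each key is looked up on its own; keys absent from %activ are skipped.
                push @activline, "$key | $name | $activ{$key}\n" if exists $activ{$key};
            }
        }
        print @activline;
        ```

        Whether the secondary keys on a line (Q90XZ5 and Q90XZ8 here) should inherit the first key's value is left open; this sketch only outputs keys that are themselves in the hash.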
