PerlMonks  

Re^13: Help with pushing into a hash

by Kenosis (Priest)
on Sep 02, 2012 at 22:17 UTC (#991332)


in reply to Re^12: Help with pushing into a hash
in thread Help with pushing into a hash

Hi, jemswira!

This is no bother at all!

You mentioned that the following captured a key that wasn't in the final set:

next unless /(E7F888)\s+.+=([^\s]+)/;
push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
print @activline;

Try the following:

next unless /(E7F888)\s+.+=([^\s]+)/;
say;
exit;

say will print the scalar $_, which the regex operates on. By doing this, you can examine the line to verify that it matches the following regex:

/(.{6})\s+.+=([^\s]+)/

Also, it may be that the 'key' is not being captured from $activin, since:

push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};

will work only if the 'key' is found in both files and the regexes that process both work as they should.
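
One way to see which of those two conditions is failing is to report, for every line, whether the regex matched and whether the captured key exists in %activ. The following is only a minimal sketch: the in-memory sample data and the @report array are hypothetical stand-ins for the real files, and the second and third sample lines are invented to show the two failure cases.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for %activ and the $uniprot lines.
my %activ = ( E7F888 => 'PF01388' );
my @uniprot_lines = (
    "E7F888 Name=arid5b ;PF01388\n",    # key present in both
    "A0A000 Name=foo ;PF00001\n",       # invented: key absent from %activ
    "no key here\n",                    # invented: regex cannot match
);

my @report;
for my $line (@uniprot_lines) {
    # Same regex as in the thread: six characters, whitespace, then "=name".
    if ( $line =~ /(.{6})\s+.+=([^\s]+)/ ) {
        my $key    = $1;
        my $status = exists $activ{$key} ? 'in both' : 'missing from %activ';
        push @report, "$key => $status";
    }
    else {
        chomp( my $copy = $line );
        push @report, "no match: $copy";
    }
}
print "$_\n" for @report;
```

Lines reported as "no match" point at the $uniprot regex, while lines reported as missing point at how %activ was built from $activin.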

Let me know what you find...


Re^14: Help with pushing into a hash
by jemswira (Novice) on Sep 03, 2012 at 06:27 UTC

    Well, I ran the code and it returned this:

    E7F888 Name=arid5b ;PF01388

    which is correct. I selected that ID because it was in both files (I manually Ctrl-F'ed the Notepad file).

    A lot of lines are missing from the final output. The $activ file is 512 KB and the output is 3 KB. @activline is also missing a lot when I did print @activline;. I randomly selected E7F888 from $activin and checked to make sure it was in $uniprot, so it may be another offending line of data. If I did this:

    print $activ{E7F888}, "\n";
    for ( read_file $uniprot ) {
        next unless /(E7F888)\s+.+=([^\s]+)/;
        say;
        push @activline, "$1 | $2 | $activ{$1} \n" if $activ{$1};
    }
    print @activline;

    I got this:

    PF01388
    E7F888 Name=arid5b ;PF01388
    E7F888 | arid5b | PF01388

    which showed that it was in %activ and that it was found in the file. But when I run through the normal code, it is not in @activ anymore.

    UPDATE,

    I ran the following code to check something, and not all the data is printed, i.e., a lot of lines are not pushed into the array.

    #!/usr/bin/perl
    use Modern::Perl;
    use File::Slurp qw/read_file write_file/;

    my $uniprot = 'uniprot-sfinal.txt';
    my $activin = 'Activator-PFAM.txt';

    my %activ = map { s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1 => $2; }
        grep /\| +\s+\S+/, read_file $activin;

    for ( read_file $uniprot ) {
        next unless /(.{6})\s+.+=([^\s]+)/;
        print $1, "\n" if $activ{$1};
    }

    print "done";

    I am sure that there are more lines that are supposed to be printed, e.g.:

    $uniprot:
    Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1
    B7ZS42 A4PBN7; B7ZS44;Name=dmrt1-b ;PF00751;PF12374
    Q157S1 Name=Crtc1 Synonyms=Mect1,

    $activ:
    Q801F8 | PF00751.13 PF12374.3 PF12374.3
    B7ZS42 | PF00751.13 PF12374.3 PF12374.3
    Q157S1 | PF12886.2 PF12885.2 PF12884.2

    I shall try to reverse the order, i.e., make a hash out of $uniprot instead of $activ and run it again.
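
    The reversed order could look roughly like the sketch below. It is not the script from this thread: the sample lines are copied from the listing above into in-memory arrays, %name_for is a hypothetical name, and it assumes the ID is the first six non-space characters of each $uniprot line.

    ```perl
    use strict;
    use warnings;

    # Sample lines copied from the post; real data would come from the two files.
    my @uniprot_lines = (
        "Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1\n",
        "Q157S1 Name=Crtc1 Synonyms=Mect1,\n",
    );
    my @activ_lines = (
        "Q801F8 | PF00751.13 PF12374.3 PF12374.3\n",
        "Q157S1 | PF12886.2 PF12885.2 PF12884.2\n",
    );

    # Reversed order: index $uniprot by its leading ID first ...
    my %name_for;
    for (@uniprot_lines) {
        $name_for{$1} = $2 if /^(\S{6})\s.*?Name=(\S+)/;
    }

    # ... then scan the $activ lines and look each ID up in that hash.
    my @activline;
    for (@activ_lines) {
        next unless /^(\S+)\s+\|\s+(.+)/;
        my ( $id, $pfam ) = ( $1, $2 );
        $pfam =~ s/\.\d+//g;    # drop version suffixes such as ".13"
        push @activline, "$id | $name_for{$id} | $pfam\n" if exists $name_for{$id};
    }
    print @activline;
    ```

    This only indexes the first ID on each $uniprot line, so any additional IDs on the same line would still be missed.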

      Perhaps your data is more complex than first thought. You listed the following:

      $uniprot:
      Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1
      B7ZS42 A4PBN7; B7ZS44;Name=dmrt1-b ;PF00751;PF12374
      Q157S1 Name=Crtc1 Synonyms=Mect1,

      $activ:
      Q801F8 | PF00751.13 PF12374.3 PF12374.3
      B7ZS42 | PF00751.13 PF12374.3 PF12374.3
      Q157S1 | PF12886.2 PF12885.2 PF12884.2

      Given the above listing, are you (potentially) expecting the following (keys/values)?

      Q801F8 -> PF00751 PF12374 PF12374
      B7ZS42 -> PF00751 PF12374 PF12374
      Q157S1 -> PF12886 PF12885 PF12884
      ...
      Q801F8 -> dmrt1 PF00751 PF12374 PF12374
      Q90XZ5 -> dmrt1 PF00751 PF12374 PF12374
      Q90XZ8 -> dmrt1 PF00751 PF12374 PF12374
      B7ZS42 -> dmrt1-b PF00751 PF12374 PF12374
      A4PBN7 -> dmrt1-b PF00751 PF12374 PF12374
      B7ZS44 -> dmrt1-b PF00751 PF12374 PF12374
      Q157S1 -> Crtc1 PF12886 PF12885 PF12884

      It looks like there are multiple keys on a single $uniprot line:

      Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1
      ^^^^^^ ^^^^^^  ^^^^^^
        |       |       |
        +-------+-------+--- keys to be captured?

      If this is the case, the current regex operating on the $uniprot lines would capture only the first key on each line and miss the others.

      Let me know if the script needs to capture potentially multiple keys on a single $uniprot line...
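
      If multiple keys do need to be captured, one possible approach is sketched below. It rests on assumptions that would have to be checked against the real data: every key is a six-character ID made of uppercase letters and digits, and all keys appear before the first "Name=" on the line. The keys_and_name sub is a hypothetical helper, not part of the script above.

      ```perl
      use strict;
      use warnings;

      # Assumption: keys are six-character IDs of uppercase letters and digits,
      # and they all precede the first "Name=" on the line.
      sub keys_and_name {
          my ($line) = @_;

          # Non-greedy prefix, so the FIRST "Name=" is used even when a later
          # field such as "Synonyms=" also contains an equals sign.
          return unless $line =~ /^(.*?)Name=(\S+)/;
          my ( $prefix, $name ) = ( $1, $2 );

          # Collect every six-character ID found in the prefix.
          my @keys = $prefix =~ /\b([A-Z0-9]{6})\b/g;
          return ( $name, @keys );
      }

      my ( $name, @keys ) = keys_and_name("Q801F8 Q90XZ5; Q90XZ8;Name=dmrt1\n");
      print "$_ -> $name\n" for @keys;
      ```

      For the sample line this prints one "key -> dmrt1" pair for each of Q801F8, Q90XZ5 and Q90XZ8. Each key could then be looked up in %activ in turn.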
