PerlMonks  

Extraction help.

by invaderzard (Acolyte)
on Sep 02, 2012 at 14:36 UTC (#991278=perlquestion)
invaderzard has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, this is similar to a previous post I made, but it is a completely different question, so please offer me your guidance. Thanks!

This is the code I am currently working on now.

#!/usr/bin/perl
use Modern::Perl;
use File::Slurp qw/read_file write_file/;

my $uniprot   = 'D:ARP\\Downloads\\uniprot-ACN';
my $activin   = 'D:ARP\\Downloads\\Activator-Pfam.txt';
my $antioxin  = 'D:ARP\\Downloads\\AntiOxidant-PFAM.txt';
my $toxinin   = 'D:ARP\\Downloads\\Toxin-PFAM.txt';
my $activout  = 'D:ARP\\Downloads\\ActivACNPF.txt';
my $antioxout = 'D:ARP\\Downloads\\AntioxACNPF.txt';
my $toxinout  = 'D:ARP\\Downloads\\ToxinACNPF.txt';

my @activline;
my @antioxline;
my @toxinline;

my %activ  = map {s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1 => $2 } grep / +\|\s+\S+/, read_file $activin;
my %antiox = map { s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1=>$2; } grep/\ +|\s+\S+/,read_file $antioxin;
my %toxin  = map { s/\.\d+//g; /(.+)\s+\|\s+(.+)/ and $1=>$2; } grep/\ +|\s+\S+/,read_file $toxinin;

for ( read_file $uniprot ) {
    next unless /(.{6})\s+.+=([^\s]+)/;
    push @activline,  "$1 | $2 | $activ{$1} \n"  if $activ{$1};
    push @antioxline, "$1 | $2 | $antiox{$1} \n" if $antiox{$1};
    push @toxinline,  "$1 | $2 | $toxin{$1} \n"  if $toxin{$1};
}

print @activline;
write_file $activout,  @activline;
write_file $antioxout, @antioxline;
write_file $toxinout,  @toxinline;

This is a sample of 'D:ARP\\Downloads\\ActivACNPF.txt', 'D:ARP\\Downloads\\AntioxACNPF.txt' and 'D:ARP\\Downloads\\ToxinACNPF.txt'. They all have the same format.

Q6GZX4 | PF04947.9
Q96355 | PF01486.12 PF00319.13
Q96356 | PF01486.12 PF00319.13
Q39371 | PF01486.12 PF00319.13
Q84BZ4 | PF12833.2
Q6W4T3 | PF00501.23 PF00668.15 PF00550.20
B4YPW6 | PF01486.12 PF00319.13
Q8GTF5 | PF01486.12 PF00319.13

What I want to do is take the 6 characters on the left side of each line (e.g. Q96355) and look them up in the database below to find the gene name (e.g. Name=smf1; ORFNames=SPBC3E7.14, SPBC4F6.01).

O59734 | Name=smf1; ORFNames=SPBC3E7.14, SPBC4F6.01
Q97W02 | Name=dbh; Synonyms=dpo4; OrderedLocusNames=SSO2448
B0JTM2 | Name=trpC; OrderedLocusNames=MAE_45030
Q0WVE9; Q5XF02; Q9ZVN7 | OrderedLocusNames=At1g05030; ORFNames=T7A14.10
Q15X31 | Name=rraB; OrderedLocusNames=Patl_1031
Q66640 | Name=36
Q9F2S0 | Name=hemL; OrderedLocusNames=SCO4469; ORFNames=SCD65.12
A9R5H1 | Name=dctA; OrderedLocusNames=YpAngola_A4067
Q7N3W0 | Name=rnt; OrderedLocusNames=plu2603
Q6GNW0 | Name=lrrfip2
Q4L4T4 | OrderedLocusNames=SH2032
B7I359 | Name=rplL; OrderedLocusNames=AB57_0368
B2HII2 | Name=leuD; OrderedLocusNames=MMAR_1727

However, when I try my code, I don't get all of the data I need. (I think perhaps it's a problem with the regex?) So I will need some assistance from you almighty ones on this matter. Thanks!

Re: Extraction help.
by CountZero (Chancellor) on Sep 02, 2012 at 16:04 UTC
    You can keep on struggling and writing ad hoc scripts, but what you should really do is transform your uniprot-ACN file into a real database (SQLite springs to mind) and then use the powers of DBI and SQL to do all your searching.
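
A minimal sketch of what that suggestion could look like, assuming the DBI and DBD::SQLite modules are installed. The table layout (one row per accession, field and value) and the helper names `build_db` and `lookup` are hypothetical choices for illustration, and an in-memory database stands in for a real file:

```perl
use strict;
use warnings;
use DBI;    # assumes DBD::SQLite is installed as the driver

# Hypothetical schema: one row per (accession, field, value).
sub build_db {
    my (@lines) = @_;
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE gene (acc TEXT, field TEXT, value TEXT)');
    $dbh->do('CREATE INDEX gene_acc ON gene (acc)');
    my $ins = $dbh->prepare(
        'INSERT INTO gene (acc, field, value) VALUES (?, ?, ?)');

    $dbh->begin_work;    # one transaction keeps bulk inserts fast
    for my $line (@lines) {
        chomp $line;
        my ($accs, $fields) = split /\s+\|\s+/, $line, 2;
        next unless defined $fields;
        for my $pair (split /;\s*/, $fields) {
            my ($k, $v) = split /=/, $pair, 2;
            next unless defined $v;
            # A line may list several accessions separated by ';'.
            $ins->execute($_, $k, $v) for split /;\s*/, $accs;
        }
    }
    $dbh->commit;
    return $dbh;
}

# Return the value of one field for one accession, or undef.
sub lookup {
    my ($dbh, $acc, $field) = @_;
    my ($value) = $dbh->selectrow_array(
        'SELECT value FROM gene WHERE acc = ? AND field = ?',
        undef, $acc, $field);
    return $value;
}

my $dbh = build_db(
    'O59734 | Name=smf1; ORFNames=SPBC3E7.14, SPBC4F6.01',
    'Q0WVE9; Q5XF02; Q9ZVN7 | OrderedLocusNames=At1g05030; ORFNames=T7A14.10',
);
print lookup($dbh, 'Q9ZVN7', 'ORFNames'), "\n";
print lookup($dbh, 'O59734', 'Name'), "\n";
```

For the real uniprot-ACN file, the lines would come from the file itself and `dbname` would point at a file on disk so the database is built only once.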

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Extraction help.
by Anonymous Monk on Sep 02, 2012 at 16:17 UTC

    However, when I try my code, I don't get all of the data I need. (I think perhaps it's a problem with the regex?) So I will need some assistance from you almighty ones on this matter. Thanks

    Like, what do you get, and like, what is missing?

      I'm missing quite a few lines. The resulting file, which was supposed to be 500+ KB, has come out at 3 KB.

      I'm posting here in the hopes that someone can debug this for me and spot any mistakes in my code.
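
One way to test the regex suspicion is to run the script's pattern against sample lines from the question and print what it captures. This is only a diagnostic sketch, not a fix: `probe` is a hypothetical helper name, and whether this accounts for all of the missing output depends on the real input files.

```perl
use strict;
use warnings;

# The pattern from the original script, unchanged.
my $re = qr/(.{6})\s+.+=([^\s]+)/;

# Return the two captures for a line, or an empty list if it does not match.
sub probe {
    my ($line) = @_;
    return $line =~ $re ? ($1, $2) : ();
}

my @samples = (
    'O59734 | Name=smf1; ORFNames=SPBC3E7.14, SPBC4F6.01',
    'Q0WVE9; Q5XF02; Q9ZVN7 | OrderedLocusNames=At1g05030; ORFNames=T7A14.10',
    'Q66640 | Name=36',
);

for my $line (@samples) {
    my ($id, $value) = probe($line);
    if (defined $id) {
        print "id='$id' value='$value'\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

On these samples the greedy `.+=` runs to the last `=` on the line, so the first line yields `SPBC3E7.14,` rather than `smf1`. On the multi-accession line the six-character capture slides to `0WVE9;`, which would never match a key in the Pfam hashes.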

Re: Extraction help.
by philiprbrenan (Monk) on Sep 02, 2012 at 20:12 UTC
    use feature ":5.14";
    use warnings FATAL => qw(all);
    use strict;
    use Data::Dump qw(dump pp);

    my @d = split /\n/, <<'END';
    O59734 | Name=smf1; ORFNames=SPBC3E7.14, SPBC4F6.01
    Q97W02 | Name=dbh; Synonyms=dpo4; OrderedLocusNames=SSO2448
    B0JTM2 | Name=trpC; OrderedLocusNames=MAE_45030
    Q0WVE9; Q5XF02; Q9ZVN7 | OrderedLocusNames=At1g05030; ORFNames=T7A14.10
    Q15X31 | Name=rraB; OrderedLocusNames=Patl_1031
    Q66640 | Name=36
    Q9F2S0 | Name=hemL; OrderedLocusNames=SCO4469; ORFNames=SCD65.12
    A9R5H1 | Name=dctA; OrderedLocusNames=YpAngola_A4067
    Q7N3W0 | Name=rnt; OrderedLocusNames=plu2603
    Q6GNW0 | Name=lrrfip2
    Q4L4T4 | OrderedLocusNames=SH2032
    B7I359 | Name=rplL; OrderedLocusNames=AB57_0368
    B2HII2 | Name=leuD; OrderedLocusNames=MMAR_1727
    END

    my $D;

    for (@d) {
        my ($c, $d) = split /\s+\|\s+/;
        $D->{$_} = $d for split /;\s+/, $c;
    }

    say $D->{Q9ZVN7};
    say $D->{B2HII2};

    Produces

    OrderedLocusNames=At1g05030; ORFNames=T7A14.10
    Name=leuD; OrderedLocusNames=MMAR_1727

      Taking things one step further:

      use feature ":5.14";
      use warnings FATAL => qw(all);
      use strict;
      use Data::Dump qw(dump pp);

      my @d = split /\n/, <<'END';
      O59734 | Name=smf1; ORFNames=SPBC3E7.14, SPBC4F6.01
      Q97W02 | Name=dbh; Synonyms=dpo4; OrderedLocusNames=SSO2448
      B0JTM2 | Name=trpC; OrderedLocusNames=MAE_45030
      Q0WVE9; Q5XF02; Q9ZVN7 | OrderedLocusNames=At1g05030; ORFNames=T7A14.10
      Q15X31 | Name=rraB; OrderedLocusNames=Patl_1031
      Q66640 | Name=36
      Q9F2S0 | Name=hemL; OrderedLocusNames=SCO4469; ORFNames=SCD65.12
      A9R5H1 | Name=dctA; OrderedLocusNames=YpAngola_A4067
      Q7N3W0 | Name=rnt; OrderedLocusNames=plu2603
      Q6GNW0 | Name=lrrfip2
      Q4L4T4 | OrderedLocusNames=SH2032
      B7I359 | Name=rplL; OrderedLocusNames=AB57_0368
      B2HII2 | Name=leuD; OrderedLocusNames=MMAR_1727
      END

      my $D;

      for (@d) {
          my ($c, $d) = split /\s+\|\s+/;
          for (split /;\s+/, $d) {
              my ($k, $v) = split /=/;
              $D->{$_}{$k} = $v for split /;\s+/, $c;
          }
      }

      say $D->{Q9ZVN7}{ORFNames};
      say $D->{B2HII2}{Name};

      Produces

      T7A14.10
      leuD
Re: Extraction help.
by philiprbrenan (Monk) on Sep 02, 2012 at 22:45 UTC

    Apologies in advance for the multiple posts on this subject, but I discovered that I wanted to illustrate what is, for me, one of Perl's most compelling features: the transmogrification of Text into Objects.

    use feature ":5.14";
    use warnings FATAL => qw(all);
    use strict;
    use Data::Dump qw(dump pp);

    my @d = split /\n/, <<'END';
    O59734 | Name=smf1; ORFNames=SPBC3E7.14, SPBC4F6.01
    Q97W02 | Name=dbh; Synonyms=dpo4; OrderedLocusNames=SSO2448
    B0JTM2 | Name=trpC; OrderedLocusNames=MAE_45030
    Q0WVE9; Q5XF02; Q9ZVN7 | OrderedLocusNames=At1g05030; ORFNames=T7A14.10
    Q15X31 | Name=rraB; OrderedLocusNames=Patl_1031
    Q66640 | Name=36
    Q9F2S0 | Name=hemL; OrderedLocusNames=SCO4469; ORFNames=SCD65.12
    A9R5H1 | Name=dctA; OrderedLocusNames=YpAngola_A4067
    Q7N3W0 | Name=rnt; OrderedLocusNames=plu2603
    Q6GNW0 | Name=lrrfip2
    Q4L4T4 | OrderedLocusNames=SH2032
    B7I359 | Name=rplL; OrderedLocusNames=AB57_0368
    B2HII2 | Name=leuD; OrderedLocusNames=MMAR_1727
    END

    sub Codes(@) {                                   # Database
        package Codes;
        our $D = bless {};

        for (@_) {                                   # Parse data
            my ($c, $d) = split /\s+\|\s+/;
            for (split /;\s+/, $d) {
                my ($k, $v) = split /=/;
                $D->{$_}{$k} = $v for split /;\s+/, $c;
            }
        }

        sub find($$) {                               # Find field
            my ($field, $code) = @_;
            $D->{$code} ? $D->{$code}{$field} : undef
        }

        eval 'sub '.$_.'{find("'.$_.'", $_[1])}'     # Accessor functions
            for keys %{{map({($_, 1)} map({keys %$_} values %$D))}};

        $D
    }

    my $D = Codes(@d);

    say $D->ORFNames('Q9ZVN7');
    say $D->Name('B2HII2');
    say $D->OrderedLocusNames('B2HII2');

    Produces

    T7A14.10
    leuD
    MMAR_1727
