PerlMonks  

pattern matching and array comparison

by Anonymous Monk
on Jun 13, 2005 at 18:54 UTC (#466253)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have a very simple problem but can't work it out! I have two arrays: one contains a small number of unique IDs, the second contains lots of sequences labelled with their IDs.

I am simply trying to compare the unique sequence IDs to those in the sequence file and extract the corresponding sequence.

I am getting confused trying to loop over two arrays. Please can someone show me where I'm going wrong?

    # the uniq id's file looks like this:
    #
    gi|11995001:156374-156649
    dbj|BA000040|:2701685-2702539
    dbj|BA000040|:c8987046-8986282
    gi|13488050:58289-58570
    gi|13470324:5721573-5721854

    # the corresponding sequence file looks like this:
    >gi|11995001:156374-156649, SMa0002
    ATGGAGGCTGTTCCCATGAATGTAGACCTCTCACGGCGCAGCTTTTTGAAGCTGGCTGGAGCAGGGGCTG
    CGGCAACGTCACTCGGTGCGATGGGGTTTGGTGAGGCTGAGGCGGCGGTCGTCGCGCATGTCCGGCCTCA
    >dbj|BA000040|:2701685-2702539
    GAAGGAGCCGATCTGGTCACCTTTTCCGGCGACAAGCTGCTGGGCGGTCCGCAGGCGGGTTTCATCGTCG
    GGCGCAGGGACCTGATCGCCGA

    # every unique_id has a corresponding sequence in @sequence

    # here is my attempt
    open (GENES, "$ARGV[1]") or die "unable to open file $!\n";
    open (IDS, "$ARGV[0]") or die "unable to open file $!\n";
    open (GENES, "$ARGV[1]") or die "unable to open file $!\n";

    my @ids = <IDS>;
    my @genes = <GENES>;

    my $ids = join ('', @ids);
    @ids = split ('\n', $ids);

    my $genes = join ('', @genes);
    @genes = split ('>', $genes);

    my @accessions;
    foreach my $line (@file) {
        if ($line =~ /^(\w+\|\w+\.{0,1}\d{0,1}\|{0,1}:c{0,1}\d+\-\d+)/) {
            push @accessions, "$1";
        }
    }

    # extract uniq id's
    my %seen=();
    my @uniq = ();
    foreach my $item (@accessions) {
        unless ($seen{$item}) {
            $seen{$item}=1;
            push (@uniq, $item);
        }
    }

    # dig out the corresponding sequence for each id
    # THIS BIT NOT WORKING ;-(
    for (my $i=0; $i<@sequence; $i++) {
        foreach my $id (@uniq) {
            if ($sequence[$i] =~ /^$id/) {
                print "$id\n";
            }
        }
    }

Re: pattern matching and array comparison
by Anonymous Monk on Jun 13, 2005 at 18:57 UTC
    sorry - made a typo: should be

    foreach my $line (@ids) {
        if ($line =~ /^(\w+\|\w+\.{0,1}\d{0,1}\|{0,1}:c{0,1}\d+\-\d+)/) {
            push @accessions, "$1";
        }
    }
Re: pattern matching and array comparison
by reneeb (Chaplain) on Jun 13, 2005 at 20:18 UTC
    #! /usr/bin/perl
    use strict;
    use warnings;

    my $id_file    = '/path/to/file/with/ids.txt';
    my $fasta_file = '/path/of/fasta/file.fasta';

    open(my $fh, "<$id_file") or die $!;
    # chomp so a trailing newline does not stop an ID matching inside a header line
    chomp(my @ids = <$fh>);
    close $fh;

    {
        # read one FASTA record at a time instead of one line at a time
        local $/ = "\n>";
        open(my $fh, "<$fasta_file") or die $!;
        while(my $entry = <$fh>){
            print $entry if(grep{$entry =~ /\Q$_\E/}@ids);
        }
    }
Re: pattern matching and array comparison
by Paladin (Priest) on Jun 13, 2005 at 20:21 UTC
    Well, first off, you should probably always use strict and warnings. use strict would have found your error for you. Well, one of the errors.
    1. In your final for() loop, you are using an array called @sequence which you never initialize anywhere else. I presume you meant to use @genes there instead.
    2. Your $ids have regex meta chars in them (the |), so you need to tell Perl to treat them as normal characters.
    If you change your final part to:
    for (my $i=0; $i<@genes; $i++) {
        foreach my $id (@uniq) {
            if ($genes[$i] =~ /^\Q$id/) {
                print "$id\n";
            }
        }
    }
    it seems to work.
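
To make the two points above concrete, here is a small sketch (not part of Paladin's reply). It uses one ID from the question; the `dbjXXXX` string is a made-up wrong record, and all variable names are illustrative. The first half shows how an unquoted `|` acts as alternation, the second shows strict rejecting the undeclared `@sequence` at compile time.

```perl
use strict;
use warnings;

my $id    = 'dbj|BA000040|:2701685-2702539';
my $right = "dbj|BA000040|:2701685-2702539\nGAAGGAGCCG\n";
my $wrong = "dbjXXXX\nACGT\n";    # hypothetical record we do NOT want

# Unquoted: each | is alternation, so /^$id/ means
# /^dbj/ OR /BA000040/ OR /:2701685-2702539/.
my $unquoted_hits_wrong = ($wrong =~ /^$id/) ? 1 : 0;

# Quoted: \Q...\E escapes the metacharacters, so the whole ID must match literally.
my $quoted_hits_wrong = ($wrong =~ /^\Q$id\E/) ? 1 : 0;
my $quoted_hits_right = ($right =~ /^\Q$id\E/) ? 1 : 0;

print "unquoted pattern matches the wrong record: $unquoted_hits_wrong\n";
print "quoted pattern matches the wrong record:   $quoted_hits_wrong\n";
print "quoted pattern matches the right record:   $quoted_hits_right\n";

# Under strict, using @sequence without declaring it is a compile-time error.
# The string eval lets this script keep running so the message can be shown.
my $compiled     = eval q{ use strict; my @genes = ('a'); my $count = @sequence; 1 };
my $strict_error = $compiled ? '' : $@;
print "strict says: $strict_error" if $strict_error;
```

Without `\Q`, the loop in the question would report an ID for any record that merely starts with `dbj` or `gi`, which is easy to mistake for a correct match.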
Re: pattern matching and array comparison
by GrandFather (Cardinal) on Jun 13, 2005 at 22:28 UTC

    Using a hash is probably a better way to do it. Something like this may be what you want:


    Perl is Huffman encoded by design.
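
What follows is a minimal sketch of the hash idea suggested above, not GrandFather's own code. It assumes the ID is the header text up to the first comma or whitespace (a guess based on the sample headers in the question), and `index_fasta` is a hypothetical helper name. The sequences are shortened from the question's sample to keep the example small.

```perl
use strict;
use warnings;

# Build a hash of ID => sequence from FASTA text.
# Assumption: the ID is everything in the header before the first comma or space.
sub index_fasta {
    my ($fasta_text) = @_;
    my %seq_for;
    for my $record (split /^>/m, $fasta_text) {
        next unless $record =~ /\S/;              # skip the empty leading chunk
        my ($header, @lines) = split /\n/, $record;
        next unless $header =~ /^([^,\s]+)/;      # no usable ID in this header
        $seq_for{$1} = join '', @lines;           # join the wrapped sequence lines
    }
    return \%seq_for;
}

my $fasta = <<'END';
>gi|11995001:156374-156649, SMa0002
ATGGAGGCTG
CGGCAACGTC
>dbj|BA000040|:2701685-2702539
GAAGGAGCCG
END

my @ids = (
    'dbj|BA000040|:2701685-2702539',
    'gi|11995001:156374-156649',
    'gi|13488050:58289-58570',    # has no record in the shortened sample above
);

my $seq_for = index_fasta($fasta);
for my $id (@ids) {
    if (exists $seq_for->{$id}) {
        print ">$id\n$seq_for->{$id}\n";
    }
    else {
        print "no sequence found for $id\n";
    }
}
```

Because the lookup is an exact hash key match, there is no nested loop and no regex quoting to worry about; the cost is one pass over the sequence file to build the index.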
