
Re: motif finding

by quester (Vicar)
on Jan 31, 2012 at 07:58 UTC

in reply to motif finding

A more "Perl-ish" way of doing the same thing...

    use strict;
    use warnings;
    use Term::ANSIColor;
    use autodie;

    #Program to find motif site in a given protein sequence using files
    my $motif = "AGGGGG";
    open( my $read, "<dna.txt" );
    my @e = <$read>;
    $_ = join( " ", @e );
    s/\s+//g;
    my @c;
    push @c, pos( ) - length( $motif ) + 1 while /$motif/g;
    s/$motif/color( 'bold green' ) . $motif . color( 'black' )/eg;
    print $_, "\n";
    print "Number of sites the motif (AGGGGG) is present: ", scalar @c, "\n";
    print "And the positions in the string are: ", join( ',', @c ), "\n\n";

This eliminates counting characters one at a time, as the $i loop in the original does, in favor of pattern matching on the entire character string. I have found that eliminating loop counters wherever possible greatly reduces the number of bugs in my code.
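
As a small, self-contained sketch of the same idea, the position bookkeeping can be wrapped in a sub that takes the sequence and the motif as arguments. The name `motif_positions` and the sample sequence are made up for illustration and are not from the original post. Instead of `pos() - length($motif)`, it reads the match start from `$-[0]`, and it wraps the motif in `\Q...\E` on the assumption that the motif is a literal string rather than a pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative): return the 1-based start
# positions of every non-overlapping occurrence of $motif in $seq.
sub motif_positions {
    my ( $seq, $motif ) = @_;
    $seq =~ s/\s+//g;                  # drop whitespace and line breaks
    my @positions;
    while ( $seq =~ /\Q$motif\E/g ) {  # \Q...\E treats the motif literally
        push @positions, $-[0] + 1;    # $-[0] is the 0-based match start
    }
    return @positions;
}

# Made-up sample data: two sites, one on each "line".
my @sites = motif_positions( "TTAGGGGGCC\nAGGGGGT", "AGGGGG" );
print "Sites: ", join( ',', @sites ), "\n";    # prints "Sites: 3,11"
```

Because the sub works on its own copy of the sequence, the caller's string is left untouched, and the same sub can be reused for several motifs.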

Replies are listed 'Best First'.
Re^2: motif finding
by educated_foo (Vicar) on Jan 31, 2012 at 13:55 UTC
    Or, even more Perl-ish, with a bit less extra work (e.g. only one //g loop):
    use Term::ANSIColor;
    open(READ,"<dna.txt");
    $m = 'AGGGGG';
    $_ = do { local $/; <READ> };               # read whole file
    s/\s+//g;                                   # remove blanks
    s{$m}{                                      # search the string
        push @c, 1 - length($m) + pos;          # remember position
        color('bold green').$m.color('reset');  # remember to reset!
    }eg;
    print "$_\n";                               # print transformed string
    print "NUMBER OF SITES THE MOTIF ($m) IS PRESENT: ".@c."\n";
    print "AND THE POSITION IN THE STRING IS:", join(',', @c), "\n\n";

      My motif input is a file. How can I modify the program to make it work?
Re^2: motif finding
by RichardK (Parson) on Jan 31, 2012 at 14:19 UTC

    I think using File::Slurp is even easier and more perl-ish :)

    use File::Slurp;

    # read file as a string
    my $text = read_file('dna.txt');

    # now remove whitespace including line breaks
    $text =~ s/\s+//g;

    # stuff

    (update : removed a stray space)

      Thank you very much for the reply. In the code I used, I supply the input (the motif sequence) myself. Treating the entire genome as a single string, how can I find the most frequently repeated elements of, say, 20 base pairs?

        I'm not sure what you're looking for. Can you explain with a simple example?

        Are you looking for repeats of a given string, or something more complex?
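
        If the question is read as "which substring of a fixed length occurs most often", one possible sketch is to count every overlapping window in a hash. This is only one interpretation of the question. The name `most_frequent_kmer`, the tie-breaking rule, and the short sample string are my own assumptions, and for a whole genome with 20-base windows the hash could become very large.

```perl
use strict;
use warnings;

# Hypothetical helper: count every overlapping window of length $k in $seq
# and return the most frequent window together with its count.
sub most_frequent_kmer {
    my ( $seq, $k ) = @_;
    $seq =~ s/\s+//g;                          # drop whitespace and line breaks
    my %count;
    $count{ substr( $seq, $_, $k ) }++ for 0 .. length($seq) - $k;

    # Sort by descending count, then alphabetically to break ties.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return unless defined $best;               # sequence shorter than $k
    return ( $best, $count{$best} );
}

# Made-up sample with a window length of 4 instead of 20.
my ( $kmer, $n ) = most_frequent_kmer( "ACGTACGTAC", 4 );
print "$kmer occurs $n times\n";               # prints "ACGT occurs 2 times"
```

        Ties are resolved alphabetically here, so a real analysis might instead report every window that reaches the top count.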
