Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Analyse an array and join only some elements of it

by Istirion (Initiate)
on Jun 15, 2012 at 12:45 UTC ( #976425=perlquestion: print w/replies, xml ) Need Help??
Istirion has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I have a programming problem which I can't solve on my own and therefor Id like to ask for your help.

Earlier in the programm I read a complete inputfile via <STDIN>, process and split it in pieces and get something like a really big array of strings. Each string is a part of DNA, so it looks like "ATCGCAT..." (very long!). I can join the complete array into one string for further analysis of the DNA. But I also would like to do this:

1) Check every item of the array, starting with [0] if it passes an easy test (here: I just want to see, if the string of bases can be divided by 3 and therefor has the correct number for further analysis and transformation into aminoacids)

2) Join all strings in the array which passed the test into one big string (>6 mio. characters), letting out those which didn't pass (wrong number of bases).

3) Send some info to a logfile, which says something like this "Found x blocks (x = number of elemens in the array = $#array), joined y of them and left out z." The leftout parts should follow, like

Not used blocks# 1200 1500 5000 ...

I think it could be really easily solved, but right now I don't see the solution... :-(

Thank you very much for your help!
  • Comment on Analyse an array and join only some elements of it

Replies are listed 'Best First'.
Re: Analyse an array and join only some elements of it
by moritz (Cardinal) on Jun 15, 2012 at 13:11 UTC
    my @dna_chunks = ...; sub your_test { my $string = shift; # do your test here, and return 1 if it passes } my @used = grep your_test($_), @dna_chunks; my $long_string = join '', @used; print "Used ", scalar(@used), " chunks, ",@dna_chunks - @used, " disca +red";

    See grep for details.

      The combination of join and grep sounds like it will do what you want. (map and sort and some of their friends from List::Util and List::MoreUtils are also often useful, but don't sound needed in your case.)

      The following example uses join and grep to filter the numbers from 1 to 20 for primes, then join the primes together into a string separated by semicolons.

      sub is_prime { my $num = shift; return if $num == 1; # 1 is not prime die "usage: is_prime(NATURAL NUMBER)" unless $num =~ /^[1-9][0-9]*$/; for my $div (2 .. sqrt $num) { return if $num % $div == 0; } return 1; } my @numbers = 1 .. 20; print join ";", grep { is_prime($_) } @numbers;
      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Analyse an array and join only some elements of it
by jwkrahn (Monsignor) on Jun 15, 2012 at 14:30 UTC

    It sounds like you need something like this:

    my ( $fasta, $valid, $invalid ); while ( <STDIN> ) { tr/ATCGatcg//cd; next unless /[ATCGatcg]/; if ( length % 3 ) { $invalid++; next; } $valid++; $fasta .= $_; } print LOGFILE "Found ", $valid + $invalid, " blocks, joined $valid of +them and left out $invalid.\n";
Re: Analyse an array and join only some elements of it
by AnomalousMonk (Canon) on Jun 15, 2012 at 17:56 UTC
    >perl -wMstrict -le "use List::MoreUtils qw(part); ;; my @strings = qw(X XY AAA WXYZ VWXYZ BBBCCC TUVWXYZ); ;; use constant { X3 => 0, NX3 => 1, }; my @indices = part { length($strings[$_]) % 3 ? NX3 : X3 } 0 .. $#strings; ;; print qq{indices of strings whose lengths are...}; print qq{ even multiples of 3: @{$indices[X3]}}; print qq{ non-even multiples of 3: @{$indices[NX3]}}; ;; my $aminos = join '', @strings[ @{$indices[X3]} ]; print qq{amino acids: '$aminos'}; " indices of strings whose lengths are... even multiples of 3: 2 5 non-even multiples of 3: 0 1 3 4 6 amino acids: 'AAABBBCCC'

    From the OP:

    ... Found x blocks (x = number of elemens in the array = $#array) ...

    $#array is the highest index of an element in the array. The number of elements would be given by  scalar(@array) or, in general, by evaluating  @array in scalar context, e.g.,  0+@array or  @array == 3 or some such.

Re: Analyse an array and join only some elements of it
by Cristoforo (Deacon) on Jun 16, 2012 at 15:54 UTC
    From your description, it sounds like you are parsing a 'fasta' format file line by line and, if it is evenly divisible by 3, join that to a string of sequences passing that test. I'm not sure, but if a single line of the sequence is divisible by 3, the entire sequence may not. The method below reads in an entire sequence at a time, and checks for mod 3 == 0, (evenly divisible by 3).

    #!/usr/bin/perl use strict; use warnings; use Bio::SeqIO; my $in = Bio::SeqIO->new( -file => "input1.txt" , -format => 'fasta'); my ($string, $valid, $invalid) = ('', 0, 0); while ( my $seq = $in->next_seq() ) { if ($seq->length % 3 == 0) { $string .= $seq->seq; ++$valid; } else { ++$invalid; } } open LOGFILE, ">", 'somefile' or die "Unable to open 'somefile' for wr +iting. $!"; print LOGFILE "Found ", $valid + $invalid, " blocks, joined $valid of +them and left out $invalid.\n"; close LOGFILE or die $!;


      Hi again! Thank you very much for all of your help and really good ideas. I will try all of them and see which works best in my programm. Christoforo, you're completely right: the input file is an annotated fasta file, which I have to read line by line and analyse for Codon Usage and so on. I didn't use Bioperl for creating a seqobject but instead did all the work by myself. :-) Your wellthought objection can be clearly answered, I think. If all parts are divisible by three the entire sequenz should be also divisible by three, because it consists of parts which are divisible by three. Thanks again, Markus

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://976425]
Approved by moritz
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (9)
As of 2017-05-25 18:54 GMT
Find Nodes?
    Voting Booth?