Re: simple but stuck
by Fletch (Bishop) on Apr 25, 2002 at 14:32 UTC
|
my $base = ("A"|"C"|"G"|"T");
Erm, this is giving you the bitwise or of the four characters. This probably isn't a useful value. Of
course you never actually use it anyhow.
while (<INPUT>)
{
$sequence = $_
@array = ()
@array = split (/\n/, $sequence );
If you want to use another name for the input line, declare
it in the while itself (while( defined( my $sequence = <INPUT> ) )). Also it makes little sense to empty
@array right before you assign to it.
As for splitting on newlines, that's not really going to work since you haven't changed $/ and you'll only be getting back a line at a time. You probably
want to do $/ = "" to get paragraph mode; see
perldoc perlvar.
| [reply] [d/l] [select] |
Re: simple but stuck
by demerphq (Chancellor) on Apr 25, 2002 at 15:00 UTC
|
Hi, hopefully you are actually indenting your work and that it got unindented by not using code tags.
#! /usr/local/bin/perl -w
use strict;
open (INPUT, $ARGV[0]) or die "unable to open file";
This is precisely what the perl idiom while (<>){} is for
my $sequence;
my $count =0;
my $base = ("A"|"C"|"G"|"T");
This probably doesnt do what you want. $base ends up being the binary or of the string A C G and T. Which happens to be "W". :-)
my @array;
my $ideal = 250;
while (<INPUT>)
{
As i said before the while loop can be rewritten much cleaner. Also you are reading in a line at a time because you havent messed with $/
$sequence = $_;
No need to save $_ unless intend to do something that will smash it. Which afact you arent.
@array = ();
@array = split (/\n/, $sequence);
You read in a line, and then you split by \n. That wont help as there will only be 1 (or 0) but not more. So everything from there isnt going to go well. :-)
Maybe this does what you want, or least points you in the right direction?
while (<>) {
chomp; # lose potential newlines at the end. (only needed for portab
+ility)
print $_,"\n" if length $_ > 250; # print out any sequence longer t
+han 250chars
}
One of the cool things about the while (<>) idiom is that it will allow you to do this:
# find every line over 250 chars from three different files, print the
+m to STDOUT
over_250 file.1 file.2 file.3
# find lines over 250 chars from STDIN
over_250 <file.1
HTH
Yves / DeMerphq
---
Writing a good benchmark isnt as easy as it might look. | [reply] [d/l] [select] |
Re: simple but stuck
by perlplexer (Hermit) on Apr 25, 2002 at 14:30 UTC
|
Perhaps you could explain what a "base" is?
The code makes little sense...
my $base = ("A"|"C"|"G"|"T"); # what is this? $base is now 'W'
#...
foreach $base ($sequence){ # $sequence is a scalar, not an array
#...
}
--perlplexer | [reply] [d/l] |
Re: simple but stuck
by VSarkiss (Monsignor) on Apr 25, 2002 at 14:54 UTC
|
As others have pointed out, this:
my $base = ("A"|"C"|"G"|"T");doesn't do what you probably intended.
I'm not completely clear on what you're trying to do, but I think it's something like this: "if a line contains more than 250 bases, print it out" (where 250 can be changed easily). You don't need to count every single base: you can check that there's only A, T, C, or G in the line, and that it's longer than 250 characters. Something like this should do the trick (Note, this is untested):
while (<INPUT>)
{
print if /^[ATCG]+$/ && length >= $ideal;
}
If the number of bases you're looking for is fixed, you can put it in the regular expression:
print if /^[ATCG]{250,}$/;But the first example I gave is probably clearer anyway.
If that's not the problem you're trying to solve, please provide more information. HTH. | [reply] [d/l] [select] |
Re: simple but stuck
by stephen (Priest) on Apr 25, 2002 at 15:27 UTC
|
If I'm reading what you're trying to do correctly, you're trying to read sequences from a FastA format file, count the length of each sequence, and print out the ones that are more than 250 bases long.
If possible, I'd suggest using the bioperl module. That does all of the work of reading fasta format, and so will make your work easier. If you can't install modules yourself, perhaps you can get your system administrator to install it for you.
Here's an example of what I think you're trying to do, using the BioPerl modules.
#! /usr/local/bin/perl -w
### NOT TESTED CODE-- I'm not a bio guy, and don't have BioPerl...
use strict;
use constant IDEAL => 250;
use Bio::SeqIO;
MAIN: {
my $seq_file = Bio::SeqIO->new(
-file => $ARGV[0],
-format => 'fasta'
)
or die "Couldn't open $ARGV[0]: $!; stopped";
while ( my $seq = $seq_file->next_seq() ) {
if ( $seq->length() > IDEAL ) {
print $seq->seq(), "\n";
}
}
}
stephen
Update: Removed redundant length() call. | [reply] [d/l] |
Re: simple but stuck
by andye (Curate) on Apr 25, 2002 at 14:45 UTC
|
Hi lolly,
I'm guessing here, but it seems from your code like maybe what you're trying to do is return all lines which are longer than 250 characters? If so, try something like this:
while (<INPUT>) {
chomp;
print if length($_) > 250;
}
If that's not what you want, then could you explain again what the problem is? (particularly, what's a base?)
Anyway, hope that helps.
andy.
| [reply] [d/l] |
|
| [reply] [d/l] |
|
RMGir, you have a good point.
while (<INPUT>) {
print if length($_) > 251;
}
lolly, note that the number has changed to 251 (250 for the number of bases, plus one for the newline).
Andy.
Update: Or of course:
while (<INPUT>) {
chomp;
print $_."\n" if length($_) > 250;
}
| [reply] [d/l] [select] |
|
|
| [reply] |
|
|
|
sorry, DNA is composed of four bases A, C, T, G. i will try what you have suggested.
lolly
| [reply] |
|
Can the number on the line before the string of bases ever be longer than 250 digits? If yes, then you need something more sophisticated than my suggestion.
Come back to us if it doesn't work.
Andy.
| [reply] |
Re: simple but stuck
by Putzfrau (Beadle) on Apr 25, 2002 at 15:09 UTC
|
By base i assume you mean each occurence of the four letters representing
A is for adenine,
G is for guanine,
C is for cytosine,
T is for thymine,
and so you would want to count each of those occurences?
so you could
while(<input>)
{
$sequence = $_;
chomp($sequence);
$count=length($sequence);
if ($count >= $ideal)
{
print "$sequence\n"; # or whatever you want here
}
}
Hope this is what you had in mind and that it helps
"when a new client is created, we have to kill all the children..." --Sams Teach yourself Perl 5 | [reply] [d/l] |
Re: simple but stuck
by YuckFoo (Abbot) on Apr 25, 2002 at 16:29 UTC
|
The translate operator with the same search and replace
list will return a count of the number of matched
characters. Read all about it in the perlop
manpage.
while ($line = <INPUT>) {
if ($line =~ tr/ACGT/ACGT/ > 250) { print $line; }
}
| [reply] [d/l] |