Hi there. I've written a script that tries to pull a 20-character sequence out of a column at a varying offset.
The script does what I want, except that when I print @spliceout, the output contains whitespace between the letters. I've tried
for (@spliceout) {
    s/\s+$//;
}
but this doesn't work. I think I may have caused the problem by splitting the column into individual elements, but I'm not sure.
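For reference, here is a minimal sketch that reproduces the symptom outside the full script. The sample sequence and the names @letters, $interpolated and $joined are made up for illustration and are not part of my real data. As far as I understand it, interpolating an array inside double quotes joins its elements with the list separator $", which defaults to a single space, whereas join with an empty string does not add anything between the elements:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample sequence, not taken from the real input file.
my @letters = split //, "ACGT";

# Interpolating an array in double quotes joins the elements with $",
# the list separator, which defaults to a single space.
my $interpolated = "@letters";

# join with an empty string concatenates the elements directly.
my $joined = join '', @letters;

print "$interpolated\n";   # prints: A C G T
print "$joined\n";         # prints: ACGT
```

The s/\s+$// loop above only strips trailing whitespace from each single-character element, so it would not be expected to affect spaces that appear between elements at print time.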
My script is:
#!/usr/bin/perl -w
use strict;
my $inputfile1 = $ARGV[0];
open (FILE1, $inputfile1) or die "Uh oh.. unable to find file $inputfile1"; ##Opens input file
my @file1 = <FILE1>; #loads inputfile1 data into array
close FILE1;
my @matches;
foreach my $file1 (@file1) {
    if($file1 =~ m/splic/) {
        push (@matches, $file1); ##loads matches into array @matches
    }
}
my @col1; ## column 1
my @col_ID; ## column 2
my @col3; ## column 3
my @col_strand_direction; ## column 6
foreach my $match(@matches) { ## process each line, splitting columns and move onto next line
    my @colsplit = split("\t", $match);
    push (@col3, $colsplit[2] . "\n"); ##pushes third column to @col3 array
    push (@col1, $colsplit[0] . "\n");
    push (@col_ID, $colsplit[1] . "\n");
    push (@col_strand_direction, $colsplit[5] . "\n");
}
my @intron_from_boundary;
my @baseref;
foreach my $col3line(@col3) {
    if ($col3line =~ m/([\+|\-]\d+)\w+(\[[ACTG]])/) { ##pulls out + or - and subsequent number and [base change]
        push (@intron_from_boundary, $1 . "\n"); ##$1 pushes what is in the first set of brackets
        push (@baseref, $2 . "\n");
    }
}
## need to take each intronmatch value and work out its position relative to intron/exon boundary
my $left_of_boundary;
my $intron_from_boundary;
my $new_left;
my @spliceout;
## split seq of @col1 into array
my $i = 0;
foreach my $col1(@col1) {
    my @col1split = split(//, $col1);
    ##for -7:
    $left_of_boundary = 10; ##10 to the left
    if ($col_strand_direction[$i] =~ m/\+/) {
        $left_of_boundary = $left_of_boundary + $intron_from_boundary[$i]; ##3 to the left
        $new_left = 23 - $left_of_boundary; ## 20
    }
    else {
        $left_of_boundary = $left_of_boundary - $intron_from_boundary[$i]; ##3 to the left
        $new_left = 23 - $left_of_boundary; ## 20
    }
    my @spliceout = splice @col1split, $new_left, 22; ##want to pull out 3 letters to left of [G] and 16 to the right
    print "@spliceout\n";
    open (MYFILE, '>>fasta');
    print MYFILE (">" . "$col_ID[$i]" , "@spliceout" , "\n");
    close (MYFILE);
    ++$i;
}
Any help would be greatly appreciated, and yes, my scripting is rather messy, I'm still learning! Many thanks :)