
It's all getting messy - remove whitespace

by lecb (Acolyte)
on Jun 15, 2014 at 09:49 UTC (#1089929)
lecb has asked for the wisdom of the Perl Monks concerning the following question:

Hi there. I've written a script that is trying to remove a 20 character sequence from a column with a varying offset.

My script does what I want it to do, except that when I print @spliceout, it contains whitespace between letters. I've tried

for (@spliceout) { s/\s+$//; }

but this doesn't work. I think I have confused it splitting a column into individual elements. I'm not too sure.

My script is:

#!/usr/bin/perl -w
use strict;

my $inputfile1 = $ARGV[0];
open (FILE1, $inputfile1) or die "Uh oh.. unable to find file $inputfile1"; ##Opens input file
my @file1 = <FILE1>; #loads inputfile1 data into array
close FILE1;

my @matches;
foreach my $file1 (@file1) {
    if($file1 =~ m/splic/) {
        push (@matches, $file1); ##loads matches into array @matches
    }
}

my @col1; ## column 1
my @col_ID; ## column 2
my @col3; ## column 3
my @col_strand_direction; ## column 6

foreach my $match(@matches) { ## process each line, splitting columns and move onto next line
    my @colsplit = split("\t", $match);
    push (@col3, $colsplit[2] . "\n"); ##pushes third column to @col3 array
    push (@col1, $colsplit[0] . "\n");
    push (@col_ID, $colsplit[1] . "\n");
    push (@col_strand_direction, $colsplit[5] . "\n");
}

my @intron_from_boundary;
my @baseref;
foreach my $col3line(@col3) {
    if ($col3line =~ m/([\+|\-]\d+)\w+(\[[ACTG]])/) { ##pulls out + or - and subsequent number and [base change]
        push (@intron_from_boundary, $1 . "\n"); ##$1 pushes what is in the first set of brackets
        push (@baseref, $2 . "\n");
    }
}

## need to take each intronmatch value and work out its position relative to intron/exon boundary
my $left_of_boundary;
my $intron_from_boundary;
my $new_left;
my @spliceout;

## split seq of @col1 into array
my $i = 0;
foreach my $col1(@col1) {
    my @col1split = split(//, $col1);
    ##for -7:
    $left_of_boundary = 10; ##10 to the left
    if ($col_strand_direction[$i] =~ m/\+/) {
        $left_of_boundary = $left_of_boundary + $intron_from_boundary[$i]; ##3 to the left
        $new_left = 23 - $left_of_boundary; ## 20
    }
    else {
        $left_of_boundary = $left_of_boundary - $intron_from_boundary[$i]; ##3 to the left
        $new_left = 23 - $left_of_boundary; ## 20
    }
    my @spliceout = splice @col1split, $new_left, 22; ##want to pull out 3 letters to left of [G] and 16 to the right
    print "@spliceout\n";
    open (MYFILE, '>>fasta');
    print MYFILE (">" . "$col_ID[$i]" , "@spliceout" , "\n");
    close (MYFILE);
    ++$i;
}

Any help would be greatly appreciated, and yes, my scripting is rather messy, I'm still learning! Many thanks :)

Re: It's all getting messy - remove whitespace
by FloydATC (Chaplain) on Jun 15, 2014 at 10:45 UTC

    The regular expression \s+$ means "one or more whitespace characters at the end of the line" so it will never match whitespaces in between other characters.

    Whenever I want to strip excessive whitespace from a string, I usually go about it in a three-step way:
    1. Combine multiple whitespaces into a single space using s/\s+/ /g;
    2. Remove the leading space (if any) with s/^\s//;
    3. Remove the trailing space (if any) with s/\s$//; (Remember that this assumes no CR/LF character, adjust as needed)

    Add a few comments and the job is done.

    Perhaps this can be combined into a single-shot regex but it will probably be harder to read and waste more CPU cycles to accomplish exactly the same thing. I never bothered to find out because this approach works for me.
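A minimal sketch of the three steps above, wrapped in a helper. The name tidy_whitespace and the sample string are made up for illustration. Note that this collapses and trims whitespace but still leaves one space between words, so it would not remove the separators that appear when an array is interpolated into a string.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse and trim whitespace in three steps.
sub tidy_whitespace {
    my ($text) = @_;
    $text =~ s/\s+/ /g;   # 1. each run of whitespace (newlines included) becomes one space
    $text =~ s/^\s//;     # 2. drop the single leading space, if any
    $text =~ s/\s$//;     # 3. drop the single trailing space, if any
    return $text;
}

my $tidy = tidy_whitespace("  A  C \t G\n");
print "[$tidy]\n";   # prints [A C G]
```

Because step 1 already turns a trailing newline into a space, step 3 removes it along with any other trailing whitespace.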

    -- FloydATC

    Time flies when you don't know what you're doing

Re: It's all getting messy - remove whitespace
by tinita (Parson) on Jun 15, 2014 at 10:50 UTC
    Whenever you don't know what's in your data (you think it contains whitespaces) print it to make sure.
    use Data::Dumper;
    local $Data::Dumper::Useqq = 1; # make "invisible" characters visible
    print Dumper \@spliceout;
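As a rough illustration of what such a dump reveals, here is a small self-contained sketch. The array @chars is hypothetical sample data, built the same way the script builds its per-character array with split(//, ...).

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical data: one character per element, as split(//, ...) produces.
# The trailing newline ends up as an element of its own.
my @chars = split //, "ACG\n";

my $dump;
{
    local $Data::Dumper::Useqq = 1;   # show "\n", "\t" and friends as escapes
    $dump = Dumper(\@chars);
}
print $dump;
```

The dump lists each element in double quotes, so a stray newline shows up as an escape and it becomes clear that none of the elements is a space.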

    Then read about array interpolation in perldata:
    Arrays and slices are interpolated into double-quoted strings by joining the elements with the delimiter specified in the $" variable

    The problem is in this line:
      print MYFILE (">" . "$col_ID[$i]" , "@spliceout" , "\n");    
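A short sketch of the interpolation behaviour quoted from perldata, assuming a made-up array @bases in place of @spliceout. It compares plain interpolation, an explicit join, and a locally changed $" separator.

```perl
use strict;
use warnings;

my @bases = qw(A C G T);   # hypothetical stand-in for @spliceout

my $interpolated = "@bases";          # elements joined with $" (a space by default)
my $joined       = join '', @bases;   # explicit join with an empty separator

my $no_separator;
{
    local $" = '';                    # change the list separator for this block only
    $no_separator = "@bases";
}

print "$interpolated\n";   # A C G T
print "$joined\n";         # ACGT
print "$no_separator\n";   # ACGT
```

Using join keeps the intent visible at the point of use, whereas changing $" affects every interpolated array in its scope.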

      Thank you! I have modified the script and it now works!

      my @splicejoin = join('', @spliceout);
      open (MYFILE, '>>fasta');
      print MYFILE (">" . "$col_ID[$i]" , "@splicejoin" . "\n");
      close (MYFILE);

      Many thanks for the quick responses and for pointing me in the direction about it not actually being whitespace; didn't realise I could do that. Thanks again.

Re: It's all getting messy - remove whitespace
by RichardK (Priest) on Jun 15, 2014 at 11:12 UTC

    Yes perl does that. There's a difference between

    print "@array\n"

    and

    print @array,"\n";

    Try it and see ;)
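For anyone who wants to see the difference without running the original script, a small sketch with a hypothetical @array follows. It assumes $, (the output field separator) is left at its default, which is undefined.

```perl
use strict;
use warnings;

my @array = qw(A C G);   # hypothetical data

# Build both forms as strings so they can be compared side by side.
my $as_interpolated = "@array\n";              # what print "@array\n" sends
my $as_list         = join('', @array, "\n");  # what print @array,"\n" sends while $, is unset

print $as_interpolated;   # A C G
print @array, "\n";       # ACG
```

In the quoted form the elements are joined with $" before print ever sees them. In the list form print writes each element in turn with nothing in between.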

    As you're trying to extract part of a string, you might find it easier to use substr, rather than convert to an array and back.

    so something like

    my $seq = substr( $column, $start, $length);
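A self-contained sketch of the substr approach. The sequence, start offset, and length below are invented values, not taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical values: a 30-base column entry, a start offset and a length
my $column = "AAACCCGGGTTTAAACCCGGGTTTAAACCC";
my $start  = 3;
my $length = 20;

# substr returns the window as one string, so no per-character array is needed
my $seq = substr($column, $start, $length);
print "$seq\n";   # CCCGGGTTTAAACCCGGGTT
```

Since the result is a single scalar, interpolating it into a string cannot introduce separators between the letters.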

      RichardK, you hero! I have used a substring and it works like a dream! You've made a young lady very, very happy.

Re: It's all getting messy - remove whitespace
by poj (Priest) on Jun 15, 2014 at 14:16 UTC

    Instead of creating separate arrays like this

    my @col1; ## column 1
    my @col_ID; ## column 2
    my @col3; ## column 3
    you could use a single array of hashes. ( See perldsc )

    @AoH = (
        { col1 => "col1", col_ID => "col_ID", col3 => "col3", },
    )
    For example, something like this:

    poj
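One way an array of hashes could be filled from tab-separated lines is sketched below. The input lines, the @rows name, and the strand key are assumptions made for illustration and are not taken from the original post.

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines standing in for the matched input rows
my @matches = (
    "ACGTACGT\tid_1\tvariant_1\tx\ty\t+\n",
    "TTGACCAA\tid_2\tvariant_2\tx\ty\t-\n",
);

my @rows;   # array of hashes: one hash per input line
for my $match (@matches) {
    chomp(my $line = $match);          # work on a copy without the newline
    my @fields = split /\t/, $line;
    push @rows, {
        col1   => $fields[0],
        col_ID => $fields[1],
        col3   => $fields[2],
        strand => $fields[5],          # sixth column
    };
}

for my $row (@rows) {
    print "$row->{col_ID}: $row->{col1} ($row->{strand})\n";
}
```

Keeping each line's fields together in one hash avoids having to keep several parallel arrays in step with a shared index.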
