PerlMonks  

It's all getting messy - remove whitespace

by lecb (Acolyte)
on Jun 15, 2014 at 09:49 UTC (#1089929=perlquestion)
lecb has asked for the wisdom of the Perl Monks concerning the following question:

Hi there. I've written a script that is trying to remove a 20 character sequence from a column with a varying offset.

My script does what I want it to do, except that when I print @spliceout, the output has whitespace between the letters. I've tried

for (@spliceout) { s/\s+$//; }

but this doesn't work. I think I may have confused things by splitting a column into individual elements, but I'm not too sure.

My script is:

#!/usr/bin/perl -w
use strict;

my $inputfile1 = $ARGV[0];
open (FILE1, $inputfile1) or die "Uh oh.. unable to find file $inputfile1"; ##Opens input file
my @file1 = <FILE1>; #loads inputfile1 data into array
close FILE1;

my @matches;
foreach my $file1 (@file1) {
    if($file1 =~ m/splic/) {
        push (@matches, $file1); ##loads matches into array @matches
    }
}

my @col1; ## column 1
my @col_ID; ## column 2
my @col3; ## column 3
my @col_strand_direction; ## column 6

foreach my $match(@matches) { ## process each line, splitting columns and move onto next line
    my @colsplit = split("\t", $match);
    push (@col3, $colsplit[2] . "\n"); ##pushes third column to @col3 array
    push (@col1, $colsplit[0] . "\n");
    push (@col_ID, $colsplit[1] . "\n");
    push (@col_strand_direction, $colsplit[5] . "\n");
}

my @intron_from_boundary;
my @baseref;
foreach my $col3line(@col3) {
    if ($col3line =~ m/([\+|\-]\d+)\w+(\[[ACTG]])/) { ##pulls out + or - and subsequent number and [base change]
        push (@intron_from_boundary, $1 . "\n"); ##$1 pushes what is in the first set of brackets
        push (@baseref, $2 . "\n");
    }
}

## need to take each intronmatch value and work out its position relative to intron/exon boundary
my $left_of_boundary;
my $intron_from_boundary;
my $new_left;
my @spliceout;

## split seq of @col1 into array
my $i = 0;
foreach my $col1(@col1) {
    my @col1split = split(//, $col1);
    ##for -7:
    $left_of_boundary = 10; ##10 to the left
    if ($col_strand_direction[$i] =~ m/\+/) {
        $left_of_boundary = $left_of_boundary + $intron_from_boundary[$i]; ##3 to the left
        $new_left = 23 - $left_of_boundary; ## 20
    }
    else {
        $left_of_boundary = $left_of_boundary - $intron_from_boundary[$i]; ##3 to the left
        $new_left = 23 - $left_of_boundary; ## 20
    }
    my @spliceout = splice @col1split, $new_left, 22; ##want to pull out 3 letters to left of [G] and 16 to the right
    print "@spliceout\n";
    open (MYFILE, '>>fasta');
    print MYFILE (">" . "$col_ID[$i]" , "@spliceout" , "\n");
    close (MYFILE);
    ++$i;
}

Any help would be greatly appreciated, and yes, my scripting is rather messy, I'm still learning! Many thanks :)

Replies are listed 'Best First'.
Re: It's all getting messy - remove whitespace
by tinita (Parson) on Jun 15, 2014 at 10:50 UTC
    Whenever you don't know what's in your data (you think it contains whitespaces) print it to make sure.
    use Data::Dumper;
    local $Data::Dumper::Useqq = 1; # make "invisible" characters visible
    print Dumper \@spliceout;
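As a small illustration of that tip (the sample sequence is made up, and the `Indent` setting is only an assumption to keep the dump on one line), dumping an array built with `split //` shows each element as a single quoted character. If no element is a space, the spaces seen in the printed output are not stored in the data.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample: a short sequence split into single characters,
# the same way the script builds @col1split.
my @spliceout = split //, "ACGT";

local $Data::Dumper::Useqq  = 1;   # show control characters as escapes such as \n or \t
local $Data::Dumper::Indent = 0;   # assumed setting: keep the dump on a single line

my $dump = Dumper(\@spliceout);
print "$dump\n";
```

Any element that really held whitespace would show up in the dump as its own quoted entry.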

    Then read about array interpolation in perldata:
    Arrays and slices are interpolated into double-quoted strings by joining the elements with the delimiter specified in the $" variable

    The problem is in this line:
      print MYFILE (">" . "$col_ID[$i]" , "@spliceout" , "\n");    
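A minimal sketch of what happens in that line, using made-up sample data: interpolating an array into a double-quoted string joins its elements with `$"`, which is a single space by default. Joining explicitly with an empty string, or localising `$"`, avoids the spaces.

```perl
use strict;
use warnings;

my @spliceout = split //, "ACGT";   # hypothetical sample data

my $interpolated = "@spliceout";          # elements joined with $" (a single space by default)
my $joined       = join '', @spliceout;   # elements joined with nothing in between

my $custom = do {
    local $" = '';                        # temporarily change the list separator
    "@spliceout";
};

print "$interpolated\n";   # A C G T
print "$joined\n";         # ACGT
print "$custom\n";         # ACGT
```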

      Thank you! I have modified the script and it now works!

      my @splicejoin = join('', @spliceout);
      open (MYFILE, '>>fasta');
      print MYFILE (">" . "$col_ID[$i]" , "@splicejoin" . "\n");
      close (MYFILE);

      Many thanks for the quick responses and for pointing me in the direction about it not actually being whitespace; didn't realise I could do that. Thanks again.

Re: It's all getting messy - remove whitespace
by RichardK (Vicar) on Jun 15, 2014 at 11:12 UTC

    Yes, Perl does that. There's a difference between

    print "@array\n"

    and

    print @array,"\n";

    Try it and see ;)

    As you're trying to extract part of a string, you might find it easier to use substr, rather than convert to an array and back.

    so something like

    my $seq = substr( $column, $start, $length);
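To make the comparison concrete, here is a rough sketch with hypothetical values for `$column`, `$start` and `$length` (in the script they would come from column 1 and the computed `$new_left`). It extracts the same region once with `substr` and once with the split/splice route from the question.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for illustration.
my $column = "AAACCCGGGTTTAAACCCGGGTTTAAACCCGGG";
my $start  = 5;
my $length = 22;

# Direct extraction from the string: no array, so nothing to re-join.
my $seq = substr($column, $start, $length);

# The split/splice route from the question, joined back for comparison.
my @chars     = split //, $column;
my $via_array = join '', splice(@chars, $start, $length);

print "$seq\n";
print "same result\n" if $seq eq $via_array;
```

Because `substr` returns a plain string, there is no array to interpolate and so no separator to worry about.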

      RichardK, you hero! I have used a substring and it works like a dream! You've made a young lady very, very happy.

Re: It's all getting messy - remove whitespace
by FloydATC (Chaplain) on Jun 15, 2014 at 10:45 UTC

    The regular expression \s+$ means "one or more whitespace characters at the end of the string", so it will never match whitespace in between other characters.

    Whenever I want to strip excessive whitespace from a string, I usually go about it in a three-step way:
    1. Combine multiple whitespaces into a single space using s/\s+/ /g;
    2. Remove the leading space (if any) with s/^\s//;
    3. Remove the trailing space (if any) with s/\s$//; (Remember that this assumes no CR/LF character, adjust as needed)

    Add a few comments and the job is done.

    Perhaps this can be combined into a single-shot regex but it will probably be harder to read and waste more CPU cycles to accomplish exactly the same thing. I never bothered to find out because this approach works for me.
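The three steps above can be sketched as a small helper. The names `tidy_whitespace` and `tidy_with_split` are hypothetical, and the second sub is not from this thread: it uses the `split ' '` idiom as one possible shorter alternative.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the three steps in order.
sub tidy_whitespace {
    my ($text) = @_;
    $text =~ s/\s+/ /g;   # 1. collapse each run of whitespace into one space
    $text =~ s/^\s//;     # 2. drop the leading space, if any
    $text =~ s/\s$//;     # 3. drop the trailing space, if any
    return $text;
}

# Alternative (an assumption, not from the thread): split ' ' skips leading
# whitespace and splits on runs of whitespace, so re-joining gives the same result.
sub tidy_with_split {
    my ($text) = @_;
    return join ' ', split ' ', $text;
}

my $messy = "  splice   site \t here \n";
print tidy_whitespace($messy), "\n";   # splice site here
print tidy_with_split($messy), "\n";   # splice site here
```

Because step 1 treats CR and LF as whitespace too, a trailing newline ends up as the single trailing space that step 3 removes.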

    -- FloydATC

    Time flies when you don't know what you're doing

Re: It's all getting messy - remove whitespace
by poj (Priest) on Jun 15, 2014 at 14:16 UTC

    Instead of creating separate arrays like this

    my @col1; ## column 1
    my @col_ID; ## column 2
    my @col3; ## column 3
    you could use a single array of hashes. ( See perldsc )

    @AoH = (
        {
            col1   => "col1",
            col_ID => "col_ID",
            col3   => "col3",
        },
    )
    For example, something like this ; poj
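poj's example code is not reproduced on this page. As a rough sketch of the array-of-hashes idea only, and not poj's actual code, the following assumes tab-separated input lines shaped like the ones in the question; the sample rows, the `strand` key and the variable `@rows` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines standing in for the matched input rows.
my @matches = (
    "ACGTACGTAC\tid_1\tvariant_a\tx\ty\t+\n",
    "TTGACCATGA\tid_2\tvariant_b\tx\ty\t-\n",
);

# One hash per input line keeps the columns of a row together,
# instead of spreading them across parallel arrays indexed by $i.
my @rows;
for my $line (@matches) {
    chomp(my $copy = $line);
    my @fields = split /\t/, $copy;
    push @rows, {
        col1   => $fields[0],   # column 1
        col_ID => $fields[1],   # column 2
        col3   => $fields[2],   # column 3
        strand => $fields[5],   # column 6
    };
}

for my $row (@rows) {
    print ">$row->{col_ID} ($row->{strand})\n$row->{col1}\n";
}
```

With this layout there is no separate counter to keep in step with several arrays, since each row carries its own ID, sequence and strand.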

Node Type: perlquestion [id://1089929]
Approved by vinoth.ree