PerlMonks  

extra spaces before \r\n

by agustina_s (Sexton)
on Mar 06, 2002 at 02:38 UTC ( #149593=perlquestion )
agustina_s has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks.

How can I get rid of extra spaces before the input record separator \r\n?
I have a flat-file input whose lines are terminated by \r\n.
Sample input:
    FEATURE
    /group=" "
    /translation="MMSKLGVLLT ICLLLFPLTA VQLDGDQPAD LPALRTQDIA TDHSPWFDPV KRCCSRYCYI CIPCCPN"
    /disulfide="9,19"
    /disulfide="13,24"
    /hydroxylation="10"
    /hydroxylation="11"
    /post_trans_mod="
For a few input lines, there are extra spaces before the \r\n, for example:
    Input:
    /disulfide="13,24"\r\n
    /hydroxylation="10" \r\n
    /hydroxylation="11"\r\n

    Output:
    Disulfide "13,24"
    Hydroxylation "10" Hydroxylation "11"

    Desired output:
    Disulfide "13,24"
    Hydroxylation "10"
    Hydroxylation "11"
This causes trouble in my program even though I check for extra spaces in my regex. This is part of my working program:
    #!/usr/bin/perl -w
    use strict;

    # initialize all the variables, initialize flags to 0 and line to ''
    my $last;
    my $file1="$ARGV[0]";
    my $result=">".$ARGV[1];
    my $flag=0;
    my $templine='';

    open(INFO1,$file1) or die "Can't open $file1.\n";
    open(OUT,$result) or die "Can't open $result.\n";

    foreach(<INFO1>) {
        if(/\s*\t*\s*\/(.*)=(.*)\s*\r/){
            my $feature=$1;
            my $value=$2;
            if(($value !~ /" "/) && ($value !~ /""/)){
                print OUT ucfirst($feature),"\t$value";
                $last=chop($value);
                if($last eq "\""){ # if the last character is "
                    print OUT "\n";
                    $flag=0;
                }
                else {
                    $flag=1;
                }
            }
        }
        elsif(/\s*(.*)\s*\r/ && $flag==1){
            $templine=$1;
            print OUT " $1";
            $last=chop($templine);
            if($last eq '"'){
                print OUT "\n";
                $flag=0;
            }
        }
    }
    close(INFO1) or die "Can't close $file1.\n";
    close(OUT) or die "Can't close $result.\n";
I use a flag in case an item spans more than one line. The entry is always quoted, so I use the " to check whether I have reached the end of the item.
What should I do, given that it's not possible for me to check the input files and remove the extra spaces by hand?

Thank you in advance.
Regards

Agustina

Re: extra spaces before \r\n
by ajwans (Scribe) on Mar 06, 2002 at 03:03 UTC
    You need to change the greediness of the regex quantifiers:
    /\s*\t*\s*\/(.*)=(.*)\s*\r/
    should become
    /\s*\t*\s*\/(.*)=(.*?)\s*\r/
    You can read all about it in the perlre perldoc.
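
    A minimal sketch of why that one-character change matters. The helper capture_value and the sample line are made up for illustration and are not from the original post. With the greedy (.*), the second capture swallows the trailing spaces and the following \s* is left matching nothing; with (.*?), the capture stops as early as it can and \s* absorbs the spaces before the \r.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply a pattern to a line and return the second capture.
sub capture_value {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $2 : undef;
}

# Made-up sample line with three stray spaces before the \r\n terminator.
my $line = qq{     /hydroxylation="10"   \r\n};

my $greedy     = qr/\s*\t*\s*\/(.*)=(.*)\s*\r/;
my $non_greedy = qr/\s*\t*\s*\/(.*)=(.*?)\s*\r/;

for my $case (['greedy', $greedy], ['non-greedy', $non_greedy]) {
    my ($name, $pattern) = @$case;
    my $value = capture_value($pattern, $line);
    # Brackets make any trailing whitespace in the capture visible.
    printf "%-10s [%s]\n", $name, $value;
}
```

    The greedy capture still carries the stray spaces inside the brackets, so chop() returns a space instead of the closing quote; the non-greedy capture ends at the quote.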

    1. dude, what does mine say?
    2. "sweet", what about mine?
    3. "dude", what does mine say?
    4. GOTO 2
      thanks a lot...
Re: extra spaces before \r\n
by trs80 (Priest) on Mar 06, 2002 at 04:15 UTC
    I am not sure what all you want to keep, but your regex seems a little extreme for the example you show. Here is how I would solve it. I give it a little bit of formatting that I hope doesn't confuse the issue.
        while (<DATA>) {
            # the regex remembers one or more
            # alphanumeric characters \w+
            # followed by an equal sign and double quote
            # then remembers one or more digits or commas [\d\,]+
            # followed by a double quote.
            if (/(\w+)\=\"([\d\,]+)\"/) {
                # The print statement adds an equal amount
                # of padding between the first entry and the
                # second by subtracting the length of $1 from
                # twenty and then adding that many spaces
                # to the output.
                # replace ', " " x (20 - length $1)'
                # with ', "\t"' if you want a tab instead
                print ucfirst($1) , " " x (20 - length $1) , $2 , "\n";
            }
        }
        __DATA__
        FEATURE
        /group=" "
        /translation="MMSKLGVLLT ICLLLFPLTA VQLDGDQPAD LPALRTQDIA TDHSPWFDPV KRCCSRYCYI CIPCCPN"
        /disulfide="9,19"
        /disulfide="13,24"
        /hydroxylation="10"
        /hydroxylation="11"
        /post_trans_mod="
    update: Added code comments
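
    The pattern above only accepts values made of digits and commas, so the /translation entry is skipped. If those entries are wanted too, one option is to strip trailing whitespace (which takes the \r\n with it) from each line before matching, so that a closing quote is always the last character. A rough sketch, assuming every value is double-quoted as the question states; parse_features is a hypothetical name, the sample lines are shortened and made up, and dropping blank entries such as /group=" " mirrors the original program rather than anything in this reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn raw input lines into "Feature\tvalue" strings.
sub parse_features {
    my @lines = @_;
    my @out;
    my ($name, $value);          # $value stays defined while an entry is open
    for my $line (@lines) {
        $line =~ s/\s+\z//;      # drops trailing spaces, \r and \n together
        $line =~ s/\A\s+//;      # drops leading indentation
        if (defined $value) {
            # Continuation of a value whose closing quote has not been seen.
            $value .= " $line";
        }
        elsif ($line =~ m{\A/(\w+)=(".*)\z}) {
            ($name, $value) = ($1, $2);
        }
        else {
            next;                # not a /key="value" line (e.g. FEATURE)
        }
        # The entry is complete once the value has both of its quotes.
        if ($value =~ /\A".*"\z/s) {
            push @out, ucfirst($name) . "\t$value"
                unless $value =~ /\A"\s*"\z/;   # skip blank values
            undef $value;
        }
    }
    return @out;
}

# Made-up, shortened sample with stray spaces before some \r\n terminators.
my @input = (
    qq{FEATURE\r\n},
    qq{     /group=" "\r\n},
    qq{     /translation="MMSKLGVLLT ICLLLFPLTA  \r\n},
    qq{     TDHSPWFDPV KRCCSRYCYI CIPCCPN"\r\n},
    qq{     /disulfide="13,24"\r\n},
    qq{     /hydroxylation="10"   \r\n},
    qq{     /hydroxylation="11"\r\n},
);

print "$_\n" for parse_features(@input);
```

    Because the whitespace is removed before any pattern is applied, the sketch does not depend on whether the quantifiers are greedy.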
Re: extra spaces before \r\n
by Juerd (Abbot) on Mar 06, 2002 at 12:15 UTC
Re: extra spaces before \r\n
by robsv (Curate) on Mar 06, 2002 at 17:14 UTC
    Agustina,
    Looks like you're parsing GenBank records. Have you looked into Bioperl? It has methods to read and parse files in GenBank format, and the feature data is accessible using the SeqFeature object. You may still need to do some trimming/cleanup here and there, but it might simplify your parsing.

    - robsv
