PerlMonks  

How do I remove spaces between sequences in a file

by bingalee (Acolyte)
on Jun 19, 2013 at 21:53 UTC (#1039849=perlquestion)
bingalee has asked for the wisdom of the Perl Monks concerning the following question:

I thought this would be an easy task. Basically, I have a file with sequences separated by whitespace and newlines.

Like this:

GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
TCAAG
GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
CTCCCGGCTAG

When I used split and then join, I got the entire sequence on one line. I don't want that.

The same thing happened when I did this:

foreach $value (values(%seq)) {
    $value =~ s/\n|\t|\r//g;
}

Any idea how I can format my sequence file?
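For the literal question in the title, removing the spaces between chunks while keeping the existing line breaks, the key is to delete only spaces and tabs. The substitution above deletes \n (as well as \t and \r) and never matches a space at all, which is why everything collapses onto one line. A minimal sketch; strip_horizontal_ws is a hypothetical helper name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: delete spaces and tabs but keep every newline,
# so each line of the input stays on its own line.
sub strip_horizontal_ws {
    my ($text) = @_;
    (my $clean = $text) =~ tr/ \t//d;   # "\n" and "\r" are not listed, so they survive
    return $clean;
}

print strip_horizontal_ws("GAGG GTGC\nAATC\tTACA\n");
```

To apply the same edit to a whole file from the shell, perl -pe 'tr/ \t//d' input.txt > output.txt does the equivalent (the file names are placeholders).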

Re: How do I remove spaces between sequences in a file
by pvaldes (Chaplain) on Jun 19, 2013 at 21:56 UTC

    See $/ in perldoc perlvar (and read this).

    $/:  The input record separator, newline by default. This influences Perl's idea of what a "line" is.
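To make the $/ hint concrete, here is a minimal sketch that reads the same in-memory data twice: once with the default separator and once in "slurp mode" with $/ set to undef. The helper name slurp_fh is made up for this example, and in-memory filehandles stand in for a real file. Slurping is the first step of the whitespace-stripping answers later in this thread.

```perl
use strict;
use warnings;

# Read everything left on a filehandle at once by localizing $/ to undef
# ("slurp mode"). Because of local, $/ goes back to "\n" when the sub returns.
sub slurp_fh {
    my ($fh) = @_;
    local $/;                 # undef record separator: no line boundaries
    return scalar <$fh>;      # whole remaining input as one string
}

my $text = "ACGT\nTTGG\n";

# Default $/ ("\n"): each read returns one line.
open my $by_line, '<', \$text or die "Cannot open in-memory handle: $!";
my @records = <$by_line>;
close $by_line;

# Slurp mode: a single read returns everything, newlines included.
open my $whole, '<', \$text or die "Cannot open in-memory handle: $!";
my $all = slurp_fh($whole);
close $whole;

print scalar(@records), " records in line mode\n";
print length($all), " characters in slurp mode\n";
```

Using local rather than a bare undef $/ keeps the change from leaking into the rest of the program.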
Re: How do I remove spaces between sequences in a file
by 2teez (Priest) on Jun 19, 2013 at 22:10 UTC

    I don't want this?
    So how do you want it?
    Like this?

    while (<DATA>) {
        chomp;                      # drop the trailing newline
        print $_ unless /^\s*$/;    # print non-blank lines without a newline, so they run together
    }
    __DATA__
    GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
    GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
    AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
    GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
    CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
    AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
    AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
    GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
    TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
    GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
    GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
    TCAAG
    GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
    CTCCCGGCTAG
    Update: I think davido has what you wanted in Re^3: How do I remove spaces between sequences in a file

    If you tell me, I'll forget.
    If you show me, I'll remember.
    If you involve me, I'll understand.
    --- Author unknown to me

      No, like this:

      CCTGCCCGGGCGGAGGGAGGCGCGGCGCAACGGACTTCATGCTGCCACCCGCCGACCCG
      CACCTCACCGGCTCACCTACACCCTCCTCTTCGTTCGTCTCTTATCCATGCAATGCATC
      GTCTGACCTTGCCTTTTGTCTTTTACAAACACTGCCATCAGCAAATCATGCTATTTTTA
      TGCCGTCACGTTTCATGCAGCTATGTGTAAATAAATAAAACGTATATAACGCATTTTAA
      TCATAGATCCCCGACGCAACAACAATTGCAGCCACACCCCCGCGGAGCCATCACCTTTC
      ACTCTAATTAGAAACATCGACCTGCTCAACCCACCGAGGAATATAGACTGTTTTTTTTT
      CTATGTGGAAATACCAAGTAGTAGTGCCAAACGCTAAAAGGGTATGCATCTTAATTGAT
      GAGCTGTGAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATA
      AACGCAACGCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAA
      GAAAATTCGAAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAG
      TTCCAGGTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGA
      CCAGAGGCTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
      AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTC
      AAGAAAATTCGTGAAGCCAGGGGTTACTCTTACATGGACATTTGTGATGTGTGCCCAGA
      GAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTCTTTGAAGAACACTTGCATACTG
      ATGAAGAGATACGATATTGCCTTGAGGGTAGCGGCTACTTTGATGTGAGGGACCAAGAT
      GATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGGGCATGATCGTTTTGCCTGCAGGAAT
      GTATCATCGCTTTACGTTGGATAGTGACAACTACATCAAGGCAATGCGTCTATTTGTGG
      GTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCATCTCCCGGCTAGGAAGGAG
      TACGTCGAGAAAATCATCAACCGAGGTGGAAACCAAGCCGTCGAAGCTCGTTGAGCGTG
      TCCACTCTACATGTCCTCCTGCCGTCTCAGCCTCTTGTGTTTTACACCCTACAACTCCT
      AGTACCGCCGAATAAGATTTGCTATCTGCAATGTGCTCATGCCACCGCTGTGTGTGCCA
      GTTAACAGTTTGCACGAAACCCTAGATATTTTGTTATACGAATGAATGACATGTGGTGT
      TTGATAAATGATGAAACGATGATGGTTCAAATCAACCCACCTGTCTTTCACAAGTTCGA
      TGAAACGTTGCGCGCAATGGTTGTTTACTGAAAAGGATGCCTCTAACCAGCCGTCAGTC
      TGACTGGATGTCGTGGATCAACTGCCGACCAACATCTTGTCTGCGCTCGTGCCGCGTGA
      GTAACTGCTGTCGTTCTCACTGAGGGCGAGCAGGGCCTTGGTAATGATGTACTGTTTCA
      CAGTAACAGTTCTGTTGATCACCGAAAGGTTTGTGCAGGTTGATGTGATGTGACGATGC
      TGGGTGGCTCCTGCCTAGAAAACGTGTGGTGGTTGAAATGAACGCGTTCCATGTCTGTC
      GTCTAGTCACTCGTCGAGCTTTGGTCGGCGGAGCCGATCACCGGCCAAGAACATTTGTG
      ATCTCGATGATATACTTGTGTGCGGCTGCATTGCCGCACAAAAGGTTCATATTGCCATC
      ATCGTTCCTGCTCACAGATGGTAGTGGTGACGCACTGGCAAATAGACCTGATTTCATTG
      CGCAGCCAGATTCATCTATGGCTGAC

        Are you trying to say that you want to remove all spaces (including newlines) and then ensure that there's a newline after every 59th character?

        my $input = do { local $/ = undef; <DATA> };
        $input =~ s/\s//mg;              # Remove all whitespace.
        $input =~ s/(.{59})/$1\n/g;      # Place a \n after every 59th character.
        print "$input\n";
        __DATA__
        GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
        GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
        AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
        GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
        CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
        AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
        AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
        GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
        TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
        GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
        GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
        TCAAG
        GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
        CTCCCGGCTAG

        ...produces...

        GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAA
        CGCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATT
        CGAAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAGG
        TTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
        CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCGAACTAGG
        AATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCAAGAAAA
        TTCGTGAAGCCAGGGGTTACTCTTACATGGACATTTGTGATGTGTGCCCAGAGAAGTTG
        CCAAATTACGAGGCTAAGATAAAGAATTTCTTTGAAGAACACTTGCATACTGATGAAGA
        GATACGATATTGCCTTGAGGGTAGCGGCTACTTTGATGTGAGGGACCAAGATGATCAGT
        GGATCCGTGTAGCGGTAAAGAAAGGGGGCATGATCGTTTTGCCTGCAGGAATGTATCAT
        CGCTTTACGTTGGATAGTGACAACTACATCAAGGCAATGCGTCTATTTGTGGGTGAGCC
        TGTCTGGACACCATACAATCGTCCGCATGACCATCTCCCGGCTAG

        Dave
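A variation on davido's approach, packaged as a reusable sub. This is a minimal sketch, not code from the thread: the name rewrap_sequence and its $width parameter are illustrative. Collecting the chunks with a match and joining them avoids the extra blank line that s/(.{59})/$1\n/g followed by print "$input\n" produces when the sequence length happens to be an exact multiple of 59.

```perl
use strict;
use warnings;

# Hypothetical helper: strip all whitespace from a sequence and re-wrap it
# into lines of at most $width characters, each terminated by "\n".
sub rewrap_sequence {
    my ($seq, $width) = @_;
    $seq =~ s/\s+//g;                       # drop spaces, tabs and newlines
    my @lines = $seq =~ /(.{1,$width})/g;   # greedy chunks; only the last may be shorter
    return @lines ? join("\n", @lines) . "\n" : '';
}

print rewrap_sequence("GAGG GTGC\nAATC TACA\nTCAAG\n", 5);
```

For the layout requested above, the call would be print rewrap_sequence($input, 59); with $input read in slurp mode as in davido's code.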

Re: How do I remove spaces between sequences in a file
by rnaeye (Pilgrim) on Jun 20, 2013 at 00:44 UTC

    How about something like this:

    $length sets the line width (the number of characters printed per line).

    undef $/;                   # slurp mode for the rest of the program (local $/ in a block limits the effect)
    my $sequence = <DATA>;      # the whole data section as one string
    $sequence =~ s/\s+//g;      # remove all whitespace, newlines included
    my $length = 60;
    for ( my $pos = 0 ; $pos < length($sequence) ; $pos += $length ) {
        print substr($sequence, $pos, $length), "\n";   # one $length-wide slice per line
    }
    __DATA__
    GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
    GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
    AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
    GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
    CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
    AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
    AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
    GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
    TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
    GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
    GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
    TCAAG
    GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
    CTCCCGGCTAG
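Both slurping answers hold the entire sequence in memory, which is fine for a file of this size. For very large files, a line-by-line variant only needs a small buffer. This is a minimal sketch, not from the thread: rewrap_stream is a hypothetical name, and in-memory filehandles stand in for real input and output files.

```perl
use strict;
use warnings;

# Hypothetical sketch: re-wrap a sequence read line by line, so the whole
# file never has to be held in memory. Reads from $in, writes to $out.
sub rewrap_stream {
    my ($in, $out, $width) = @_;
    my $buffer = '';
    while (my $line = <$in>) {
        $line =~ s/\s+//g;                 # strip whitespace, including the newline
        $buffer .= $line;
        while (length($buffer) >= $width) {
            # 4-argument substr returns the first $width chars and removes them
            print {$out} substr($buffer, 0, $width, ''), "\n";
        }
    }
    print {$out} "$buffer\n" if length $buffer;   # leftover, shorter final line
    return;
}

my $input  = "GAGG GTGC\nAATC TACA\nTCAAG\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot read: $!";
open my $out, '>', \$output or die "Cannot write: $!";
rewrap_stream($in, $out, 5);
close $in;
close $out;
print $output;
```

With real files, the handles would come from open on the input and output paths, and $width would be 60 to match the code above.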

Node Type: perlquestion [id://1039849]
Approved by Happy-the-monk