PerlMonks  

How do I remove spaces between sequences in a file

by bingalee (Acolyte)
on Jun 19, 2013 at 21:53 UTC (#1039849)
bingalee has asked for the wisdom of the Perl Monks concerning the following question:

I thought this would be an easy task. Basically, I have a file with sequences separated by whitespace and newlines.

Like this:

GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
TCAAG
GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
CTCCCGGCTAG

When I used split and then join, I got the entire sequence on one line. I don't want that.

The same thing happened when I did this:

foreach $value (values(%seq)) {
    $value =~ s/\n|\t|\r//g;
}

Any idea how I can format my sequence file?
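The loop above deletes the line breaks but never puts any back, which is why everything ends up on one line; it also leaves plain spaces in place, because s/\n|\t|\r//g does not match a space. A minimal sketch of the same loop that strips all whitespace and then re-inserts a newline every $width characters. The sample %seq contents and the 60-column width are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical ID => sequence hash, standing in for the %seq in the question.
my %seq = (
    seq1 => "GAGGGTGCAA TCTACAAGAG\nTGCGGGTTCG\r\nCGTTC",
    seq2 => join(' ', ('ACGTACGTAC') x 7),    # 70 bases split by spaces
);

my $width = 60;    # assumed line width; adjust to the format you need

# values() returns aliases, so changing $value changes the hash itself.
for my $value (values %seq) {
    $value =~ s/\s+//g;                  # drop spaces, tabs, \r and \n
    $value =~ s/(.{1,$width})/$1\n/g;    # newline after every $width chars
}

print "$_:\n$seq{$_}" for sort keys %seq;
```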

Re: How do I remove spaces between sequences in a file
by pvaldes (Chaplain) on Jun 19, 2013 at 21:56 UTC

    See $/ in perldoc perlvar (and read this).

    $/:  The input record separator, newline by default. This influences Perl's idea of what a "line" is.
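A minimal sketch of what $/ changes, using an in-memory filehandle in place of a real file (the sample text and the helper names count_records and slurp are illustrative only). With the default newline separator each read returns one line; with $/ undefined a single read returns the whole file, which is the "slurp" trick the later answers rely on:

```perl
use strict;
use warnings;

# Hypothetical sample text; an in-memory filehandle stands in for a file.
my $text = "ACGTAC\nGTTA CCG\nTTAG\n";

# Default $/ ("\n"): every <$fh> read returns one line.
sub count_records {
    my ($string) = @_;
    open my $fh, '<', \$string or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

# $/ undefined ("slurp mode"): a single <$fh> read returns everything.
sub slurp {
    my ($string) = @_;
    open my $fh, '<', \$string or die "Cannot open in-memory handle: $!";
    local $/;    # undef until this sub returns, then restored
    my $all = <$fh>;
    close $fh;
    return $all;
}

print count_records($text), " records with the default \$/\n";
print length(slurp($text)), " characters in one read with \$/ undefined\n";
```

Using local keeps the change confined to one block, so other code that reads line by line is not affected.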
Re: How do I remove spaces between sequences in a file
by 2teez (Priest) on Jun 19, 2013 at 22:10 UTC

    I don't want this?
    So how do you want it?
    Like this?

    while (<DATA>) {
        chomp;
        print $_ unless /^\s*$/;
    }
    __DATA__
    GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
    GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
    AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
    GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
    CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
    AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
    AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
    GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
    TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
    GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
    GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
    TCAAG
    GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
    CTCCCGGCTAG
    Update: I think davido has what you wanted in Re^3: How do I remove spaces between sequences in a file

    If you tell me, I'll forget.
    If you show me, I'll remember.
    If you involve me, I'll understand.
    --- Author unknown to me

      No, like this:

      CCTGCCCGGGCGGAGGGAGGCGCGGCGCAACGGACTTCATGCTGCCACCCGCCGACCCG
      CACCTCACCGGCTCACCTACACCCTCCTCTTCGTTCGTCTCTTATCCATGCAATGCATC
      GTCTGACCTTGCCTTTTGTCTTTTACAAACACTGCCATCAGCAAATCATGCTATTTTTA
      TGCCGTCACGTTTCATGCAGCTATGTGTAAATAAATAAAACGTATATAACGCATTTTAA
      TCATAGATCCCCGACGCAACAACAATTGCAGCCACACCCCCGCGGAGCCATCACCTTTC
      ACTCTAATTAGAAACATCGACCTGCTCAACCCACCGAGGAATATAGACTGTTTTTTTTT
      CTATGTGGAAATACCAAGTAGTAGTGCCAAACGCTAAAAGGGTATGCATCTTAATTGAT
      GAGCTGTGAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATA
      AACGCAACGCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAA
      GAAAATTCGAAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAG
      TTCCAGGTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGA
      CCAGAGGCTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
      AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTC
      AAGAAAATTCGTGAAGCCAGGGGTTACTCTTACATGGACATTTGTGATGTGTGCCCAGA
      GAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTCTTTGAAGAACACTTGCATACTG
      ATGAAGAGATACGATATTGCCTTGAGGGTAGCGGCTACTTTGATGTGAGGGACCAAGAT
      GATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGGGCATGATCGTTTTGCCTGCAGGAAT
      GTATCATCGCTTTACGTTGGATAGTGACAACTACATCAAGGCAATGCGTCTATTTGTGG
      GTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCATCTCCCGGCTAGGAAGGAG
      TACGTCGAGAAAATCATCAACCGAGGTGGAAACCAAGCCGTCGAAGCTCGTTGAGCGTG
      TCCACTCTACATGTCCTCCTGCCGTCTCAGCCTCTTGTGTTTTACACCCTACAACTCCT
      AGTACCGCCGAATAAGATTTGCTATCTGCAATGTGCTCATGCCACCGCTGTGTGTGCCA
      GTTAACAGTTTGCACGAAACCCTAGATATTTTGTTATACGAATGAATGACATGTGGTGT
      TTGATAAATGATGAAACGATGATGGTTCAAATCAACCCACCTGTCTTTCACAAGTTCGA
      TGAAACGTTGCGCGCAATGGTTGTTTACTGAAAAGGATGCCTCTAACCAGCCGTCAGTC
      TGACTGGATGTCGTGGATCAACTGCCGACCAACATCTTGTCTGCGCTCGTGCCGCGTGA
      GTAACTGCTGTCGTTCTCACTGAGGGCGAGCAGGGCCTTGGTAATGATGTACTGTTTCA
      CAGTAACAGTTCTGTTGATCACCGAAAGGTTTGTGCAGGTTGATGTGATGTGACGATGC
      TGGGTGGCTCCTGCCTAGAAAACGTGTGGTGGTTGAAATGAACGCGTTCCATGTCTGTC
      GTCTAGTCACTCGTCGAGCTTTGGTCGGCGGAGCCGATCACCGGCCAAGAACATTTGTG
      ATCTCGATGATATACTTGTGTGCGGCTGCATTGCCGCACAAAAGGTTCATATTGCCATC
      ATCGTTCCTGCTCACAGATGGTAGTGGTGACGCACTGGCAAATAGACCTGATTTCATTG
      CGCAGCCAGATTCATCTATGGCTGAC

        Are you trying to say that you want to remove all spaces (including newlines) and then ensure that there's a newline after every 59th character?

        my $input = do { local $/ = undef; <DATA> };
        $input =~ s/\s//mg;            # Remove all whitespace.
        $input =~ s/(.{59})/$1\n/g;    # Place a \n after every 59th character.
        print "$input\n";

        __DATA__
        GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
        GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
        AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
        GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
        CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
        AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
        AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
        GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
        TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
        GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
        GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
        TCAAG
        GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
        CTCCCGGCTAG

        ...produces...

        GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAA
        CGCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATT
        CGAAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAGG
        TTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
        CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCGAACTAGG
        AATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCAAGAAAA
        TTCGTGAAGCCAGGGGTTACTCTTACATGGACATTTGTGATGTGTGCCCAGAGAAGTTG
        CCAAATTACGAGGCTAAGATAAAGAATTTCTTTGAAGAACACTTGCATACTGATGAAGA
        GATACGATATTGCCTTGAGGGTAGCGGCTACTTTGATGTGAGGGACCAAGATGATCAGT
        GGATCCGTGTAGCGGTAAAGAAAGGGGGCATGATCGTTTTGCCTGCAGGAATGTATCAT
        CGCTTTACGTTGGATAGTGACAACTACATCAAGGCAATGCGTCTATTTGTGGGTGAGCC
        TGTCTGGACACCATACAATCGTCCGCATGACCATCTCCCGGCTAG

        Dave
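One caveat with the substitution above: it appends a newline after every complete 59-character chunk, so when the sequence length is an exact multiple of 59 the final print "$input\n" emits an extra blank line. A minimal sketch of an alternative that splits the string with unpack and joins the pieces with newlines, so no trailing empty line appears. wrap_sequence is a hypothetical helper name, and the sample input is illustrative:

```perl
use strict;
use warnings;

# Wrap a sequence into lines of at most $width characters.
# "(a$width)*" repeats the a$width group until the input is used up;
# the last group simply returns whatever characters remain.
sub wrap_sequence {
    my ($sequence, $width) = @_;
    $sequence =~ s/\s+//g;    # remove all whitespace first
    return join "\n", unpack "(a$width)*", $sequence;
}

my $raw = "GAGGGTGCAA TCTACAAGAG\nTGCGGGTTCG\nCGTTC\n";
print wrap_sequence($raw, 59), "\n";
```

The width is a parameter, so the same helper covers the 59-column layout asked for here as well as the more common 60 or 70.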

Re: How do I remove spaces between sequences in a file
by rnaeye (Pilgrim) on Jun 20, 2013 at 00:44 UTC

    How about something like this:

    $length sets the number of characters printed on each line.

    undef $/;
    my $sequence = <DATA>;
    $sequence =~ s/\s+//g;
    my $length = 60;
    for ( my $pos = 0 ; $pos < length($sequence) ; $pos += $length ) {
        print substr($sequence, $pos, $length), "\n";
    }
    __DATA__
    GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
    GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
    AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
    GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
    CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
    AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
    AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
    GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
    TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
    GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
    GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
    TCAAG
    GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
    CTCCCGGCTAG
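Both slurp-based answers read the entire file into memory, which is fine for a sequence of this size. For very large files, here is a minimal sketch of a line-by-line variant that only keeps a partial output line in a buffer. rewrap_stream is a hypothetical name, and the demo uses in-memory filehandles and a width of 10 purely to keep the output short:

```perl
use strict;
use warnings;

# Re-wrap sequence text read line by line, without slurping the file.
# $in and $out are filehandles; $width is the output line width.
sub rewrap_stream {
    my ($in, $out, $width) = @_;
    my $buffer = '';
    while (my $line = <$in>) {
        $line =~ s/\s+//g;    # strip spaces, tabs and line endings
        $buffer .= $line;
        # Emit each complete line as soon as the buffer holds one;
        # 4-arg substr returns the first $width chars and removes them.
        while (length $buffer >= $width) {
            print {$out} substr($buffer, 0, $width, ''), "\n";
        }
    }
    print {$out} "$buffer\n" if length $buffer;    # leftover partial line
    return;
}

# Usage: read from one in-memory string and collect output in another.
my $raw = "GAGGGTGCAA TCTACAAGAG\nTGCGGGTTCG\nCGTTC\n";
my $wrapped = '';
open my $in,  '<', \$raw     or die "Cannot open input: $!";
open my $out, '>', \$wrapped or die "Cannot open output: $!";
rewrap_stream($in, $out, 10);
close $in;
close $out;
print $wrapped;
```

Memory use stays bounded by one input line plus one output line, regardless of file size.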

Approved by Happy-the-monk