PerlMonks  

How do I remove spaces between sequences in a file

by bingalee (Acolyte)
on Jun 19, 2013 at 21:53 UTC (#1039849=perlquestion)
bingalee has asked for the wisdom of the Perl Monks concerning the following question:

I thought this would be an easy task. Basically, I have a file with sequences separated by whitespace and newlines.

Like this

GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA TCAAG GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT CTCCCGGCTAG

When I used split and then join, I got the entire sequence on one line. I don't want that.

The same thing happened when I did this

foreach $value (values(%seq)) {
    $value =~ s/\n|\t|\r//g;
}

Any idea how I can format my sequence file?

Replies are listed 'Best First'.
Re: How do I remove spaces between sequences in a file
by pvaldes (Chaplain) on Jun 19, 2013 at 21:56 UTC

  See $/ in perldoc perlvar. (And read this )

  $/:  The input record separator, newline by default. This influences Perl's idea of what a "line" is.
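As a rough illustration of the point about `$/`: when it is undefined, a single read returns the whole file instead of one line. The sketch below assumes an in-memory filehandle for the demo, and `slurp` is a hypothetical helper name, not something from perlvar.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from a filehandle as one string
# by undefining $/ (the input record separator) for this scope only.
sub slurp {
    my ($fh) = @_;
    local $/;              # undef here; restored when the sub returns
    my $content = <$fh>;   # one read now returns the whole input
    return $content;
}

# Demo data held in memory (an assumption for illustration only).
my $text = "ACGT ACGT\nTTGA\n\nCCAT\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $all = slurp($fh);
close $fh;

print "Read ", length($all), " characters in a single read\n";
```

Using `local` keeps the change contained, so line-by-line reads elsewhere in the program should still behave as usual.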
Re: How do I remove spaces between sequences in a file
by 2teez (Vicar) on Jun 19, 2013 at 22:10 UTC

  You say "I don't want that"?
  So how do you want it?
  Like this?

  # Print every non-blank line with its trailing newline removed,
  # which joins the whole sequence onto a single line.
  while (<DATA>) {
      chomp;
      print $_ unless /^\s*$/;
  }
  __DATA__
  GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
  GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
  AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
  GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
  CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
  AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
  AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
  GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
  TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
  GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
  GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
  TCAAG
  GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
  CTCCCGGCTAG
  Update: I think davido has what you wanted in Re^3: How do I remove spaces between sequences in a file

  If you tell me, I'll forget.
  If you show me, I'll remember.
  if you involve me, I'll understand.
  --- Author unknown to me

   No, like this:

   CCTGCCCGGGCGGAGGGAGGCGCGGCGCAACGGACTTCATGCTGCCACCCGCCGACCCG
   CACCTCACCGGCTCACCTACACCCTCCTCTTCGTTCGTCTCTTATCCATGCAATGCATC
   GTCTGACCTTGCCTTTTGTCTTTTACAAACACTGCCATCAGCAAATCATGCTATTTTTA
   TGCCGTCACGTTTCATGCAGCTATGTGTAAATAAATAAAACGTATATAACGCATTTTAA
   TCATAGATCCCCGACGCAACAACAATTGCAGCCACACCCCCGCGGAGCCATCACCTTTC
   ACTCTAATTAGAAACATCGACCTGCTCAACCCACCGAGGAATATAGACTGTTTTTTTTT
   CTATGTGGAAATACCAAGTAGTAGTGCCAAACGCTAAAAGGGTATGCATCTTAATTGAT
   GAGCTGTGAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATA
   AACGCAACGCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAA
   GAAAATTCGAAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAG
   TTCCAGGTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGA
   CCAGAGGCTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
   AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTC
   AAGAAAATTCGTGAAGCCAGGGGTTACTCTTACATGGACATTTGTGATGTGTGCCCAGA
   GAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTCTTTGAAGAACACTTGCATACTG
   ATGAAGAGATACGATATTGCCTTGAGGGTAGCGGCTACTTTGATGTGAGGGACCAAGAT
   GATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGGGCATGATCGTTTTGCCTGCAGGAAT
   GTATCATCGCTTTACGTTGGATAGTGACAACTACATCAAGGCAATGCGTCTATTTGTGG
   GTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCATCTCCCGGCTAGGAAGGAG
   TACGTCGAGAAAATCATCAACCGAGGTGGAAACCAAGCCGTCGAAGCTCGTTGAGCGTG
   TCCACTCTACATGTCCTCCTGCCGTCTCAGCCTCTTGTGTTTTACACCCTACAACTCCT
   AGTACCGCCGAATAAGATTTGCTATCTGCAATGTGCTCATGCCACCGCTGTGTGTGCCA
   GTTAACAGTTTGCACGAAACCCTAGATATTTTGTTATACGAATGAATGACATGTGGTGT
   TTGATAAATGATGAAACGATGATGGTTCAAATCAACCCACCTGTCTTTCACAAGTTCGA
   TGAAACGTTGCGCGCAATGGTTGTTTACTGAAAAGGATGCCTCTAACCAGCCGTCAGTC
   TGACTGGATGTCGTGGATCAACTGCCGACCAACATCTTGTCTGCGCTCGTGCCGCGTGA
   GTAACTGCTGTCGTTCTCACTGAGGGCGAGCAGGGCCTTGGTAATGATGTACTGTTTCA
   CAGTAACAGTTCTGTTGATCACCGAAAGGTTTGTGCAGGTTGATGTGATGTGACGATGC
   TGGGTGGCTCCTGCCTAGAAAACGTGTGGTGGTTGAAATGAACGCGTTCCATGTCTGTC
   GTCTAGTCACTCGTCGAGCTTTGGTCGGCGGAGCCGATCACCGGCCAAGAACATTTGTG
   ATCTCGATGATATACTTGTGTGCGGCTGCATTGCCGCACAAAAGGTTCATATTGCCATC
   ATCGTTCCTGCTCACAGATGGTAGTGGTGACGCACTGGCAAATAGACCTGATTTCATTG
   CGCAGCCAGATTCATCTATGGCTGAC

    Are you trying to say that you want to remove all spaces (including newlines) and then ensure that there's a newline after every 59th character?

    my $input = do { local $/ = undef; <DATA> };
    $input =~ s/\s//g;            # Remove all whitespace.
    $input =~ s/(.{59})/$1\n/g;   # Place a \n after every 59th character.
    print "$input\n";
    __DATA__
    GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
    GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
    AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
    GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
    CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
    AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
    AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
    GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
    TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
    GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
    GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
    TCAAG
    GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
    CTCCCGGCTAG

    ...produces...

    GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAA
    CGCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATT
    CGAAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAGG
    TTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
    CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCGAACTAGG
    AATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCAAGAAAA
    TTCGTGAAGCCAGGGGTTACTCTTACATGGACATTTGTGATGTGTGCCCAGAGAAGTTG
    CCAAATTACGAGGCTAAGATAAAGAATTTCTTTGAAGAACACTTGCATACTGATGAAGA
    GATACGATATTGCCTTGAGGGTAGCGGCTACTTTGATGTGAGGGACCAAGATGATCAGT
    GGATCCGTGTAGCGGTAAAGAAAGGGGGCATGATCGTTTTGCCTGCAGGAATGTATCAT
    CGCTTTACGTTGGATAGTGACAACTACATCAAGGCAATGCGTCTATTTGTGGGTGAGCC
    TGTCTGGACACCATACAATCGTCCGCATGACCATCTCCCGGCTAG

    Dave
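The same "strip all whitespace, then break every N characters" idea can also be written with `unpack` instead of a second substitution. This is only a sketch of an alternative, not what was posted above; `rewrap` and its `$width` parameter are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strip all whitespace from $text, then break the
# result into lines of $width characters (the last line may be shorter).
sub rewrap {
    my ($text, $width) = @_;
    $text =~ s/\s+//g;                         # drop spaces, tabs, newlines
    my @chunks = unpack "(a$width)*", $text;   # fixed-width pieces
    return join '', map { "$_\n" } @chunks;    # one newline per piece
}

# Small demo input (an assumption; any whitespace-separated text works).
print rewrap("GAGGGTGC AATCTACA\nAGAGTG\n\nCGGGTTCG", 10);
```

Because each chunk gets exactly one newline, this version should not emit an extra blank line when the sequence length happens to be an exact multiple of the width.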

Re: How do I remove spaces between sequences in a file
by rnaeye (Pilgrim) on Jun 20, 2013 at 00:44 UTC

  How about something like this:

  $length determines the width of the column

  undef $/;                     # Slurp mode: read all of DATA in one go.
  my $sequence = <DATA>;
  $sequence =~ s/\s+//g;        # Remove all whitespace, including newlines.
  my $length = 60;              # Width of each output line.
  for ( my $pos = 0 ; $pos < length($sequence) ; $pos += $length ) {
      print substr($sequence, $pos, $length), "\n";
  }
  __DATA__
  GAGGGTGCAATCTACAAGAGTGCGGGTTCGCGTTCAGTCCAAGCTGAACATAAACGCAAC
  GCAAGGCCCCTCGACGCAGATACCGCGCTCAGGAGCATTCGAGAAGCCGAAGAAAATTCG
  AAGACGAGGACCAGCTACACTGCGGACGGACTTCGGGCATGGAGGGCCAGTTCCAG
  GTTGGCAAGGAGGAGGTCATCCAAGCATGGTACATGGACGACAGTGAAGAGGACCAGAGG
  CTTCCTCATCATCGTGAGCCCAAAGAATTCATTCCTCTCGACAAACTTTCCG
  AACTAGGAATATTAAGCTGGCGCCTAAATGCTGATGATTGGGAGAATGATGAGAACCTCA
  AGAAAATTCGTGAAGCCAGGGGTTACTCTTACATG
  GACATTTGTGATGTGTGCCCAGAGAAGTTGCCAAATTACGAGGCTAAGATAAAGAATTTC
  TTTGAAGAACACTTGCATACTGATGAAGAGATACGATATTGCCTTGAGGGTAGCG
  GCTACTTTGATGTGAGGGACCAAGATGATCAGTGGATCCGTGTAGCGGTAAAGAAAGGGG
  GCATGATCGTTTTGCCTGCAGGAATGTATCATCGCTTTACGTTGGATAGTGACAACTACA
  TCAAG
  GCAATGCGTCTATTTGTGGGTGAGCCTGTCTGGACACCATACAATCGTCCGCATGACCAT
  CTCCCGGCTAG
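If the sequence file is too large to slurp comfortably, the same fixed-width output can probably be produced line by line with a small buffer. The following is a sketch under that assumption; `rewrap_stream` is a hypothetical name, and the in-memory filehandles are used only to keep the demo self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: copy sequence data from $in to $out, dropping all
# whitespace and writing lines of $width characters (the last line may
# be shorter). Only one input line plus a short buffer is held in memory.
sub rewrap_stream {
    my ($in, $out, $width) = @_;
    my $buffer = '';
    while (my $line = <$in>) {
        $line =~ s/\s+//g;                 # strip spaces, tabs, newlines
        $buffer .= $line;
        # Write out every complete chunk currently in the buffer.
        while (length($buffer) >= $width) {
            # 4-argument substr removes the chunk from $buffer and returns it.
            print {$out} substr($buffer, 0, $width, ''), "\n";
        }
    }
    print {$out} "$buffer\n" if length $buffer;   # leftover partial line
    return;
}

# Demo with in-memory filehandles (assumed data, for illustration only).
my $raw = "ACGTAC GT\n\nTTGACCA\n";
my $result = '';
open my $in,  '<', \$raw    or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
rewrap_stream($in, $out, 5);
close $in;
close $out;
print $result;
```

For real files, `$in` and `$out` would be ordinary filehandles opened on the input and output paths, and `$width` would be 59 or 60 depending on the column width wanted.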
