in reply to Base sequence length in fasta format file

Hi lolly,

I'm guessing here, but from your code it looks as though you're trying to print every line that is longer than 250 characters. If so, try something like this:

while (<INPUT>) { chomp; print if length($_) > 250; }
If that's not what you want, then could you explain again what the problem is? (particularly, what's a base?)

Anyway, hope that helps.

andy.

Skip chomp?
by RMGir (Prior) on Apr 25, 2002 at 14:50 UTC
    s/chomp;//;

    You probably don't want to chomp, or all the printed bases will come out on the same line...
    --
    Mike

      RMGir, you have a good point.
      while (<INPUT>) { print if length($_) > 251; }

      lolly, note that the number has changed to 251 (250 for the number of bases, plus one for the newline).

      Andy.

      Update: Or of course:

      while (<INPUT>) { chomp; print $_."\n" if length($_) > 250; }
        Let's hope she doesn't have to run this code on an MS machine. Or on one of the other machines where $/ is longer than 1 char. Or have it run on multiple systems simultaneously.

        No, the safest way is to use chomp to remove the line ending, then determine the length, then put the newline back on for printing.

        :-)

        Yves / DeMerphq
        ---
        Is $/ the same length on every system?

      Actually, if portability is a concern then the chomp is vital, but no matter what, he's going to have to factor the length of the \n into his length calculation.

      You are correct, however, in that he needs to put the newline back on once it's been removed.

      Yves / DeMerphq
      ---
      Writing a good benchmark isnt as easy as it might look.

        NL len isn't going to be an issue, as long as he doesn't binmode the filehandle.

        CRLF pairs will be translated to a single LF on input, and translated back going the other way.

        But if you prefer to chomp then add the newline, that works too :)
        --
        Mike

Re: Re: simple but stuck
by lolly (Novice) on Apr 25, 2002 at 14:54 UTC
    Sorry, DNA is composed of four bases: A, C, T and G. I will try what you have suggested. lolly
      Can the number on the line before the string of bases ever be longer than 250 digits? If yes, then you need something more sophisticated than my suggestion. Come back to us if it doesn't work. Andy.
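
      If the line before the bases can get that long, one possible approach is to test what a line contains rather than only how long it is. A minimal sketch, assuming sequence lines hold nothing but the letters A, C, G and T (either case) and that each sequence sits on a single line; long_sequences is a made-up name, not something from lolly's code:

```perl
use strict;
use warnings;

# Hypothetical helper: return every line made up only of bases
# (A, C, G, T) that is longer than $min characters.
sub long_sequences {
    my ($fh, $min) = @_;
    my @keep;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                   # strip LF or CRLF, whichever is there
        next unless $line =~ /\A[ACGT]+\z/i;    # skip header/number lines
        push @keep, $line if length($line) > $min;
    }
    return @keep;
}
```

      Called as long_sequences(\*INPUT, 250), this should hand back the matching lines without their line endings, so add "\n" again when printing them.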