http://www.perlmonks.org?node_id=913775


in reply to renaming 1000's of FASTA files

F0Z7V0F01A03EB_210 F0Z7V0F01A03EB_180 F0Z7V0F01A03EB_136 F0Z7V0F01A03EB_362
After closer examination of the text file of IDs, I see that the same ID appears with four different endings: _210, _180, _136 and _362. Is that really the case, or is it just an artifact of how you presented the sample input? If it is real, how would you choose which one to use for the ID F0Z7V0F01A03EB?

Re^2: renaming 1000's of FASTA files
by garyboyd (Acolyte) on Jul 12, 2011 at 08:50 UTC

    Yes, that's correct. In the .txt file there are IDs with 4 different endings, so the script would scan the multi-FASTA file and generate (in this example) 4 files with these headers:

    F0Z7V0F01A03EB_210 F0Z7V0F01A03EB_180 F0Z7V0F01A03EB_136 F0Z7V0F01A03EB_362

    The sequence would be the same in all 4 files, i.e. the sequence stored under F0Z7V0F01A03EB in the FASTA file.

      You realize you have a problem here?

      You keep a hash whose key is the ID without the "_xxx" ending and whose value is the ID with the ending. But a hash element can hold only ONE value, so each newer ID overwrites the older one. To avoid that, collect the IDs in an array and store a reference to that array as the value of the hash element.
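
      A minimal sketch of that hash-of-arrays idea, not the original poster's script. The helper names build_id_map and expand_record are hypothetical, the IDs are the ones from the sample above, and "ACGT" is a made-up placeholder sequence. It assumes every full ID ends in an underscore followed by digits.

```perl
use strict;
use warnings;

# Map each base ID (the part before the trailing "_NNN") to ALL of its
# full IDs. Hypothetical helper; names are illustrative only.
sub build_id_map {
    my @full_ids = @_;
    my %ids_for;
    for my $full_id (@full_ids) {
        # Strip the trailing "_digits" suffix to get the base ID.
        (my $base = $full_id) =~ s/_\d+\z//;
        # push autovivifies the array reference on first use, so several
        # full IDs can live under the same hash key.
        push @{ $ids_for{$base} }, $full_id;
    }
    return \%ids_for;
}

# Build one FASTA record per full ID, all sharing the same sequence.
sub expand_record {
    my ($ids_for, $base, $sequence) = @_;
    return map { ">$_\n$sequence\n" } @{ $ids_for->{$base} || [] };
}

my $ids_for = build_id_map(qw(
    F0Z7V0F01A03EB_210 F0Z7V0F01A03EB_180
    F0Z7V0F01A03EB_136 F0Z7V0F01A03EB_362
));

# "ACGT" is a placeholder, not real data from the thread.
print for expand_record($ids_for, 'F0Z7V0F01A03EB', 'ACGT');
```

      With the sample IDs this prints four records, one per ending, each carrying the same sequence. Writing each record to its own file would be a small loop around expand_record.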

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James