PerlMonks  

Command or Perl script for changing headers of multiple FASTA files in a specific order listed in a txt file

by Anonymous Monk
on Sep 11, 2017 at 09:31 UTC ( [id://1199075]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Assalam o alaikum everyone,

I am working with multiple genes. Each gene folder contains multiple FASTA files (70-75), and each FASTA file contains a single gene sequence, e.g.:

AMY2b_Gene_folder

Chimpanzee_AMY2B_CDS.fasta
Human_AMY2B_CDS.fasta
Pygmy_chimpanzee_AMY2B_CDS.fasta
Western_gorrila_AMY2B_CDS.fasta

cat Chimpanzee_AMY2B_CDS.fasta

>lcl|NM_020978.4_cds_NP_066188.1_1 [gene=AMY2B] [protein=alpha-amylase 2B precursor] [protein_id=NP_066188.1] [location=673..2208]
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT
AGCTCCCAAGGGATTTGGAGGGGTTCAGGTCTCTCCACCAAATGAAAATGTTGCAATTCACAACCCTTTC

cat Human_AMY2B_CDS.fasta

>lcl|NM_020978.4_cds_NP_066188.1_1 [gene=AMY2B] [protein=alpha-amylase 2B precursor] [protein_id=NP_066188.1] [location=673..2208]
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT
AGCTCCCAAGGGATTTGGAGGGGTTCAGGTCTCTCCACCAAATGAAAATGTTGCAATTCACAACCCTTTC

I want to change the header of each FASTA file according to a specific order given in a text file.

cat Headers.txt
MP.C_AMY2B
FP.H_AMY2B

The desired output should look like:

>MP.C_AMY2B
ATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAG
GACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTT
AGCTCCCAAGGGATTTGGAGGGGTTCAGGTCTCTCCACCAAATGAAAATGTTGCAATTCACAACCCTTTC

Kindly guide me: is there a command-line solution for this (one that works for multiple FASTA files, each containing a single gene sequence)?

2017-09-16 Athanasius added code tags


Replies are listed 'Best First'.
Re: Command or Perl script for changing headers of multiple FASTA files in a specific order listed in a txt file
by 1nickt (Canon) on Sep 11, 2017 at 09:42 UTC

    Hi, What have you tried, and how did it not work for you?

    Also, please edit your post to use <code></code> tags, as shown in the instruction directly below the Preview button. (Gonna be hard to be a programmer if you don't read the instructions.)

    Update: Hm, I missed that you posted anonymously and therefore cannot edit your post. (Thanks hippo.) Better to create a user account, in order to be able to update a post, and for other reasons...


    The way forward always starts with a minimal test.

      Well, I have tried the following code.

      I put the following function in the Linux environment (I guess the problem is here, but I am unable to fix it):

      function sedinho () { sed -i "s/^.*\]/>$1/g" $2;}
      export -f sedinho

      Then I create two variables, a list of new headers (LIST1) and a list of input files (LIST2):

      LIST1=($(cat Headers.txt))
      LIST2=($(find /folder/with/fasta/files/ -maxdepth 0 -name "*CDS.fasta" | sort))
      parallel --xapply sedinho {1} {2} ::: ${LIST1[@]} ::: ${LIST2[@]}

      ERROR:

      zsh:1: command not found: sedinho

      zsh:1: command not found: sedinho

      zsh:1: command not found: sedinho

      How do I put a function in the Linux environment correctly?
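A note on the error above: the `zsh:1:` prefix suggests the jobs were run by zsh, while `export -f` is a bash feature, so zsh never sees `sedinho`. Separately, `find ... -maxdepth 0` only tests the starting directory itself, so it probably returns no files; `-maxdepth 1` is likely what was meant. One way around both problems is to drop `parallel` and the exported function and use a plain loop. The following is only a minimal POSIX shell sketch. It assumes that line N of Headers.txt belongs to file N in sorted (glob) order and that each file holds one record with a one-line header. The function name `rename_headers` is made up for illustration.

```shell
# rename_headers HEADERS_FILE DIR
# Replace the first line of each *CDS.fasta file in DIR with ">" plus the
# matching line of HEADERS_FILE. ASSUMPTION: line N of HEADERS_FILE belongs
# to file N in glob (sorted) order; each file has a single one-line header.
rename_headers() {
    headers_file=$1
    dir=$2
    for file in "$dir"/*CDS.fasta; do
        [ -e "$file" ] || continue          # the glob matched nothing
        # Read the next header from file descriptor 3; stop if none is left.
        if ! IFS= read -r header <&3 && [ -z "$header" ]; then
            echo "No header left for $file" >&2
            return 1
        fi
        # Write the new header plus everything after the old first line,
        # then move the temporary file over the original.
        { printf '>%s\n' "$header"; tail -n +2 "$file"; } > "$file.tmp" &&
            mv "$file.tmp" "$file"
    done 3< "$headers_file"
}
```

It would be called as `rename_headers Headers.txt /folder/with/fasta/files`. Since it overwrites the files, try it on a copy of the folder first. Writing the header with `printf` instead of `sed` also avoids trouble if a header ever contains `/` or `&`.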

        This Perl script does the first part: reading the filenames and headers into an array using File::Find. How is the order of the lines in Headers.txt matched to these filenames/headers?

        #!/usr/bin/perl
        use strict;
        use File::Find;
        use Data::Dumper;

        my $folder = '/folder/with/fasta/files';
        my @files = ();
        find( \&process, $folder );
        print Dumper \@files;

        sub process {
            return unless $_ =~ /CDS\.fasta$/;
            my $file = $File::Find::name;
            open my $fh, '<', $file or die $!;
            my $header = <$fh>;
            close $fh;
            push @files, [ $File::Find::dir, $_, $header ];
        }
        poj
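Until the ordering question is answered, it may help to print the pairing before changing anything. The sketch below is a shell dry run, not part of the Perl script above. It assumes (and this is only an assumption) that the files are taken in sorted name order and matched line by line against Headers.txt. `preview_mapping` is a made-up name. It prints the file name, the current header, and the proposed header, tab-separated, and modifies nothing.

```shell
# preview_mapping HEADERS_FILE DIR
# Dry run: print "file<TAB>current header<TAB>proposed header" for each
# *CDS.fasta file in DIR. ASSUMPTION: files in glob (sorted) order are
# matched line by line against HEADERS_FILE. No file is modified.
preview_mapping() {
    headers_file=$1
    dir=$2
    for file in "$dir"/*CDS.fasta; do
        [ -e "$file" ] || continue          # the glob matched nothing
        # Take the next header line from file descriptor 3, if any is left.
        if ! IFS= read -r new_header <&3 && [ -z "$new_header" ]; then
            new_header='(no header left)'
        else
            new_header=">$new_header"
        fi
        old_header=$(head -n 1 "$file")     # current first line of the file
        printf '%s\t%s\t%s\n' "${file##*/}" "$old_header" "$new_header"
    done 3< "$headers_file"
}
```

Running `preview_mapping Headers.txt /folder/with/fasta/files` and checking each row by eye shows whether the sorted order really matches the intended species order. If it does not, the header file would need to name the target file explicitly.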
