PerlMonks  

file header change

by utpalmtbi (Acolyte)
on Feb 05, 2013 at 09:38 UTC
utpalmtbi has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Perl Monks;

I have a multi-FASTA file (header lines starting with > followed by sequence lines) in the following format:

>contig_1 # 498 # 1826 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGxAGG/AGGxGG;rbs_spacer=5-10bp;gc_cont=0.406

MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM

>contig_2 # 1823 # 2173 # 1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.311

MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI

.... ...

and I want to replace each header line with just a simple running count:

>1

MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM

>2

MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI

.... ...

Maybe it's too simple a question, but I've only just started learning Perl and got stuck.

Thanks in advance.

Re: file header change
by Anonymous Monk on Feb 05, 2013 at 09:50 UTC

    Maybe it's too simple a question, but I've only just started learning Perl and got stuck.

    Great, show your code

      use strict;
      use warnings;
      sub read_file {
          my( $filename ) = <STDIN>;
          my @lines;
          sub read_file {
              my( $filename ) = <STDIN>;
              my @lines;
              open( FILE, "< $filename" ) or die "Can't open $filename : $!";
              {
                  if( @line =~ />contig_(\d+)\s/ ) { print ">$1\n"; }
                  else { print "$line\n"; }
              }
Re: file header change
by tmharish (Friar) on Feb 05, 2013 at 09:52 UTC

    Where exactly are you stuck?

    Did you manage to open the file?

    Can you print the contents of the file from your script?

    Assuming that you can, all you need is:

    if( $line =~ />contig_(\d+)\s/ ) {
        print ">$1\n";
    }
    else {
        print "$line\n";
    }
      Thanks, tmharish. I used the following, but it's not working:
      use strict;
      use warnings;
      sub read_file {
          my( $filename ) = <STDIN>;
          my @lines;
          sub read_file {
              my( $filename ) = <STDIN>;
              my @lines;
              open( FILE, "< $filename" ) or die "Can't open $filename : $!";
              {
                  if( @line =~ />contig_(\d+)\s/ ) { print ">$1\n"; }
                  else { print "$line\n"; }
              }

      :-(

        , but it's not working..

        I am not surprised.

        Why do you have the same function defined twice? The second time inside the first??

        Create two functions: one to get the file name, and the other to read the file, modify the contents, and print the result.

        CALL the functions

        And finally try to run your code and get rid of the syntax errors.
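The advice above could be sketched roughly as follows. This is only one possible layout, not the original poster's code: the names get_filename and renumber_file are hypothetical, the file name is taken from the command line instead of STDIN, and each header is replaced with a running count rather than the captured contig number.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file name from the argument list.
sub get_filename {
    my @args = @_;
    die "Usage: $0 inFile\n" unless @args;
    my $filename = $args[0];
    chomp $filename;    # harmless here; essential if the name came from <STDIN>
    return $filename;
}

# Hypothetical helper: read the file line by line, replace each header
# line with a running count, and print every line to $out
# (STDOUT unless another handle is passed).
sub renumber_file {
    my ( $filename, $out ) = @_;
    $out = \*STDOUT unless defined $out;
    open( my $in, '<', $filename ) or die "Can't open $filename: $!";
    my $count = 0;
    while ( my $line = <$in> ) {
        chomp $line;
        if ( $line =~ /^>/ ) {
            $count++;
            print {$out} ">$count\n";
        }
        else {
            print {$out} "$line\n";
        }
    }
    close $in;
    return $count;    # number of headers seen
}

# CALL the functions (skipped when no file name was given).
renumber_file( get_filename(@ARGV) ) if @ARGV;
```

Note that the loop reads into a scalar ($line), uses a lexical filehandle with three-argument open, and that each sub is defined once, at the top level.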

Re: file header change
by newbie1991 (Acolyte) on Feb 05, 2013 at 10:23 UTC
    This might not be the most efficient way, but it is easy to understand. I am assuming you are working with several pep files, so you read the input into an array. Join that input into one string and split it at the '>' with the split function; this way each protein gets its own element. To manipulate a record further, split it into lines at the \n. Your protein info (the header) will then be in the [0]th element of that array, and you can manipulate it however you want. The 1, 2, 3 will come from an incrementing counter. Like I said, not the most elegant, but it's a simple enough method that I've used. Glad if it helps. :)
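A minimal sketch of the split-based approach described above, under some assumptions: the whole input fits in memory as one string, '>' appears only at the start of header lines, and the sub name renumber_by_split is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: renumber FASTA headers by splitting the text.
sub renumber_by_split {
    my ($text) = @_;
    my $count  = 0;
    my $result = '';

    # Split at '>' so each record (header plus sequence lines) is one element.
    for my $record ( split />/, $text ) {
        next unless $record =~ /\S/;    # skip the empty piece before the first '>'

        # Split the record into lines; element [0] is the header text.
        my @lines = split /\n/, $record;
        $lines[0] = ++$count;           # replace the header with the counter

        $result .= '>' . join( "\n", @lines ) . "\n";
    }
    return $result;
}

# Sample data taken from the question.
my $sample = <<'END';
>contig_1 # 498 # 1826 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGxAGG/AGGxGG;rbs_spacer=5-10bp;gc_cont=0.406
MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM
>contig_2 # 1823 # 2173 # 1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.311
MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI
END

print renumber_by_split($sample);
```

For large files, reading line by line avoids holding everything in memory, which is the trade-off hinted at by "not the most efficient".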
Re: file header change
by Kenosis (Priest) on Feb 05, 2013 at 15:26 UTC

    Try the following:

    use strict;
    use warnings;

    my $i = 1;
    while (<>) {
        s/>\K.+/$i++/e;
        print;
    }

    Output from your data:

    >1
    MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM
    >2
    MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI

    Usage: perl script.pl inFile >outFile

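    As the FASTA inFile is read line by line, the substitution replaces everything after the > with the current value of $i and then increments $i. The /e modifier evaluates $i++ as code, and \K (available since Perl 5.10) keeps the > itself out of the replaced text. Lines without a > are printed unchanged. The >outFile notation redirects the printed output to the file outFile.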

    Hope this helps!
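To see the substitution in isolation, here is a small sketch that applies it to in-memory lines instead of a file. The sub name renumber_lines is hypothetical, and the added ^ anchor is a small deviation from the script above so that only a > at the start of a line counts as a header.

```perl
use strict;
use warnings;

# Hypothetical helper: return renumbered copies of the given lines.
sub renumber_lines {
    my @lines = @_;    # work on a copy, leaving the caller's data alone
    my $i     = 1;
    for my $line (@lines) {
        # /e evaluates the replacement as Perl code, so $i++ yields the
        # current count and then increments it. \K keeps the '>' that was
        # already matched; .+ stops before the newline, so it survives.
        $line =~ s/^>\K.+/$i++/e;
    }
    return @lines;
}

# Sample data taken from the question.
print renumber_lines(
    ">contig_1 # 498 # 1826 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGxAGG/AGGxGG;rbs_spacer=5-10bp;gc_cont=0.406\n",
    "MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM\n",
    ">contig_2 # 1823 # 2173 # 1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.311\n",
    "MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI\n",
);
```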

      It works. Thank you very much!

        You're most welcome, utpalmtbi!
