 
PerlMonks  

file header change

by utpalmtbi (Acolyte)
on Feb 05, 2013 at 09:38 UTC
utpalmtbi has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Perl Monks,

I have a multi-FASTA file (each record is a header line starting with > followed by its sequence) in the following format:

>contig_1 # 498 # 1826 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGxAGG/AGGxGG;rbs_spacer=5-10bp;gc_cont=0.406

MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM

>contig_2 # 1823 # 2173 # 1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.311

MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI

.... ...

I want to replace each header line with a simple running number:

>1

MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM

>2

MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI

.... ...

Maybe it's too simple a question, but I have only just started learning Perl and got stuck.

Thanks in advance.

Re: file header change
by tmharish (Friar) on Feb 05, 2013 at 09:52 UTC

    Where exactly are you stuck?

    Did you manage to open the file?

    Can you print the contents of the file from your script?

    Assuming that you can, all you need is:

    if ( $line =~ />contig_(\d+)\s/ ) {
        print ">$1\n";
    }
    else {
        print "$line\n";    # assumes $line was chomped
    }
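    Note that this snippet prints the number captured after contig_, so it yields 1, 2, 3, ... only if those numbers already run consecutively through the file. A minimal sketch of a complete script that keeps its own counter instead, assuming every header line starts with > and the input file is named on the command line (renumber_line is a hypothetical helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: given one chomped line and a reference to a
# running counter, return the line that should be printed.
sub renumber_line {
    my ($line, $count_ref) = @_;
    if ($line =~ /^>/) {          # header line: replace with the count
        $$count_ref++;
        return ">$$count_ref";
    }
    return $line;                 # sequence line: unchanged
}

if (@ARGV) {
    my $count = 0;
    while (my $line = <>) {       # reads the files named in @ARGV
        chomp $line;
        print renumber_line($line, \$count), "\n";
    }
}
```

    Usage: perl renumber.pl inFile >outFile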
      Thanks, tmharish. I used the following, but it's not working:
      use strict;
      use warnings;

      sub read_file {
          my( $filename ) = <STDIN>;
          my @lines;
          sub read_file {
              my( $filename ) = <STDIN>;
              my @lines;
              open( FILE, "< $filename" ) or die "Can't open $filename : $!";
              {
                  if( @line =~ />contig_(\d+)\s/ ) {
                      print ">$1\n";
                  }
                  else {
                      print "$line\n";
                  }
              }

      :-(

        , but it's not working..

        I am not surprised.

        Why do you have the same function defined twice? The second time inside the first??

        Create two functions: one to get the file name, and another to read the file, modify the contents, and print them.

        Then CALL the functions.

        Finally, run your code and get rid of the syntax errors.
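        A minimal sketch of the two-function layout suggested above, assuming the file name is typed on STDIN as in the original attempt. The names get_filename and renumber_file are hypothetical, and the last part calls them only when the file is run as a script (not loaded with require):

```perl
use strict;
use warnings;

# Hypothetical function 1: read one line (the file name) from a handle.
# Returns undef if nothing could be read.
sub get_filename {
    my ($in) = @_;
    print STDERR "FASTA file name: ";    # prompt on STDERR, not in the output
    my $filename = <$in>;
    return unless defined $filename;
    chomp $filename;
    return $filename;
}

# Hypothetical function 2: read the file, replace each header line
# (starting with '>') with a running count, print every line to $out.
# Returns the number of headers seen.
sub renumber_file {
    my ($filename, $out) = @_;
    open my $fh, '<', $filename or die "Can't open $filename: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $count++;
            print {$out} ">$count\n";
        }
        else {
            print {$out} "$line\n";
        }
    }
    close $fh;
    return $count;
}

# Call the functions, but only when this file is run as a script.
unless (caller) {
    my $filename = get_filename(\*STDIN);
    renumber_file($filename, \*STDOUT) if defined $filename;
}
```

        Because the prompt goes to STDERR, running perl renumber.pl >outFile and typing the file name leaves only the renumbered records in outFile.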

Re: file header change
by Anonymous Monk on Feb 05, 2013 at 09:50 UTC

    Maybe it's too simple a question, but I have only just started learning Perl and got stuck.

    Great, show your code.

      sorry for the previous post:

      use strict;
      use warnings;

      sub read_file {
          my( $filename ) = <STDIN>;
          my @lines;
          sub read_file {
              my( $filename ) = <STDIN>;
              my @lines;
              open( FILE, "< $filename" ) or die "Can't open $filename : $!";
              {
                  if( @line =~ />contig_(\d+)\s/ ) {
                      print ">$1\n";
                  }
                  else {
                      print "$line\n";
                  }
              }
Re: file header change
by newbie1991 (Acolyte) on Feb 05, 2013 at 10:23 UTC
    This might not be the most efficient way, but it is easy to understand. I am assuming you are working with several pep files. Read the input into a single string and split it at each '>' with the split function, so that each protein gets its own element. To manipulate this further, split each element into lines at \n. The protein's header line will then be element [0] of that array, and you can change it however you want; the 1, 2, 3 come from an incrementing counter. Like I said, not the most elegant, but it's a simple enough method that I've used. Glad if it helps. :)
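    The split-based approach described above might look like this sketch. It assumes the whole input fits in memory, that records are separated by a > at the start of a line, and that one file is named on the command line; renumber_by_split is a hypothetical name. Since split works on a string, the file is slurped into one scalar rather than an array of lines:

```perl
use strict;
use warnings;

# Split the text into records at each '>' that starts a line, split each
# record into lines, and replace line [0] (the old header) with a counter.
sub renumber_by_split {
    my ($text) = @_;
    my @records = grep { length } split /^>/m, $text;  # drop empty leading field
    my $out   = '';
    my $count = 0;
    for my $record (@records) {
        my @lines = split /\n/, $record;
        $lines[0] = ++$count;                # new header: 1, 2, 3, ...
        $out .= '>' . join("\n", @lines) . "\n";
    }
    return $out;
}

if (@ARGV) {
    local $/;                                # slurp mode: read the whole file
    my $text = <> // '';
    print renumber_by_split($text);
}
```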
Re: file header change
by Kenosis (Priest) on Feb 05, 2013 at 15:26 UTC

    Try the following:

    use strict;
    use warnings;

    my $i = 1;
    while (<>) {
        # \K keeps the '>' out of the replaced text; /e evaluates $i++ as code
        s/>\K.+/$i++/e;
        print;
    }

    Output from your data:

    >1
    MNLTFDYTKEPSRDVLCIDVKSFYASVECVERG LDPLKTMLVVMSNSENSGGLVLAASPM
    >2
    MKQNRKEFSSYFSRSIKQNKPLYLLLMSSETNPF PRPVIGTFRGYVEENKIIIGEDSYSI

    Usage: perl script.pl inFile >outFile

    As the FASTA inFile is read line by line, the regex replaces everything after the > with the current value of $i, which is then incremented. The >outFile notation redirects the printed output to the file outFile.

    Hope this helps!
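    A minimal sketch for checking the substitution on a string rather than a file: the hypothetical sub renumber_text applies the same s/>\K.+/$i++/e to each line, with one change that is an assumption and not part of the reply above: a ^ anchor, so only a > at the start of a line (a FASTA header) is rewritten.

```perl
use strict;
use warnings;

# Apply the substitution to every line of $text and return the result.
sub renumber_text {
    my ($text) = @_;
    my $i = 1;
    my @lines = split /^/m, $text;      # each element keeps its "\n"
    for my $line (@lines) {
        # ^ (added here) limits the match to header lines;
        # .+ stops before the newline, so "\n" is preserved
        $line =~ s/^>\K.+/$i++/e;
    }
    return join '', @lines;
}
```

    For example, print renumber_text($text) prints $text with each header replaced by >1, >2, and so on.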

      It works. Thank you very much!

        You're most welcome, utpalmtbi!

Node Type: perlquestion [id://1017091]
Front-paged by Corion