PerlMonks  

seq-convert

by biosysadmin (Deacon)
on Feb 23, 2004 at 20:28 UTC (#331198, sourcecode)

Category: Utility Scripts
Author/Contact Info Tex Thompson tex@biosysadmin.com
Description: A quick and dirty program that uses the BioPerl SeqIO modules to convert biological sequence data.

seq-convert [options] input-file

options:
  --input <inputformat>
  --output <outputformat>
  --formats
  --subseq <range>
  --help

OPTIONS
  --input          Specifies the format of the input file. Defaults to fasta.
  --output         Specifies the output format. Defaults to fasta.
  --formats        Prints the sequence file formats available to this program.
  --subseq range   Selects a subsequence of the sequence contained in the
                   input file. Ranges should have the form x-y, where x and y
                   are positive integers.
  --help           Prints a detailed help message.
  --version        Prints version information.

#!/usr/bin/perl -w

use strict;
use Getopt::Long;
use Bio::SeqIO;
use File::Basename;
use Pod::Usage;

# seq-convert
# Copyright 2003 Tex Thompson <tex@mail.rit.edu>
# This is free software released under the Perl Artistic License, see the
# RIT Package website at http://bioinformatics.rit.edu/~tex/ritpackage/ for
# more information.

$0 = basename $0;
our @valid_formats = qw( fasta genbank embl gcg swiss );
# BioPerl supports more file formats, but they are currently untested

# build the format listing without using map in void context
my $formats = "Valid sequence file formats:\n"
            . join( '', map { "\t$_\n" } @valid_formats );

# parse the command line options
my ( $input, $output, $subseq, $help, $print_formats );

GetOptions( 'input=s',  \$input,
            'output=s', \$output,
            'help',     \$help,
            'subseq=s', \$subseq,
            'formats',  \$print_formats );

if ( !$output ) { $output = 'fasta' };

# print some information and exit if appropriate
if ( $print_formats ) { print $formats; exit(0) };
if ( $help ) { pod2usage( -verbose => 2 ) };
# pod2usage() exits on its own, so there is no need to wrap it in die
pod2usage( -verbose => 0 ) unless @ARGV;

# if a range is provided, make sure that it is valid

if ( $subseq ) {
   validate_range( $subseq );
}

# create input/output objects using Bio::SeqIO

my $infile = $ARGV[0];
my ($in, $out);

eval {
   if ( $input ) {
      $in = Bio::SeqIO->new( -file => $infile, -format => $input );
   } else {
      $in = Bio::SeqIO->new( -file => $infile );
   }
};

# catch exceptions from creating Bio::SeqIO input object
# (the exception text is in $@, not $!)
if ( $@ ) { print STDERR "Couldn't open file ${infile}: $@\n"; exit(1) };

eval {
   $out = Bio::SeqIO->new( -fh=>\*STDOUT, -format => $output );
};

# catch exceptions from creating Bio::SeqIO output object
if ( $@ ) { print STDERR "Error using format ${output}: $@\n"; exit(1) };

if ( $subseq ) {
   my ($start,$end) = split /-/, $subseq;
   my $seqobj = $in->next_seq()
      or die "No sequence found in $infile\n";
   # subseq() returns the raw sequence string, so print it directly
   print $seqobj->subseq( $start, $end ), "\n";
} else {
   while ( my $seq = $in->next_seq() ) {
      $out->write_seq($seq);
   }
}

###############
# Subroutines #
###############

sub validate_range {
   my $range = shift;

   # a valid range has the form x-y, where x and y are positive integers
   # and x does not exceed y; the anchors reject input such as "a1-2b"
   if ( $range !~ /^(\d+)-(\d+)$/ or $1 < 1 or $1 > $2 ) {
      die "Bad range: $range\n";
   }
}

#####################
# Usage Information #
#####################

=head1 NAME

seq-convert - Conversion of biological sequence files.


=head1 SYNOPSIS

seq-convert [options] input-file

 options:
   --input <inputformat>
   --output <outputformat>
   --formats
   --subseq <range>
   --help

=head1 OPTIONS

=over 8

=item B<  --input>

Specifies the format of the input file. Defaults to fasta.

=item B<  --output>

Specifies the output format. Defaults to fasta.

=item B<  --formats>

Prints the sequence file formats available to this program.

=item B<  --subseq range>

Selects a subsequence of the sequence contained in the input file. Ranges
should have the form x-y, where x and y are positive integers.

=item B<  --help>

Prints a detailed help message.

=item B<  --version>

Prints version information.

=back

=head1 EXAMPLES

 # print a help message
 $ seq-convert --help

 # convert mySeq.fasta to a GCG formatted file
 $ seq-convert --input fasta --output gcg mySeq.fasta

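 # convert mySequence.genbank into a fasta formatted file
 # (fasta is the default output format, so these are equivalent)
 $ seq-convert --input genbank --output fasta mySequence.genbank
 $ seq-convert --input genbank mySequence.genbank

 # print the first 100 nucleotides of mySequence.genbank
 $ seq-convert --input genbank --subseq 1-100 mySequence.genbank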

=head1 DESCRIPTION

Part of the RIT Bioinformatics Package:
http://bioinformatics.sourceforge.net

This program reads a sequence from a file, converts it to another format and
prints the converted file to standard output.

=head1 AUTHOR

Tex Thompson <tex@bioinformatics.rit.edu>

=head1 LICENSE

B<seq-convert> is licensed under the GNU GPL license, available from
http://www.gnu.org/.

=cut
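
The --subseq option expects a range of the form x-y, where x and y are positive integers. Below is a minimal sketch of how such a range could be checked and split in one step, using core Perl only. The helper name parse_range is hypothetical and not part of the script above, and the rule that x must not exceed y is an assumption added for illustration.

```perl
use strict;
use warnings;

# parse_range is a hypothetical helper, not part of seq-convert.
# It returns ( $start, $end ) for a well-formed range and an empty
# list otherwise.
sub parse_range {
    my ($range) = @_;

    # \A and \z anchor the match to the whole string, and [1-9]\d*
    # accepts only positive integers without leading zeros.
    return unless defined $range
        && $range =~ /\A([1-9]\d*)-([1-9]\d*)\z/;

    my ( $start, $end ) = ( $1, $2 );

    # Assumption: a range that runs backwards is treated as invalid.
    return if $start > $end;

    return ( $start, $end );
}

# Try a few candidate ranges and report how each one is handled.
for my $candidate ( '1-100', '100-1', '0-5', 'abc' ) {
    my @bounds = parse_range($candidate);
    if (@bounds) {
        print "$candidate: start=$bounds[0] end=$bounds[1]\n";
    }
    else {
        print "$candidate: bad range\n";
    }
}
```

Returning the parsed bounds means the caller does not need to split the string a second time, and an empty list makes the failure case easy to test with a plain if.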
