lomSpace:
You first need to chop things up into related concepts. For example, the first thing I notice is the chunk you helpfully marked with =cut lines. There's a bit there you could make into its own subroutine[1]:
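sub convert_raw_to_fasta {
    my $InFileName      = shift;
    my $OutFileNameBase = shift or die "Missing argument(s)!";

    # Read the raw sequence (the first line of the input file)
    open my $in, '<', $InFileName or die $!;
    my $seq = <$in>;
    close $in or die $!;

    # Write it back out as FASTA, using the base name as the header
    open my $out, '>', $OutFileNameBase . ".fa" or die $!;
    print $out ">$OutFileNameBase\n$seq";
    close $out or die $!;
}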
Then you could simplify the remaining subroutines:
sub blast_parse {
    my ($maid, $maid_dir) = @_;
    my $url_hu = "http://hu_seq/";
    my $hu = get($url_hu.$maid);
    my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa";
    convert_raw_to_fasta($hu, $maid);
    # syntax
    # bl2seq -p blastn -i nucleotide1 -j nucleotide2 -F F -D 1
    my $command = "bl2seq -p blastn -i $ltvec_small -j $hu_fa -F F -D 1";
    print $command,"\n";
    open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ;
    STDOUT->fdopen( \*OUTPUT, 'w');
    system($command);
    bl2seq_parse();
}
sub blast_hd_parse {
    my ($maid, $maid_dir) = @_;
    my $url_hd = "http://hd_seq/";
    my $hd = get($url_hd.$maid);
    my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa";
    convert_raw_to_fasta($hd, $maid);
    my $command = "bl2seq -p blastn -i $ltvec_small -j $hd_fa -F F -D 1";
    print $command,"\n";
    open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ;
    STDOUT->fdopen( \*OUTPUT, 'w');
    system($command);
    bl2seq_parse();
}
Then you might notice that the ends of each function are similar as well. The different bits are the command string to execute and the name of the output file. Factor out those bits into arguments, and you can create another sub:
sub process_and_parse {
    my $command     = shift;
    my $output_file = shift or die "Missing argument(s)!";
    print $command,"\n";
    open OUTPUT, '>', $output_file or die $!;
    STDOUT->fdopen( \*OUTPUT, 'w');
    system($command);
    bl2seq_parse();
}
So your functions would then reduce to:
sub blast_parse {
    my ($maid, $maid_dir) = @_;
    my $url_hu = "http://hu_seq/";
    my $hu = get($url_hu.$maid);
    my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa";
    convert_raw_to_fasta($hu, $maid);
    process_and_parse(
        "bl2seq -p blastn -i $ltvec_small -j $hu_fa -F F -D 1",
        "$maid_dir/" . $maid . "_bl2seq.out"
    );
}
sub blast_hd_parse {
    my ($maid, $maid_dir) = @_;
    my $url_hd = "http://hd_seq/";
    my $hd = get($url_hd.$maid);
    my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa";
    convert_raw_to_fasta($hd, $maid);
    process_and_parse(
        "bl2seq -p blastn -i $ltvec_small -j $hd_fa -F F -D 1",
        "$maid_dir/" . $maid . "_bl2seq.out"
    );
}
You'd continue in this manner, as needed. Along the way, you'd remove unneeded bits of code and variable declarations, etc. Then, if you wanted to compress them into a single function, you'd find that again there are some bits that are different, and you could turn those differences into arguments and pull the code together.
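As a rough illustration of that last step, here is one way the two subs might collapse once the differing bits become an argument. It is only a sketch under some assumptions: the `blast_plan` name, the `hu`/`hd` keys, and the file naming scheme (`<maid>_<kind>.fa`, `<maid>_<kind>_bl2seq.out`) are made up for the example, and it only works out the URL, command, and output file rather than fetching or running anything.

```perl
use strict;
use warnings;

# Base URLs keyed by sequence kind. The 'hu'/'hd' keys are labels invented
# for this sketch; the URLs are the placeholders from the subs above.
my %base_url = (
    hu => 'http://hu_seq/',
    hd => 'http://hd_seq/',
);

# Work out everything that differed between blast_parse and blast_hd_parse,
# so a single caller can fetch the sequence, convert it, and run bl2seq.
sub blast_plan {
    my ($kind, $maid, $maid_dir) = @_;
    my $base = $base_url{$kind} or die "Unknown sequence kind '$kind'";

    my $ltvec_small = $maid_dir . $maid . "Ltvec_small.fa";
    my $fasta       = "${maid}_${kind}.fa";    # assumed file name

    return {
        url     => $base . $maid,
        command => "bl2seq -p blastn -i $ltvec_small -j $fasta -F F -D 1",
        output  => $maid_dir . "${maid}_${kind}_bl2seq.out",    # assumed name
    };
}

# Example: show the command that would be run for the 'hd' sequence.
my $plan = blast_plan('hd', 'M123', '/data/');
print "$plan->{command}\n";
```

The single sub would then fetch `$plan->{url}`, convert the result to FASTA, and hand `$plan->{command}` and `$plan->{output}` to process_and_parse. Putting the kind in the output file name is a choice made here so the two runs don't overwrite each other; the subs above write both to the same name.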
When you're done factoring some of the bits out, sometimes you'll find that you really want to compose your system differently. Don't allow the current subroutine boundaries to constrain your thinking. Sometimes by chopping things up a bit differently, you'll wind up removing a *lot* of code and gaining features.
In fact, that's usually when I know that I understand the business process. I start thinking about things, factor out a little code here, reuse it there and there. Generalize it a little and replace several subroutines with one. At the beginning of the process, you solve each problem as given, and you're afraid to take any liberties because you don't know the impacts on other items. Then you learn more about the system and know where to generalize. Once I have that "aha!" and start removing code while improving it, I know I'm near the end of the road.
1. I noticed that the code isn't real, functional code. So I took liberties in cleaning up a bit, making no attempt at fixing any of the code. Instead, I just added a little error handling and such. But if it were real code, the process would be similar.
2. Always (and I mean always) check the result of function calls where appropriate (open, system, get, and bl2seq_parse all come to mind).
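To make the second note a bit more concrete, here is a small sketch of what checked calls could look like. The helper names `run_checked` and `slurp_checked` are invented for the example and are not part of the code above; they only wrap the built-in system, open, and close.

```perl
use strict;
use warnings;

# Run an external command (list form, so no shell is involved) and die
# with a specific message for each way it can fail.
sub run_checked {
    my @command = @_;
    my $status = system(@command);
    if ($status == -1) {
        die "Could not start '@command': $!";
    }
    if ($status & 127) {
        die sprintf "'%s' died with signal %d", "@command", $status & 127;
    }
    if ($status >> 8) {
        die sprintf "'%s' exited with status %d", "@command", $status >> 8;
    }
    return 1;
}

# Read a whole file, checking both open and close.
sub slurp_checked {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open '$path' for reading: $!";
    local $/;    # slurp mode: read everything in one go
    my $content = <$fh>;
    close $fh or die "Can't close '$path': $!";
    return $content;
}
```

If the get in the code above is the one from LWP::Simple, it returns undef on failure, so a `defined` check on its result would fit the same pattern.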
Generally, I like it when my code looks like an outline in structure. Each subroutine calls other subroutines that each do a small, simple task. You keep decomposing things until you get to something that's just trivial to implement. Something like:
# main task
initialize_frobnitz();
generate_zanzibar();
show_results();

sub initialize_frobnitz {
    my $frobber = allocate_frobnitz(1);
    send_to_frobber($frobber, 'INIT')
        or die "Can't initialize frobber!";
    send_to_frobber($frobber, 'configuration value 1')
        or die "Can't configure frobber!";
}

sub send_to_frobber {
    my $serial_port = ... etc ...
...roboticus
You can tell when I'm not terribly busy at work ... I tend to make longer, more rambling posts. Ah, well!