http://www.perlmonks.org?node_id=761983

lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!
I need some help to get me rolling on how to best combine
two methods into one for the sake of generalizing the code for
reuse.
sub blast_parse{ my($maid,$maid_dir) = @_; my $url_hu = "http://hu_seq/"; my $hu = get($url_hu.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; #=cut # convert the raw sequence to fasta open ($hu); my $seq=<$in>; close($in); open ($hu, ">".$maid.".fa"); print $hu_fa ">$maid\n$seq"; close($hu_fa); #=cut # syntax # bl2seq -p blastn -i nucleotide1 -j nucleotide2 -F F -D 1 my $command = "bl2seq -p blastn -i $ltvec_small -j $hu_fa -F F -D +1"; print $command,"\n"; open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ; STDOUT->fdopen( \*OUTPUT, 'w'); system($command); bl2seq_parse(); } # the next method sub blast_hd_parse{ my($maid,$maid_dir) = @_; my $url_hd = "http://hd_seq/"; my $hd = get($url_hd.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; # convert the raw sequence to fasta open ($hu); my $seq=<$in>; close($in); open ($hu, ">".$maid.".fa"); print $hu_fa ">$maid\n$seq"; close($hu_fa); my $command = "bl2seq -p blastn -i $ltvec_small -j $hd_fa -F F -D +1"; print $command,"\n"; open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ; STDOUT->fdopen( \*OUTPUT, 'w'); system($command); bl2seq_parse(); }
Your wisdom is always acknowledged!

Replies are listed 'Best First'.
Re: how can I combine into one method
by gwadej (Chaplain) on May 05, 2009 at 17:28 UTC

    In my experience, the question how do I combine two methods is almost always the wrong question.

    What is the real benefit you are trying to achieve? A lot of very bright people have discovered that making something reusable is more than twice as hard as making a single-purpose version. That's why many people don't worry about generalizing without 3 examples.

    If you are going to need several more of these kinds of methods, then maybe, you have a good reason for generalizing. Even then, how do I combine is not my first question.

    I normally prefer questions like:

    • Is there a chunk of functionality I can name?
    • What is common?
    • Is there some piece of this method I could use elsewhere (if only it were a separate function)?

    The second question is the easiest. The third is a variation of the second. But, the first is usually the most powerful (for me).

    I begin by breaking out one of these pieces into a new subroutine. I build testing code for it, to make certain it works the way I want it to. Then I modify the routines to use this chunk. In the process of building this new routine, I often find that parameter names and my variable names become more generic to fit with the utility nature of the method.

    Then, I replace the code in one of the original routines with a call to the new method, running our tests to verify I haven't broken anything. (You do have tests for these methods, right? If not, you need those before you start any kind of refactoring effort.) Then I can replace functionality in the next routine, and so on.

    Then, I repeat with the next chunk of code I want to break out of these methods. I apply this algorithm repeatedly until the usage of the routines becomes obvious or There's nothing more to move.

    Sometimes, I will now leave the original routines in place as a simple template for running the newly refactored code. Sometimes, I will remove the old methods and replace the places they were called with calls to the new code.

    I often find that just giving names to the internal functionality causes it to be reused elsewhere. This is a benefit above and beyond the desire to combine two methods. I can also build new variations of these methods by combining the common code with whatever unique data or pre/post-processing is needed.

    I know this isn't a direct answer to your question, but, in my experience, it's more likely to get you to better code in the end.

    G. Wade
Re: how can I combine into one method
by roboticus (Chancellor) on May 05, 2009 at 18:03 UTC

    lomSpace:

    You first need to chop things up into related concepts. For example, the first thing I notice is that chunk you helpfully marked with =cut lines. There's a bit there you could make into its own subroutine1:

    sub convert_raw_to_fasta { my $InFileName = shift; my $OutFileNameBase = shift or die "Missing argument(s)!"; open $hu, '<', $InFileName or die $!; my $seq=<$hu>; close $hu or die $!; open $hu, '>', $OutFileNameBase . ".fa" or die $!; print $hu ">$maid\n$seq"; close $hu or die $!; }

    Then you could simplify the remaining subroutines:

    sub blast_parse{ my($maid,$maid_dir) = @_; my $url_hu = "http://hu_seq/"; my $hu = get($url_hu.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; convert_raw_to_fasta($hu, $maid); # syntax # bl2seq -p blastn -i nucleotide1 -j nucleotide2 -F F -D 1 my $command = "bl2seq -p blastn -i $ltvec_small -j $hu_fa -F F -D +1"; print $command,"\n"; open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ; STDOUT->fdopen( \*OUTPUT, 'w'); system($command); bl2seq_parse(); } sub blast_hd_parse{ my($maid,$maid_dir) = @_; my $url_hd = "http://hd_seq/"; my $hd = get($url_hd.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; convert_raw_to_fasta($hu, $maid); my $command = "bl2seq -p blastn -i $ltvec_small -j $hd_fa -F F -D +1"; print $command,"\n"; open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ; STDOUT->fdopen( \*OUTPUT, 'w'); system($command); bl2seq_parse(); }

    Then you might notice that the ends of each function are similar as well. The different bits are the command string to execute and the name of the output file. Factor out those bits into arguments, and you can create another sub:

    sub process_and_parse { my $command = shift; my $output_file = shift or die "Missing argument(s)!"; print $command,"\n"; open OUTPUT, '>', $output_file or die $!; STDOUT->fdopen( \*OUTPUT, 'w'); system($command); bl2seq_parse(); }

    So your functions would then reduce to:

    sub blast_parse{ my($maid,$maid_dir) = @_; my $url_hu = "http://hu_seq/"; my $hu = get($url_hu.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; convert_raw_to_fasta($hu, $maid); process_and_parse( "bl2seq -p blastn -i $ltvec_small -j $hu_fa -F F -D 1", "$maid_dir/" . $maid . "_bl2seq.out" ); } sub blast_hd_parse{ my($maid,$maid_dir) = @_; my $url_hd = "http://hd_seq/"; my $hd = get($url_hd.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; convert_raw_to_fasta($hu, $maid); process_and_parse( "bl2seq -p blastn -i $ltvec_small -j $hd_fa -F F -D 1", "$maid_dir/" . $maid . "_bl2seq.out" ); }

    You'd continue in this manner, as needed. Along the way, you'd remove unneeded bits of code and variable declarations, etc. Then, if you wanted to compress them into a single function, you'd find that again there are some bits that are different, and you could turn those differences into arguments and pull the code together.

    When you're done factoring some of the bits out, sometimes you'll find that you really want to compose your system differently. Don't allow the current subroutine boundaries to constrain your thinking. Sometimes by chopping things up a bit differently, you'll wind up removing a *lot* of code and gaining features.

    In fact, that's usually when I know that I understand the business process. I start thinking about things, factor out a little code here, reuse it there and there. Generalize it a little and replace several subroutines with the one. At the beginning of the process, you solve each problem as given, and you're afraid to take any liberties because you don't know the impacts on other items. Then you learn more about the system and know where to generalize. Once I have that "aha!" and start removing code while improving it, I know I'm near the end of the road.

    1. I noticed that the code isn't real, functional code. So I took liberties in cleaning up a bit, making no attempt at fixing any of the code. Instead, I just added a little error handling and such. But if it were real code, the process would be similar.

    2. Always (and I mean always check the result of function calls where appropriate. (open, system, get, bl2seq_parse all come to mind)

    Generally, I like it when my code looks like an outline in structure. Each subroutine calls other subroutines that each to a small, simple task. You keep decomposing things until you get to something that's just trivial to implement. Something like:

    # main task initialize_frobnitz(); generate_zanzibar(); show_results(); sub initialize_frobnitz { my $frobber = allocate_frobnitz(1); send_to_frobber($frobber, 'INIT') or die "Can't initialize frobber!"; send_to_frobber($frobber, 'configuration value 1') or die "Can't configure frobber!"; } sub send_to_frobber { my $serial_port = ... etc ...

    ...roboticus

    You can tell when I'm not terribly busy at work ... I tend to make longer, more rambling posts. Ah, well!

      roboticus,
      That is exactly the type of example that I need in
      my path to becoming and very good perl developer :-).
      Much thanks!
Re: how can I combine into one method
by moritz (Cardinal) on May 05, 2009 at 18:09 UTC
    These look very similar, the similarity is just obscured by some comments. So let's remove the comments, and find out what's the difference:
    diff -u a b --- a 2009-05-05 19:55:21.000000000 +0200 +++ b 2009-05-05 19:54:44.000000000 +0200 @@ -1,7 +1,7 @@ -sub blast_parse{ +sub blast_hd_parse{ my($maid,$maid_dir) = @_; - my $url_hu = "http://hu_seq/"; - my $hu = get($url_hu.$maid); + my $url_hd = "http://hd_seq/"; + my $hd = get($url_hd.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; open ($hu); my $seq=<$in>; @@ -9,7 +9,8 @@ open ($hu, ">".$maid.".fa"); print $hu_fa ">$maid\n$seq"; close($hu_fa); - my $command = "bl2seq -p blastn -i $ltvec_small -j $hu_fa -F F -D + 1"; + + my $command = "bl2seq -p blastn -i $ltvec_small -j $hd_fa -F F -D + 1"; print $command,"\n"; open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ; STDOUT->fdopen( \*OUTPUT, 'w');

    (A graphical diff utility like gvimdiff is also very helpful).

    The first difference is the sub name, which we'll ignore for now. The second difference is a differently named variable, and a different constant value. Let's just name the variables $url in both subs, and assume that $hd is actually a typo (it's not used anywhere below in the code).

    Then you can clean up your code a bit (do you use strict;? you know you should!) and you'll find that the subs are identical, except for the different URL, which can be provided as another parameter.

Re: how can I combine into one method
by thunders (Priest) on May 05, 2009 at 17:53 UTC
    I think you want something more like this. I'm guessing a little on some of the code. Otherwise I factored out all the common code, which as far as I can tell is everything but the url.
    sub blast_hu_parse { my($maid,$maid_dir) = @_; my $url_hu = "http://hu_seq/"; blast_parse($maid,$maid_dir,$url_hu); } sub blast_hd_parse { my($maid,$maid_dir) = @_; my $url_hd = "http://hd_seq/"; blast_parse($maid,$maid_dir,$url_hd); } sub blast_parse{ my($maid,$maid_dir,$url) = @_; my $h_file = get($url.$maid); my $ltvec_small = $maid_dir.$maid."Ltvec_small.fa"; #what is $hu where is it declared? #open ($hu); #what is $in where is it declared my $seq=<$in>; close($in); open (my $hu, ">".$maid.".fa"); print $hu ">$maid\n$seq"; close($hu); # syntax # bl2seq -p blastn -i nucleotide1 -j nucleotide2 -F F -D 1 my $command = "bl2seq -p blastn -i $ltvec_small -j $h_file -F F -D + +1"; print $command,"\n"; open OUTPUT, '>', "$maid_dir\\".$maid."_bl2seq.out" ; STDOUT->fdopen( \*OUTPUT, 'w'); system($command); bl2seq_parse(); }

    update added some comment on the file handling stuff, which appears to have some errors, after looking over the original again
Re: how can I combine into one method
by AnomalousMonk (Archbishop) on May 05, 2009 at 17:46 UTC
    After looking over the two functions a couple of times, I can't see how they differ! Could you please highlight the differences?

    One thing I do note is that the return values of the various system interface functions used (open, close, etc.) are never checked: this is almost always a bad idea.