est has asked for the wisdom of the Perl Monks concerning the following question:

Howdy fellow monks!

I have a (shamely) very long subroutine which looks ugly yet work. I've been trying to refactor the code into something more maintainable but couldn't figure out how.

Any help is appreciated!

sub produce_report { my ($self) = @_; # Initiate a hash to check if the same data has been # found earlier in this lodgement. This is to avoid # printing duplicate lines on multiple lodgements. my %check = (); # Get all lodgements and sort it... my @lodgements = $self->_get_lodgements($self->{date_str}); my @sorted_lodgements = sort { $a->{dockets} cmp $b->{dockets} } @lodgements; LODGEMENT: foreach my $lodgement (@sorted_lodgements) { # Get Local actuals... my $actual = $self->_get_Local($lodgement); # Get all job from client_job... my @jobs = $self->_get_jobs($lodgement); # Scalar to hold total outstanding and documents # lodged for this batch... my $total_lodged = 0; my $total_qcs_docs = 0; my $outstanding = 0; # Total document in all QCS for this batch... $total_qcs_docs = $self->_total_docs_in_batch(\@jobs); # Total _all_ lodged... my @all_possible_lodgements = ClientDB::Local_Lodgements->search_where({ reference => $lodgement->{reference}, hb_stream => $lodgement->{hb_stream}, }, { order_by => 'dockets' } ); # Grep only all lodgements ON OR BEFORE today... my @all_lodgements = grep { Date_Cmp( _dateformat($_->get('hb_Local_date_lodged')), _dateformat($self->{date_str}) ) <= 0 } @all_possible_lodgements; my $total_all_lodged = $self->_total_all_lodged(@all_lodgements); # Any previous lodgement for the same batch... my @previous_possible_lodgements = ClientDB::Local_Lodgements->search_where({ reference => $lodgement->{reference}, hb_stream => $lodgement->{hb_stream}, hb_Local_date_lodged => { '<', $self->{date_str} } }); # Grep only all lodgements BEFORE today... my @previous_lodgements = grep { Date_Cmp( _dateformat($_->get('hb_Local_date_lodged')), _dateformat($self->{date_str}) ) < 0 } @previous_possible_lodgements; # Total _all_ spoils for this batch... my $total_all_spoils = ClientDB::Client_Actuals ->sql_sum_hb_spoil($lodgement->{reference}) ->select_val; # Now parse through all jobs and qcs's... JOB: foreach my $job (@jobs) { # Get job QCS informations... QCS: foreach ($job->qcs) { # # Consolidation jobs... # if ( scalar(@jobs) > 1 ) { # Spoil for individual Client job... my $client_job_spoil = ClientDB::Client_Actuals->retrieve( reference => $lodgement->{reference}, cj_job_ref_no => $job->{cj_job_ref_no}, qcs_sequence_no => $_->qcs_sequence_no, ); # If the row has been recorded before # and it has no spoil then # just skip to the next one... next QCS if ( exists $check{$_->qcs_job_name} {$lodgement->{reference}} {$_->qcs_sequence_no}) && ($check{$_->qcs_job_name} {$lodgement->{reference}} {$_->qcs_sequence_no}) && ($client_job_spoil ->hb_Local_spoils == 0); # Mark that we have recorded this job... $check{$_->qcs_job_name} {$lodgement->{reference}} {$_->qcs_sequence_no} = 1; if ( $lodgement->hb_Local_Docs_Lodged == $total_qcs_docs ) { $total_lodged = $_->qcs_document_count(); $outstanding = 0; } elsif ( scalar(@all_lodgements) == 2 && ($total_all_lodged == $total_qcs_docs) ) { $outstanding = 0; if (scalar @previous_lodgements) { next QCS unless $client_job_spoil ->hb_Local_spoils > 0; $total_lodged = $client_job_spoil ->hb_Local_spoils; + } else { if ( exists $check{"$_->qcs_job_name"} {"$lodgement->{reference}"} {"$_->qcs_sequence_no"} {"lodgement"} && ($check{"$_->qcs_job_name"} {"$lodgement->{reference}"} {"$_->qcs_sequence_no"} {"lodgement"} ) > 0 ) { $total_lodged = $client_job_spoil ->hb_Local_spoils; } else { $total_lodged = $_->qcs_document_count - $client_job_spoil ->hb_Local_spoils; $check{"$_->qcs_job_name"} {"$lodgement->{reference}"} {"$_->qcs_sequence_no"} {"lodgement"} = $total_lodged; } } } elsif ( scalar(@all_lodgements) > 2 && ($total_all_lodged == $total_qcs_docs) ) { $outstanding = 0; # If there is any previous #lodgement, we report only the # one with spoil if (scalar @previous_lodgements) { next QCS unless $client_job_spoil ->hb_Local_spoils > 0; $total_lodged = '?'; } else { $total_lodged = $client_job_spoil ->hb_Local_spoils > 0 ? '?' : $_->qcs_document_count ; } } elsif ( scalar(@all_lodgements) == 1 && ($total_all_spoils > 0) && ($total_all_spoils + $total_all_lodged == $total_qcs_docs) ) { $total_lodged = $_->qcs_document_count() - $client_job_spoil ->hb_Local_spoils; $outstanding = $client_job_spoil ->hb_Local_spoils; } else { $total_lodged = '?'; $outstanding = '?'; } } else { # # Normal Client jobs... # $total_lodged = $lodgement->hb_Local_Docs_Lodged; $outstanding = $total_qcs_docs - $total_all_lodged - $actual->hb_Local_extractions; } # Generate the array of hashes of lodgement # on specific date... push @{ $self->{lodgements} }, { reference => $lodgement ->{reference}, ap_docket => $lodgement->{dockets}, lodged => $total_lodged, date => $lodgement ->hb_Local_date_lodged, time => $lodgement ->hb_Local_time_lodged, state => $actual ->hb_Local_print_location, Localno => $actual ->hb_Local_number, extractions => $actual ->hb_Local_extractions, jobname => $_->qcs_job_name, JSN => $_->qcs_sequence_no, batchno => $_->qcs_batch_no, documents => $_->qcs_document_count, formid => $_->qcs_form_id, outstanding => $outstanding, number_of_jobs => scalar(@jobs), total_all_qcs => $total_qcs_docs, total_all_lodged => $total_all_lodged, }; } } } }

Replies are listed 'Best First'.
Re: Refactor huge subroutine
by grep (Monsignor) on Aug 13, 2008 at 04:03 UTC
    A couple of comments:

    Those labels look like a good place to start. sub them out and pass the array.

    Those large, gnarly if/elsif clauses would be my next step. Create a descriptively named sub like has_many_valid_lodgements, move the logic there and return true or false. This also has the advantage of self-documenting your code. No one will be able to quickly discern what

    scalar(@all_lodgements) > 2 && ($total_all_lodged == $total_qcs_docs)

    Look for copy-and-paste code like the Date_Cmp's. Get rid of them. Make 1 sub and name it well.

    Other Notes:

    • Think about writing tests, even with the code as it stands. The tests for this will have to be high level but you'll know if you break something (or even find a bug). As you refactor add smaller tests for the smaller components. It not only will keep you from breaking things, it helps you think about refactoring. Refactored code is generally easily tested.
    • Think about self documenting the code. You seem to use decent variable names continue that with the refactored subs. Think about leaving your code to someone else, and if they'll like you in the end :)
    One dead unjugged rabbit fish later...
Re: Refactor huge subroutine
by GrandFather (Saint) on Aug 13, 2008 at 04:51 UTC

    Here's a start. Note that "global" stuff is added to a rptGlobal hash in $self to save passing around a bunch of stuff explicitly. A couple of large chunks of code have been refactored into subs. Early exits allow nested if/elsif structures to be flattened.

    Untested of course with no guarantee against introduced bugs. You do have a comprehensive test suite right?

    A little additional commenting in each of the top level if blocks in rptGenConsolidated would be nice.

    Perl reduces RSI - it saves typing
Re: Refactor huge subroutine
by kyle (Abbot) on Aug 13, 2008 at 04:15 UTC

    Some <readmore> tags would be good for that big block of code.

    I haven't looked too deeply at what this is doing, but one strategy I've heard of for this kind of situation is to turn it into an object (maybe several). You make the long-lived variables into attributes to save yourself from passing them in and out everywhere. Then you can start cutting things up into smaller and smaller routines.

    I think all of grep's suggestions are good, but I especially want to second the write tests first suggestion. Ideally, you'd have tests that Devel::Cover says exercises most of what's there.

    Good luck.

Re: Refactor huge subroutine
by apl (Monsignor) on Aug 13, 2008 at 11:06 UTC
    Duplicated (non-trivial) code should be extracted into subroutines. For example:
    Date_Cmp( _dateformat($_->get('hb_Local_date_lodged')), _dateformat($self->{date_str}) )
    could become a function.

    As a matter of personal taste, I'd turn

    exists $check{$_->qcs_job_name} {$lodgement->{reference}} {$_->qcs_sequence_no}) && ($check{$_->qcs_job_name} {$lodgement->{reference}} {$_->qcs_sequence_no}) && ($client_job_spoil ->hb_Local_spoils == 0);
    into a function. When I encounter a "next if", I don't want to dwell on the logic of the test, but to simply note that a test exists.

    Finally (and again, this is all just personal taste), consider changing

    $total_lodged = $_->qcs_document_count() - $client_job_spoil ->hb_Local_spoils;
    $total_lodged = $_->qcs_document_count() - $client_job_spoil->hb_Loc +al_spoils;
    It takes fewer lines and (if you align equal signs and operations) should be just as readable, if not more.
Re: Refactor huge subroutine
by Herkum (Parson) on Aug 13, 2008 at 14:50 UTC
    1. Move code in nested for/while loops into their own subroutine.
    2. Move the giant if/elsif conditions into their own subroutine and refactor till it makes sense.
    3. Making this into an object would simplify refactoring(for the if/elsif checks) because you can make a subroutine to do the conditional check that is more descriptive than 3 or 4 lines of if $blah == 2 and $blah2 < 2 and etc...
    4. If you have a subroutine that is longer than the size of your screen (80 rows or so) it is too long.
    5. If you have lines of code that are stretching over 80 lines in length then try another way to write the code. There is nothing worse than missing something because it goes off the screen.
Re: Refactor huge subroutine
by est (Acolyte) on Aug 14, 2008 at 05:29 UTC

    Thanks for commenting. I do have test suites in place, so it'll ensure not to break thing during refactoring.

    I'll try to put all the suggestion here to my code, however the thing that still scary me is that log if-elsif-else block. I've considered moving that into subroutines, but have no idea with the implementation.