Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Viterbi application

by azheid (Sexton)
on Apr 06, 2014 at 19:56 UTC ( [id://1081331]=perlquestion: print w/replies, xml ) Need Help??

azheid has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I am having trouble coding the Forward Dynamic programming algorithm. I have recently updated this post to account for new information. User kcott helped me simplify the code that I had written already to make it easier to read and use. The content of this post now reflects his advice and changes. My problem is still unsolved however, and I continue to ask for help

The forward dynamic programming algorithm finds the shortest path in a connected unidirectional graph. I am playing with an NP-complete problem that can be searched with the A* algorithm, however to improve the efficiency of A*, I need to optimize the estimation function. The Forward Dynamic programming algorithm will solve the problem if I can implement it correctly.

The chunk of code below sets up an identical data structure to the program I am working on. Please help me by showing how I would implement the forward dynamic programming algorithm to find the shortest path through the data. **Disclaimer** This is not homework. You are not helping me cheat. I am a biology Ph.D. candidate who enjoys playing with perl and DNA sequence data. I have just enough coding experience to hang myself with a short rope.

my $x = [some number]; my $y = [some number]; my $distance_mat; #################### for(my $i=1; $i<=$x; ++$i){ for(my $j=0; $j<$y; ++$j){ for(my $k=0; $k<$y; ++$j){ $distance_mat -> [$i][$j][$k] = rand 1; } } } ##############Start and end nodes have zero cost for(my $i=0; $i<scalar(@{$distance_mat[1]}); ++$i){ $distance_mat -> [0][0][$i] = 0; } for(my $i=0; $i<scalar(@{$distance_mat[-1]}); ++$i){ $distance_mat -> [-1][0][$i] = 0; } ####################################### #call the recursive min_dist subroutine my $last = scalar ( @{$distance_mat} ); my @minimum_path = &min_dist (\$distance_mat,$last); ## I want @minumum_path array to store the best path for each $i level + as a pointer or index. sub min_dist { #I dont know how to program this recursive subroutine #My incorrect attempt looks like this my $k = $_[1]; my @array = @{ $_[0]->[$k] }; foreach( @array ){ $_ += min_dist( $_[0],$_[1]-1 ); } my $return=min( min_dist($_[0],$_[1]-1)+ ###I cannot figure ou +t how to write this statement return $return; } sub min { #pass array of values my @array = sort { $a <=> $b } @_; return $array[0]; }

The @distance_matrix can be interpreted as a classic trellis diagram of nodes and paths. The $i dimension is the number of horizontal slices of a trellis graph. The $j dimension is the number of nodes for each $i slice. The $k dimension is the distance cost to the next slice nodes. In this example, all nodes of each trellis slice are connected to all nodes in adjacent slices.

I have tried and failed to write a working piece of code for several days now and I need a push in the right direction. I know the min_dist subroutine is not even close to being correct. I am not sure how to write it correctly. The following website describes the algorithm I want to implement. http://www.cob.sjsu.edu/facstaff/davis_r/courses/QBAreader/dynprog.html

Replies are listed 'Best First'.
Re: Viterbi application
by kcott (Archbishop) on Apr 06, 2014 at 21:37 UTC

    G'day azheid,

    "I have just enough coding experience to hang myself with a short rope. ... I tried to create a second similar matrix that stored the minimum distance to each node, but I am having trouble with correctly referencing the distance matrix."

    I think you need some basic foundation knowledge.

    In both of these lines, you're assigning an arrayref to an array:

    my @distance_mat=[]; ... my @vit_dist=[ [ 0] ];

    And in this line:

    ${$vit_dist[$i]}[$j]=min(${$vit_dist[$i-1]}[$j]+${$distance_mat[$i]}[$ +j];

    you appear to be calling a min() function. You've failed to include a closing parenthesis so that leaves me guessing what is intended (and, potentially, what else has been left out or accidentally included). Perl has no min() function, Algorithm::Viterbi's documentation shows no min() function and you show no code for a min() function: I'm out of guesses.

    Attempting to access a multi-dimensional array element with code like:

    ${${$distance_mat[$i]}[$j]}[$k]

    is difficult to read and maintain: and, therefore, error-prone.

    I suggest starting with "Perl references short introduction" and "Perl data structures intro".

    Here's how I might have written the first piece of code you posted. I've reduced the number of elements from 10 to 2 for demonstration purposes. This should give you an idea of how you can (much simply) drill down into this sort of data structure and access individual elements.

    #!/usr/bin/env perl use strict; use warnings; my $max_index = 1; my $matrix; for my $i (0 .. $max_index) { for my $j (0 .. $max_index) { for my $k (0 .. $max_index) { $matrix->[$i][$j][$k] = rand 1; } } } use Data::Dump; dd $matrix;

    Output:

    [ [ [0.290579816865833, 0.491432556740136], [0.888187692306143, 0.761365009561764], ], [ [0.865674078359991, 0.284696276285089], [0.0562126863718184, 0.781856824452085], ], ]

    -- Ken

      You are right, your implementation of the data matrix is much cleaner. I was not aware of that notation. min() is a subroutine that I did not include. It does exactly what it sounds like, returns the minimum of two values in a list.

      I will try to implement the code with this notation and see if the algorithm is easier to implement.

        "min() is a subroutine that I did not include. It does exactly what it sounds like, returns the minimum of two values in a list."

        I did consider that might have been something like that (e.g. similar to List::Util's min() function); however, you're only providing a single value after min( in the code you posted.

        Here's an example of how ${$vit_dist[$i-1]}[$j]+${$distance_mat[$i]}[$j] might be evaluated:

        $ perl -Mstrict -Mwarnings -le ' my (@vit_dist, @distance_mat); $vit_dist[0][0] = 30; $distance_mat[1][0] = 12; my ($i, $j) = (1, 0); print ${$vit_dist[$i-1]}[$j]+${$distance_mat[$i]}[$j]; ' 42

        Perhaps you need a comma instead of a plus:

        min(${$vit_dist[$i-1]}[$j], ${$distance_mat[$i]}[$j])

        A liberalsplinklingofwhitespace may improve the readability of your code and reduce errors. :-)

        -- Ken

        Implement seems to be my favorite word today

Re: Viterbi application
by Laurent_R (Canon) on Apr 08, 2014 at 18:57 UTC
    Hi, I tried to run your first piece of code to better visualize the data, but even that part does not work properly and probably never ends. These nested loops don't work:
    for(my $i=1; $i<=$x; ++$i){ for(my $j=0; $j<$y; ++$j){ for(my $k=0; $k<$y; ++$j){ $distance_mat -> [$i][$j][$k] = rand 1; } } }
    because the third line increments $j and tests max value on $k, so that will probably never ends. A minimum correction on the first part would be:
    use strict; use warnings; use Data::Dumper; my $x = 4; my $y = 5; my $distance_mat; #################### for(my $i=1; $i<=$x; ++$i){ for(my $j=0; $j<$y; ++$j){ for(my $k=0; $k<$y; ++$k){ $distance_mat -> [$i][$j][$k] = rand 1; } } } ##############Start and end nodes have zero cost for(my $i=0; $i<scalar(@{$distance_mat->[1]}); ++$i){ $distance_mat -> [0][0][$i] = 0; } for(my $i=0; $i<scalar(@{$distance_mat->[-1]}); ++$i){ $distance_mat -> [-1][0][$i] = 0; } print Dumper $distance_mat;
    which will print out: Notice how Data::Dumper makes it possible to visualize your data structure. Looking at the above structure, it would seem (but I cannot be sure) that you would probably want to use the post increment operators, rather than the pre increment operator. Or, better, use the
    for my $i (0..$max_val)
    syntax. Please fix this part of your code so that we can understand what you are trying to do and agree on your input, then only can we think really on the rest of your program. And, at this point, you probably want to open a new thread (referring to this one) with the input part properly fixed, son that we can then focus on the rest.

      oops, sorry. I incorrectly copied the third line of that loop. In my program, the data structure is more complicated and depends on a large file. I tried to simplify the code so that the details were easier to understand.

      At this point, you are right, I should probably make a new post. I am considering asking one of my computer science colleagues to help, however in the past they have been reluctant.

        No problem, this can happen, but please make sure, before opening a new thread, that at least the part of your code that produces the input data is really correct. This way, we can try to work on your Viterbi algorithm itself with good data. For the time being, the Dumper output shows for example some inner arrays with only 0 values, I am not completely sure that's really what you want to start with. Don't hesitate to use the Dumper function of the Data::Dumper module (the way I did it) to better visualize your data and be sure that it is what you wish us to work on. Also note that the two zero-cost code segments were bugged and could not compile; I fixed them so that they would at least compile and do something, but I am not completely sure that what they do now is really what you wanted them to do. Well, in brief, please provide really good input data, then, once we have that, I am sure that some gentle monks among us will be willing to go through the Viterbi algorithm with you. (If you have trouble producing really good input data, I would suggest that you ask for help here, and create a new thread only once this part of the work is completely solved.)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1081331]
Approved by kcott
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-04-23 21:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found