Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Simulating uni-dimensional Hawkes processes

by glrm_master (Initiate)
on Feb 18, 2013 at 00:23 UTC ( #1019245=perlquestion: print w/replies, xml ) Need Help??
glrm_master has asked for the wisdom of the Perl Monks concerning the following question:

Hallo everyone!

I'm a beginner in PERL and have been trying to implement in PERL a code that would simulate a uni-dimensional Hawkess process. I have found several algorithms to do so and decided to go with the one suggested by Ogata (1981) which appears on slides 21/22 of the following presentation:

the code seems to work fine but when I try to use MLE estimation to get the values of the parameters used to generate the sequence, the base intensity is always way off while the jump and decay parameters are estimated accurately... why could this be happening? is there a mistake in my code? thanks in advance for how ever will take their time to help me out! regards


I use R to perform the MLE and the following is my code in PERL:

use strict; use warnings; use diagnostics; use Data::Dumper; use List::Util qw(sum); use Math::Random::MT qw(rand); open (F1,'>>event_times_new2.csv'); open (F2,'>>lambda_star_new2.csv'); my @event_times; my @net_event_times; my @immigrants; my @new_immigrants; my @s; my @d; my @u; my @v; my $lambda0; my $alpha; my $beta; my $result; my $sum; my $lambda_star; my $time; my $counterfails=0; my $num=1; $time=8000000; $lambda0=1.2; $alpha=0.8; $beta=0.9; #subroutine that computes intensity function sub lambda { my $t = $_[0]; $sum=0; for (my $i=0; $i<@event_times; $i++) { $net_event_times[$i]=($t-$event_times[$i]); $immigrants[$i]=$alpha*exp(((-1)*$beta)*$net_event_times[$i]); } $sum= sum @immigrants; if ($immigrants[-1]==$alpha){ $sum+=(-$alpha); } $result=$lambda0+$sum; return $result; } #ALGO it self for (my $i=0; $i<2; $i++) { if ($i==0){ $lambda_star=$lambda0; $s[$i]=0; $d[$i]=0; $u[$i]=rand($num); $s[$i]=(-1)*(1/$lambda_star)*log($u[$i]); if ($s[$i]<$time){ $event_times[$i]=$s[$i]; print F1 "$event_times[$i]\n"; print F2 "$lambda_star\n"; }else{ die } }else { my $valuelambda=lambda($event_times[(($i)-1)]); $lambda_star=$valuelambda+$alpha; START: $u[$i]=rand($num); $s[$i]=$s[(($i)-1)]+(-1)*(1/$lambda_star)*log($u[$i]); if ($s[$i]>$time){ die } $d[$i]=rand($num) ; my $rejectionlambda=lambda($s[$i]); if ($d[$i]<($rejectionlambda/$lambda_star)){ $event_times[$i]=$s[$i]; print F1 "$event_times[$i]\n"; print F2 "$lambda_star\n"; }else{ $counterfails+=1; $lambda_star=$rejectionlambda; goto START; } } } print F1 "The number of event-times that failed the A/R procedure +is $counterfails\n"; close F1; close F2;

Replies are listed 'Best First'.
Re: Simulating uni-dimensional Hawkes processes
by roboticus (Chancellor) on Feb 18, 2013 at 03:36 UTC


    There's too much domain-specific jargon for me to know what the heck the program is supposed to compute. However, I went through your program and removed unnecessary operations, simplifying as much as I could. (I tend to do this to figure out what programs are doing.)

    In the process of playing with your program, I noticed a few things:

    • You only go through your loop twice. Glancing at the paper, I'm expecting you'll need a larger set of iterations for it to do anything useful.
    • Structurally, your program is dangerous. You use global variables which you modify in and out of your subroutine. It's very easy to get confused about what's happening, and you could be having trouble with "action at a distance". (Meaning, you may be altering a value when you're not expecting to do so.)
    • Something that will help you keep things clear is to declare variables:
      • In the smallest scope you can, and
      • only when and where you need to.
      For example: You're declaring a global $result variable, whose sole purpose is to sit inside your lambda subroutine. In there, it simply receives the sum of two values, and the next statement returns the value. I simply removed the $result variable and returned the result directly. (Later, I extracted $lambda0 from the subroutine to remove the "action at a distance".
    • Your lack of indentation makes the program impossible to read, so the first thing I had to do was fix the indentation.
    • You wrote your equations in a rather obtuse style, making them harder to look at. I trimmed them a bit.
    • In your for loop, you have an if statement to separate the initial conditions setup from the actual code. It's cleaner to simply set up the initial conditions before the loop, and adjust your starting index accordingly. That way, you don't have to look "around" the if statement each time through the loop.

    I could continue, but I don't really have the time. (Nor the inclination, as I don't really understand the paper.) So the code is still a bit of a mess, but here's what I did with it. I definitely oversimplified it (by removing the $i variable, for example). You'll probably want to put that back in when you're fixing it. Now that I've properly disclaimed it, here's what I simplified it to:

    #!/usr/bin/perl use strict; use warnings; use diagnostics; #Force the same set of random numbers for each run until it works... #use Math::Random::MT qw(rand); srand 0; open (F1,'>>event_times_new2.mcm'); open (F2,'>>lambda_star_new2.mcm'); my ($lambda0, $alpha, $beta, $lambda_star); my $time=8000000; my $counterfails=0; my $num=1; $lambda0=1.2; $alpha=0.8; $beta=0.9; #ALGO it self my (@d, @s, @u, @event_times); $lambda_star = $lambda0; $d[0] = 0; $u[0] = rand($num); $s[0] = -log($u[0]) / $lambda_star; die if $s[0] >= $time; $event_times[0] = $s[0]; print F1 "$event_times[0]\n"; print F2 "$lambda_star\n"; my $valuelambda = lambda($event_times[0]) + $lambda0; $lambda_star=$valuelambda+$alpha; START: $u[1]=rand($num); $s[1] = $s[0] - log($u[1])/$lambda_star; die if $s[1] > $time; $d[1]=rand($num) ; my $rejectionlambda=lambda($s[1]); if ($d[1] < $rejectionlambda/$lambda_star) { $event_times[1]=$s[1]; print F1 "$event_times[1]\n"; print F2 "$lambda_star\n"; }else{ $counterfails+=1; $lambda_star=$rejectionlambda; goto START; } print F1 "The number of event-times that failed the A/R procedure is $ +counterfails\n"; close F1; close F2; ################################################ #subroutine that computes intensity function sub lambda { my $t = $_[0]; my $term; my $sum=0; for (my $i=0; $i<@event_times; $i++) { $term = $alpha*exp( -$beta * ($t - $event_times[$i]) ); $sum += $term; } $sum -= $alpha if $term == $alpha; return $sum; }

    NOTE: One trick I used when modifying your program is to use the same sequence of random numbers for each run. Since I'm just learning about your program, I didn't want to change the values returned. So by using the same sequence of random numbers each time, I could look at the results after each change and see if they changed. If so, then I made a mistake, so I back out the change.

    If you're curious about any particular change(s) I made, just ask. I'll be happy to tell you why I did each one.


    When your only tool is a hammer, all problems look like your thumb.

      thanks a lot roboticus for your help: I will look into your changes as soon as I'm done preparing tomorrows exam :((

      now, back to your coments: 1)I usually let the loop run for 10,000 or 50,000 iterations to get my arrival times (that I then use in my MLE to get the estimates for lambda0, alpha and beta)... I must have send you a latest version of the code where i was trying to figure out in only 2 iterations what the hack was wrong with my code... but you made a very good point about it!

      2)thanks for the advice about how/where to define my variables (including the whole "action at distance issue"): i'll definitely keep that in mind in my codes!

      3)I was wondering what was the best way to code an algorith that has an initiation step: thanks for the idea! in my first version of this code I did as you suggested (keeping the initiation out of the loop and than starting the loop from an according index...).. not sure why i decided to go with this way afterwords :(

      4)hahaha.. sorry for the mess in the coding!!

      I'm not sure I understand why you use the same sequence of random variables: once my algo is initiated, i generate one uniform0,1 in order to get one random variable with an exponential distribution (using the inverse function method) and one uniform0,1 to run an Acceptance/Rejection of sorts (in this case , for stockastic processes is called a "thinning procedure") to decided whether to keep or drop the exponential random variable. However, I haven't had the time yet to run/chane my code with your changes (exam tomorrow keeps me busy) but I will do so ASAP and let you know!!

      I really appreciate your help and I will get back to you as soon as i'm done! take care


        Regarding your question: Why do I use the same sequence of random variables?

        Well, when I first ran your program a couple of times, I got different results each time. Exactly as you'd expect from a program using some random numbers.

        However, I was playing with the program, I couldn't be certain I haven't broken the code. After all, I wouldn't know what values should come out. So it would be easy for me to introduce an error without noticing it. So I told the program to use the same random numbers each time. That way, I could make a change and run the program. If the output was the same, I probably didn't break it. Then I could move on to the next change. If the output was different, though, I could immediately identify (1) that I introduced an error, and (2) the error was between the last two runs. So I just keep the code in one window, and run the program in another window. That way, I could run the program after each significant change.

        It may be helpful to you for debugging to use the same sequence of random numbers: You could play along by hand with a calculator and verify that the numbers are working the way you want them to. If you find an error, you can correct it, and then run it again and verify that the correction did what you wanted, without having to kill your fingers and your calculator by starting over with a new set of numbers....

        Of course, once you get the code working, you switch back to normal random numbers.


        When your only tool is a hammer, all problems look like your thumb.

      the equation I'm using for the lambda sub-routine is eq(11) on slide 15...

      is there a more efficient way of coding the summation in PERL??


        I can't really tell. But first make it clear so you can debug it. Then when it's running, see if it's too slow. If so, *then* try to figure out how to speed it up.

        Normally to speed things up, you'll have to do one or more of:

        • Reduce the number of operations per iteration. A million additions will take four times as long as 250,000 additions, so if your formula was something like: $t = $alpha * $t + $beta + $gamma + $delta; where $t is your only variable, you might instead compute the sum of $beta + $gamma + $delta once before you enter the loop, and use that sum repeatedly.
        • Strength reduction: Some operations are more expensive than others--and sometimes you can convert one operation to another to reduce the amount of time taken. (See
        • Algorithm changes: You might be able to find a different algorithm to generate the same (or equivalent) results. Usually algorithmic improvements are the source of the most impressive improvements.
        • Profiling: By profiling the code, you'll see which parts are actually taking the most time, so you can focus on them. After all, if you miraculously reduce the runtime of a line of code to 0, but that line accounts for only 5% of the runtime, you've saved at best 5%. But if you halve the speed of a line of code that's consumes 50% of your runtime, you knock off 25% of your runtime.
        • Cache results: For some problems, you can save a copy of a partial result so you can use it again later.

        Putting my "old man" hat on: When I started with computers, additions were cheap (only a handful of cycles), multiplications were expensive (hundreds of cycles), and transcendental functions were *really slow*. Today, however, multiplications are nearly as inexpensive as additions, and transcendental functions aren't terribly slow, either. I have a piece of paper around here somewhere--on it, I would document the range of cycles it took to perform several operations every time I upgraded computers. It was really kind of interesting watching the cycles dropping generation after generation. CPUs are *much* more efficient than when I started, so even if today's computers ran at the same clock speed as my first computer, it would still be 100 times faster.

        The reason I'm rambling on like this: For serious optimization, you need to look over the current literature. Each decade seems to have new issues arise, while others go away. I don't work nearly so hard to remove multiplications as I used to. But back then, I didn't worry about memory speed, either. But with the ever-increasing mismatch of CPU cycle speeds and memory I/O speeds, you have to pay more attention to how you're using your cache. I haven't seriously tried to optimize a program for nearly ten years. I'm sure that my assembly-language tuning techniques from back then wouldn't be nearly as useful today. I'd have to learn new ones for the current generation of computers.



        When your only tool is a hammer, all problems look like your thumb.

Re: Simulating uni-dimensional Hawkes processes
by frozenwithjoy (Priest) on Feb 18, 2013 at 04:13 UTC
    Can you give us both the expected and the actual outputs for a set of "random" numbers? Thanks.

      Hi forzenwithjoy, and thanks for your time!

      I'm not sure I understand what you are asking me for. The code is supposed to generate a sequence of "arrival times" that are supposed to be produced by a self-exciting point process with lambda as its intensity function... in order to do so, i generate some arrival times and then perform an Acceptance/Rejection of sorts that will drop some of them (with a certain probability) and generate new ones... i usually let the loop run about 10,000 or 50,000 times and than feed the generated arrival times into MLE estimation to get "back" the values of my 3 parameters: alpha, beta and lambda0 (the base intensity)... for beta and alpha i get very cloes MLE estimates while the estimate for lambda0 is ALWAYS well off... this is my problem... what kind of outputs would you like me to show you??

      thanks a lot for your help! take care!


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1019245]
Approved by Athanasius
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (6)
As of 2017-05-25 06:29 GMT
Find Nodes?
    Voting Booth?