Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re: Simulating uni-dimensional Hawkes processes

by roboticus (Canon)
on Feb 18, 2013 at 03:36 UTC ( #1019268=note: print w/ replies, xml ) Need Help??


in reply to Simulating uni-dimensional Hawkes processes

grim_master:

There's too much domain-specific jargon for me to know what the heck the program is supposed to compute. However, I went through your program and removed unnecessary operations, simplifying as much as I could. (I tend to do this to figure out what programs are doing.)

In the process of playing with your program, I noticed a few things:

  • You only go through your loop twice. Glancing at the paper, I'm expecting you'll need a larger set of iterations for it to do anything useful.
  • Structurally, your program is dangerous. You use global variables which you modify in and out of your subroutine. It's very easy to get confused about what's happening, and you could be having trouble with "action at a distance". (Meaning, you may be altering a value when you're not expecting to do so.)
  • Something that will help you keep things clear is to declare variables:
    • In the smallest scope you can, and
    • only when and where you need to.
    For example: You're declaring a global $result variable, whose sole purpose is to sit inside your lambda subroutine. In there, it simply receives the sum of two values, and the next statement returns the value. I simply removed the $result variable and returned the result directly. (Later, I extracted $lambda0 from the subroutine to remove the "action at a distance".
  • Your lack of indentation makes the program impossible to read, so the first thing I had to do was fix the indentation.
  • You wrote your equations in a rather obtuse style, making them harder to look at. I trimmed them a bit.
  • In your for loop, you have an if statement to separate the initial conditions setup from the actual code. It's cleaner to simply set up the initial conditions before the loop, and adjust your starting index accordingly. That way, you don't have to look "around" the if statement each time through the loop.

I could continue, but I don't really have the time. (Nor the inclination, as I don't really understand the paper.) So the code is still a bit of a mess, but here's what I did with it. I definitely oversimplified it (by removing the $i variable, for example). You'll probably want to put that back in when you're fixing it. Now that I've properly disclaimed it, here's what I simplified it to:

#!/usr/bin/perl use strict; use warnings; use diagnostics; #Force the same set of random numbers for each run until it works... #use Math::Random::MT qw(rand); srand 0; open (F1,'>>event_times_new2.mcm'); open (F2,'>>lambda_star_new2.mcm'); my ($lambda0, $alpha, $beta, $lambda_star); my $time=8000000; my $counterfails=0; my $num=1; $lambda0=1.2; $alpha=0.8; $beta=0.9; #ALGO it self my (@d, @s, @u, @event_times); $lambda_star = $lambda0; $d[0] = 0; $u[0] = rand($num); $s[0] = -log($u[0]) / $lambda_star; die if $s[0] >= $time; $event_times[0] = $s[0]; print F1 "$event_times[0]\n"; print F2 "$lambda_star\n"; my $valuelambda = lambda($event_times[0]) + $lambda0; $lambda_star=$valuelambda+$alpha; START: $u[1]=rand($num); $s[1] = $s[0] - log($u[1])/$lambda_star; die if $s[1] > $time; $d[1]=rand($num) ; my $rejectionlambda=lambda($s[1]); if ($d[1] < $rejectionlambda/$lambda_star) { $event_times[1]=$s[1]; print F1 "$event_times[1]\n"; print F2 "$lambda_star\n"; }else{ $counterfails+=1; $lambda_star=$rejectionlambda; goto START; } print F1 "The number of event-times that failed the A/R procedure is $ +counterfails\n"; close F1; close F2; ################################################ #subroutine that computes intensity function sub lambda { my $t = $_[0]; my $term; my $sum=0; for (my $i=0; $i<@event_times; $i++) { $term = $alpha*exp( -$beta * ($t - $event_times[$i]) ); $sum += $term; } $sum -= $alpha if $term == $alpha; return $sum; }

NOTE: One trick I used when modifying your program is to use the same sequence of random numbers for each run. Since I'm just learning about your program, I didn't want to change the values returned. So by using the same sequence of random numbers each time, I could look at the results after each change and see if they changed. If so, then I made a mistake, so I back out the change.

If you're curious about any particular change(s) I made, just ask. I'll be happy to tell you why I did each one.

...roboticus

When your only tool is a hammer, all problems look like your thumb.


Comment on Re: Simulating uni-dimensional Hawkes processes
Download Code
Re^2: Simulating uni-dimensional Hawkes processes
by glrm_master (Initiate) on Feb 18, 2013 at 19:29 UTC

    thanks a lot roboticus for your help: I will look into your changes as soon as I'm done preparing tomorrows exam :((

    now, back to your coments: 1)I usually let the loop run for 10,000 or 50,000 iterations to get my arrival times (that I then use in my MLE to get the estimates for lambda0, alpha and beta)... I must have send you a latest version of the code where i was trying to figure out in only 2 iterations what the hack was wrong with my code... but you made a very good point about it!

    2)thanks for the advice about how/where to define my variables (including the whole "action at distance issue"): i'll definitely keep that in mind in my codes!

    3)I was wondering what was the best way to code an algorith that has an initiation step: thanks for the idea! in my first version of this code I did as you suggested (keeping the initiation out of the loop and than starting the loop from an according index...).. not sure why i decided to go with this way afterwords :(

    4)hahaha.. sorry for the mess in the coding!!

    I'm not sure I understand why you use the same sequence of random variables: once my algo is initiated, i generate one uniform0,1 in order to get one random variable with an exponential distribution (using the inverse function method) and one uniform0,1 to run an Acceptance/Rejection of sorts (in this case , for stockastic processes is called a "thinning procedure") to decided whether to keep or drop the exponential random variable. However, I haven't had the time yet to run/chane my code with your changes (exam tomorrow keeps me busy) but I will do so ASAP and let you know!!

    I really appreciate your help and I will get back to you as soon as i'm done! take care

    kris

      Regarding your question: Why do I use the same sequence of random variables?

      Well, when I first ran your program a couple of times, I got different results each time. Exactly as you'd expect from a program using some random numbers.

      However, I was playing with the program, I couldn't be certain I haven't broken the code. After all, I wouldn't know what values should come out. So it would be easy for me to introduce an error without noticing it. So I told the program to use the same random numbers each time. That way, I could make a change and run the program. If the output was the same, I probably didn't break it. Then I could move on to the next change. If the output was different, though, I could immediately identify (1) that I introduced an error, and (2) the error was between the last two runs. So I just keep the code in one window, and run the program in another window. That way, I could run the program after each significant change.

      It may be helpful to you for debugging to use the same sequence of random numbers: You could play along by hand with a calculator and verify that the numbers are working the way you want them to. If you find an error, you can correct it, and then run it again and verify that the correction did what you wanted, without having to kill your fingers and your calculator by starting over with a new set of numbers....

      Of course, once you get the code working, you switch back to normal random numbers.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

Re^2: Simulating uni-dimensional Hawkes processes
by glrm_master (Initiate) on Feb 18, 2013 at 21:13 UTC

    the equation I'm using for the lambda sub-routine is eq(11) on slide 15...

    is there a more efficient way of coding the summation in PERL??

      girm_master:

      I can't really tell. But first make it clear so you can debug it. Then when it's running, see if it's too slow. If so, *then* try to figure out how to speed it up.

      Normally to speed things up, you'll have to do one or more of:

      • Reduce the number of operations per iteration. A million additions will take four times as long as 250,000 additions, so if your formula was something like: $t = $alpha * $t + $beta + $gamma + $delta; where $t is your only variable, you might instead compute the sum of $beta + $gamma + $delta once before you enter the loop, and use that sum repeatedly.
      • Strength reduction: Some operations are more expensive than others--and sometimes you can convert one operation to another to reduce the amount of time taken. (See http://en.wikipedia.org/wiki/Strength_reduction).
      • Algorithm changes: You might be able to find a different algorithm to generate the same (or equivalent) results. Usually algorithmic improvements are the source of the most impressive improvements.
      • Profiling: By profiling the code, you'll see which parts are actually taking the most time, so you can focus on them. After all, if you miraculously reduce the runtime of a line of code to 0, but that line accounts for only 5% of the runtime, you've saved at best 5%. But if you halve the speed of a line of code that's consumes 50% of your runtime, you knock off 25% of your runtime.
      • Cache results: For some problems, you can save a copy of a partial result so you can use it again later.

      Putting my "old man" hat on: When I started with computers, additions were cheap (only a handful of cycles), multiplications were expensive (hundreds of cycles), and transcendental functions were *really slow*. Today, however, multiplications are nearly as inexpensive as additions, and transcendental functions aren't terribly slow, either. I have a piece of paper around here somewhere--on it, I would document the range of cycles it took to perform several operations every time I upgraded computers. It was really kind of interesting watching the cycles dropping generation after generation. CPUs are *much* more efficient than when I started, so even if today's computers ran at the same clock speed as my first computer, it would still be 100 times faster.

      The reason I'm rambling on like this: For serious optimization, you need to look over the current literature. Each decade seems to have new issues arise, while others go away. I don't work nearly so hard to remove multiplications as I used to. But back then, I didn't worry about memory speed, either. But with the ever-increasing mismatch of CPU cycle speeds and memory I/O speeds, you have to pay more attention to how you're using your cache. I haven't seriously tried to optimize a program for nearly ten years. I'm sure that my assembly-language tuning techniques from back then wouldn't be nearly as useful today. I'd have to learn new ones for the current generation of computers.

      </old-man-hat>

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1019268]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (11)
As of 2014-09-22 12:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (191 votes), past polls