Have you tried a hash with one entry for each message which has the message-id as the key and in-reply-to (or null) as the value? (This differs from my understanding of your original hash attempt in that it sounds like you were trying to create hash entries only for heads of threads rather than for every message.)
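For illustration, here is a minimal sketch of building that hash in one pass. It assumes the whole mbox fits in a string, that messages are separated by lines starting with "From ", and that the ids sit in angle brackets on unfolded Message-ID and In-Reply-To headers. The sample data and the name %parent are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real mbox contents.
my $mbox = <<'END';
From alice Mon Jan  1 00:00:00 2001
Message-ID: <A@example>
Subject: first

body
From bob Mon Jan  1 00:01:00 2001
Message-ID: <B@example>
In-Reply-To: <A@example>
Subject: Re: first

body
END

my %parent;    # message-id => in-reply-to id, or undef if there is none

# Split before each "From " separator line (assumes body lines that start
# with "From " have been escaped, as mbox writers normally do).
for my $msg (split /^(?=From )/m, $mbox) {
    # Headers end at the first blank line.
    my ($headers) = split /\n\n/, $msg, 2;
    my ($id) = $headers =~ /^Message-ID:\s*<([^>]+)>/mi or next;
    my ($reply_to) = $headers =~ /^In-Reply-To:\s*<([^>]+)>/mi;
    $parent{$id} = $reply_to;    # stays undef when there is no In-Reply-To
}
```

Every message gets an entry, including thread heads, whose value is simply undef.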
Once you have this hash, you can then (relatively) quickly identify which messages go with which heads:
- $hash{A} is null (or a message-id which isn't in the hash), so A is a thread head
- $hash{B} is A, and A is a head, so B is in A's thread
- $hash{C} is B, but B isn't a head, so look at $hash{B}, which is A; A is a head, so C is in A's thread
- $hash{D} is C, but C isn't a head, so...
A touch of recursion solves that neatly with just a few hash lookups instead of rescanning the mbox. If it's not fast enough for you, though, you can also easily set up a second hash where $hash2{message-id} = (message-id of the thread's head), so that when you get to D, you can just look up $hash2{C} instead of walking $hash{C}, then $hash{B}, then $hash{A}.
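Something along these lines would do the recursive lookup with %hash2 as the cache. This is only a sketch: the sub name thread_head and the sample data (including E, whose parent isn't in the hash) are invented for the example.

```perl
use strict;
use warnings;

# Sample data matching the A/B/C/D walk-through: message-id => in-reply-to.
# E replies to a message that is not in the hash, so E counts as a head.
my %hash = (A => undef, B => 'A', C => 'B', D => 'C', E => 'missing');

my %hash2;    # cache: message-id => message-id of its thread's head

sub thread_head {
    my ($id) = @_;
    return $hash2{$id} if exists $hash2{$id};    # already resolved

    my $parent = $hash{$id};
    # A head has no parent, or a parent that is not in the hash.
    my $head = (defined $parent && exists $hash{$parent})
        ? thread_head($parent)
        : $id;

    return $hash2{$id} = $head;    # remember the answer for later lookups
}

print "$_ => ", thread_head($_), "\n" for sort keys %hash;
```

Note that this assumes the reply chains never loop back on themselves; a pair of messages with bogus headers pointing at each other would recurse forever, so you may want a depth limit if your data is untrusted.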
Once you've identified the head of the thread that each message is in, you can then build the hash you originally attempted, mapping each head to an array of messages in that thread.
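That last step could look roughly like this. The names %head_of and %threads are placeholders, and the sample data is just the A/B/C/D example with one extra single-message thread.

```perl
use strict;
use warnings;

# Hypothetical input: message-id => message-id of that message's thread head.
my %head_of = (A => 'A', B => 'A', C => 'A', D => 'A', E => 'E');

my %threads;    # head message-id => array ref of message-ids in that thread
for my $id (sort keys %head_of) {
    # Autovivification creates the array the first time a head is seen.
    push @{ $threads{ $head_of{$id} } }, $id;
}

print "$_: @{ $threads{$_} }\n" for sort keys %threads;
```

The sort here only makes the output order predictable; if you want messages in mbox order, iterate over the ids in the order you read them instead.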