Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Message threading

by sschneid (Deacon)
on Jun 20, 2003 at 16:20 UTC ( #267634=perlquestion: print w/ replies, xml ) Need Help??
sschneid has asked for the wisdom of the Perl Monks concerning the following question:


I'm working on message threading, and could use a bit of help. Here's a sample data structure I'm working on outputting:
Data strucure: Message ID (Parent) -- 1 (0) |-- 2 (1) \-- 3 (1) |-- 4 (3) | \-- 6 (4) \-- 5 (3)
The output doesn't really have to look exactly like that; the indenting is really the only part I'm concerned about.

So, at the moment, I have some sample data (just CSV right now, no need to be pulling from an actual database until I have this working the way I'd like) that I've put into a hash:
# Get the data and store it in a message hash while (<DATA>) { chomp(my ($id,$thread,$from,$date,$content) = split /,/); %{$msg->{$id}} = ( thread => $thread, from => $from, date => $date, content => $content ); }
So far, so good. This is the part where I'm a bit stuck as far as the best way to sort the data. Is having the message's parent ID enough? Can I efficiently output based on that? I have code (see below) to create another key in the hash (which children each message has), but I'm unsure as to whether this is really necessary or not. I figure I'll be looping through the message hash to display the output, and looping through it twice just to have the children listed in the hash might be unnecessary since we already know each message's parent. In any case, here's that code:
foreach my $id (keys %{$msg}) { next if ($msg->{$id}->{thread} == 0); push @{ $msg->{ $msg->{$id}->{thread} }->{children} }, $id; }
So, I'm stuck. Any help would be appreciated. Thanks.


Replies are listed 'Best First'.
Re: Message threading
by japhy (Canon) on Jun 20, 2003 at 16:55 UTC
    Well, one way to do it is to update the parent's hashref as you're looping through the data:
    # Get the data and store it in a message hash while (<DATA>) { chomp; my ($id, $parent, $from, $date, $content) = split /,/; $msg->{$id} = { # id => $id, # only if you feel it's necessary thread => $parent, from => $from, date => $date, content => $content, child => [], }; push @{ $msg->{$parent}[child] }, $id; # or even # push @{ $msg->{$parent}[child] }, $msg->{$id} # to save time/work/effort/etc # but then it might be necessary to store the id # in the hash itself }

    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Message threading
by BazB (Priest) on Jun 20, 2003 at 16:58 UTC

    You might find Jamie Zawinski's document on message threading interesting.

    It seems Mail::Thread and Email::Thread implement JWZ's algorithm.
    There are alternative ways of handling messages, including threading, such as Mail::Box.



    If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
    That way everyone learns.

      Yeah, I've read Jamie's document and it helps to get a basis as far as an algorythm. The problem with modules like Mail::Thread and Email::Thread is that they both use e-mail objects, which is different from the type of data I'm trying to thread (Mail::Thread, at least, only accepts Mail::Internet and Mail::Box::Message objects) -- which I'm not feeding it.


        It seems to me the hard part (threading algorithm) is already done for you as long as you make your objects look like the objects it expects. So why not find all the methods actually used by the Mail|Email::Thread module for threading and create wrapper methods in yours -- that is, implement the common interface. This way you get threading for free and the wonderful side-effect of decoupled code.

        M-x auto-bs-mode

Re: Message threading
by PotPieMan (Hermit) on Jun 21, 2003 at 20:55 UTC
    Having just the message's parent ID is indeed enough to thread messages in this manner. I have implemented an algorithm to do this in Perl and Java using a stack to simulate recursion (for walking the tree).

    The basic algorithm is as follows:

    1. Seed the stack with all top level messages, i.e. those with parent ID equal to zero. Initialize the level of each of these messages as zero.
    2. While the stack still has messages, perform the following steps
      1. Pop a message off the stack.
      2. Display the message at the appropriate level.
      3. Get all messages which are children of the current message and put them onto the stack with the level equal to one more than the current message's level.

    That's it! The tree is constructed "on the fly" through the simulated recursion.

    There are a few drawbacks to this algorithm:

    • Sorting becomes fairly difficult. If you want to display the most recent message first, you actually have to sort the data in ascending order of date. This occurs because of the tail recursion used in the algorithm (I believe - please correct me if I am wrong).
    • You have to be clever with how you display a message. You know the level, as determined by the algorithm, but indentation can sometimes be difficult.
    • The algorithm is easy to use with SQL databases, but not as easy to use with other data sources.

    It's also important to note that the way this algorithm seeds the stack is its only way to avoid getting thrown into an infinite loop. You could create two messages, each with the parent ID pointing to the other, but they will not be displayed because there is no way to reach them from the top-level (parent ID equal to zero) nodes. At least, I hope so - I haven't found a way to break it. (Please let me know if you see a fault in this.)

    Jamie Zawinski's algorithm is much more robust in the places where this one is not. If you have the time, you should definitely implement that one. :-)


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://267634]
Approved by hardburn
Front-paged by broquaint
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (10)
As of 2016-07-26 20:50 GMT
Find Nodes?
    Voting Booth?
    What is your favorite alternate name for a (specific) keyboard key?

    Results (240 votes). Check out past polls.