
The Threading Dilemma

by chromatic (Archbishop)
on Jul 22, 2000 at 21:42 UTC (#23910=monkdiscuss)

Perl Monks has a feature not found on Everything2 or Slashdot, namely the ability to refer to individual posts within a threaded discussion. Because this feature is new, some of the mechanics and best practices have yet to be determined.

One difficulty arising from this feature is a namespace collision. This is most evident when dealing with two replies to the same parent writeup, but it could occur with identically named top-level postings as well. How do you refer to a writeup by name when that name occurs multiple times in the database?

I see five possible solutions. Three are behavioral and two are technical.

  • Use the special ID linking technique. This requires you to find the correct node number. It's not too difficult, but it's not as convenient as the simple bracket linking to title. However, when done properly, it's very accurate.
  • Link to the parent post of a discussion. If someone has already asked a similar question and received good answers (the post should become a Categorized Question with Answers, yes), link to the parent query and mention that the ensuing discussion brought up good conclusions. This is easier to link, if there's no title collision, and it gives a broader overview of the answers.
  • Prefix the titles of individual replies to top-level postings with a unique string, such as (kudra) or (chromatic). This makes reply titles more distinctive (more so than 'Re: Re: Re: foo', anyway). It has the disadvantage of propagating your identifier through any replies to your post if replying monks are too lazy to edit the titles. It is also susceptible to collisions if you reply to two same-level postings in the same discussion and neither of the parent nodes followed this rule. It's also a little ugly.
  • Create a new linking type -- a SQL-type link. Posters could specify title and author, and the Everything SQL engine could return a list of possible matches. This does require mucking about in the internal linking methods, and it does introduce more load onto the server, but if results were cached and transformed into a normal ID link, the penalty would be paid only once. (Of course, having stronger search capabilities would allow posters to do this for themselves.)
  • Store threading information in the database and allow that to be linked. Writeups currently do store some threading information. There's probably a field in the question table for parent_id. (I'm guessing, as I have not seen the db tables for this particular site.)

    spudzeppelin has come up with a unique way to handle threading and sorting. Give each writeup a new 8-digit number corresponding to its place in the nesting hierarchy. The first post would be 00000000. The first reply would be 00000000x00000001, and so on. Replies inherit and extend parent threading number information. (If you're concerned about taking up space in the database, simply pack the numbers.) Sorting into a threaded model is a snap -- instead of having to create a tree, you can simply use a numerical sort to arrange things correctly. However, if you prefer to sort by score, this won't work for you.

    A similar option is to implement some sort of DOM-like model which applies to posts. You might be able to link to The Threading Dilemma:1:2 for the second reply to the first reply. (Again, your sorting order will affect this.)
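The fixed-width threading keys described above can be sketched as follows. This is a minimal illustration, not the Everything Engine's actual schema: the helper names (`child_key`, `threaded`) and the sample posts are hypothetical, and it relies on a plain string sort of the zero-padded keys rather than a true numerical sort.

```python
SEGMENT_WIDTH = 8  # digits per nesting level, as in the proposal
ROOT = "0" * SEGMENT_WIDTH  # key of the first post: 00000000


def child_key(parent_key, reply_index):
    """Extend a parent's key with a zero-padded segment for its nth reply (1-based)."""
    return f"{parent_key}x{reply_index:0{SEGMENT_WIDTH}d}"


# Hypothetical thread, stored in posting order rather than display order.
first = child_key(ROOT, 1)
posts = [
    (ROOT, "The Threading Dilemma"),
    (child_key(ROOT, 2), "second reply"),
    (first, "first reply"),
    (child_key(first, 1), "reply to first reply"),
]


def threaded(posts):
    """Return posts in threaded display order using only a sort on the key."""
    # Every segment has the same width, so comparing keys as strings puts each
    # parent directly before its replies, and siblings in reply order.
    return sorted(posts, key=lambda post: post[0])


for key, title in threaded(posts):
    depth = key.count("x")  # one separator per nesting level
    print("    " * depth + title)
```

No tree is built at any point: the nesting depth falls out of the number of segments in the key. As the post notes, this ordering is fixed at posting time, so sorting by score would need a different approach.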

Having weighed these, I consider the first to be the simplest and most effective. Though I have been prepending my username to my replies, I don't think it solves the underlying problem. If an XML interface to the Everything Engine becomes practical in the next year or two, hopefully the discussion will move from 'ease of displaying information within HTML limitations' to 'properly marking up data for user-determined display in XML clients', and we can be more efficient with both server resources and our own time.

Are there other solutions? Are there benefits and drawbacks I've overlooked? What do you prefer?

RE: The Threading Dilemma
by young perlhopper (Scribe) on Jul 22, 2000 at 22:30 UTC
    I like the first solution, but because of the inconvenience of having to figure out node numbers, maybe you could introduce some linking 'macros'. For example, if I want to link to the last node I was at, [last-node] would do it. If I'm typing at the chatterbox, [this-node] might be useful. In threaded discussions, [parent-node], [prev-sibling], and [next-sibling] might be moderately useful...

    Would this be practical? I'm not sure whether the architecture of perlmonks would make this easy or hard. Personally, I think [last-node] has the most potential: simply view the page you want to comment on, then go to the discussion and paste it in.
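One way such macros could work is to expand them into ordinary [id://N] links when the post is submitted. The sketch below is only an assumption about how that might look: `expand_macros`, the `context` mapping, and the node id 23915 are hypothetical, and nothing here reflects how perlmonks actually tracks the last or current node.

```python
import re

# Macro names taken from the suggestion above; sibling macros are omitted.
MACRO_PATTERN = re.compile(r"\[(last-node|this-node|parent-node)\]")


def expand_macros(text, context):
    """Replace each known macro with an [id://N] link; leave it alone if unresolved."""
    def substitute(match):
        node_id = context.get(match.group(1))
        if node_id is None:
            return match.group(0)  # no id known for this macro, keep the text as typed
        return f"[id://{node_id}]"
    return MACRO_PATTERN.sub(substitute, text)


# Hypothetical session state: the reader came from node 23910 and is viewing 23915.
context = {"last-node": 23910, "this-node": 23915}
print(expand_macros("See [last-node] and [parent-node].", context))
# prints: See [id://23910] and [parent-node].
```

Because the macro is replaced by a node id at submit time, the stored post stays unambiguous even after the reader's "last node" changes.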


      I like this, but I'd suggest simpler macros:

      [here] [last] [top] or [parent]
RE: The Threading Dilemma
by c-era (Curate) on Jul 23, 2000 at 00:02 UTC
    Another option would be to have all links by name converted to node numbers when a post is submitted. If more than one node existed with that name, a list would come up that would allow the poster to choose the node they wanted. This would keep linking easy but eliminate the duplicate name problem.
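A rough sketch of that submit-time conversion follows. It assumes a hypothetical `title_index` mapping each title to the node ids that carry it (in practice this would be a database query), and the ids 101 and 102 are made up for illustration.

```python
import re

# Matches a bracketed link such as [Some Title]; nested brackets are not handled.
LINK_PATTERN = re.compile(r"\[([^\]\[]+)\]")


def resolve_links(text, title_index):
    """Rewrite unambiguous [title] links to [id://N] and report the ambiguous ones."""
    ambiguous = {}

    def substitute(match):
        title = match.group(1)
        if title.startswith("id://"):
            return match.group(0)  # already an id link
        ids = title_index.get(title, [])
        if len(ids) == 1:
            return f"[id://{ids[0]}]"
        if len(ids) > 1:
            ambiguous[title] = ids  # the poster must choose among these
        return match.group(0)  # unknown or ambiguous: leave unchanged

    return LINK_PATTERN.sub(substitute, text), ambiguous


# Hypothetical index: one unique title and one title shared by two replies.
title_index = {"The Threading Dilemma": [23910], "Re: Threading": [101, 102]}
new_text, choices = resolve_links(
    "See [The Threading Dilemma] and [Re: Threading], also [id://5].", title_index
)
print(new_text)
print(choices)
```

The `choices` dictionary is what would drive the list shown to the poster; once they pick an id, the remaining title link can be rewritten the same way.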
      This makes the most sense to me, so far.

      I do, however, think the current node_id system works very cleanly most of the time. There are a few situations (like right after submitting a post) where it is more difficult to find the node id, but [id://123456] is the only internal linking method I use, anymore. It is clean, almost always easy (just look in the location bar, or mouse-over a link to get the lastnode_id) and precise.

      There are certain situations where the id:// method does actually convert to the full <a>, so I imagine c-era's suggestion would be easiest to implement.

      I think I would encourage our fearless leader to address this issue, because the "add-my-name-to-the-title-just-to-make-it-unique" thing only serves to clutter, not clarify. kudra first posted her titling method to encourage us to make titles more informative. How much does it help you to read in Newest Nodes "(Russ) Re: Threading"?

      If the site could address the underlying problem causing us to do this, I think we will all be well-served.

      Brainbench 'Most Valuable Professional' for Perl

Re: The Threading Dilemma
by athomason (Curate) on Jul 23, 2000 at 00:43 UTC
    Some comments:

    • ID-linking: Each node already has a unique identifier, so why not use it? The problem is that nobody remembers node_id's, and it can be difficult to find the id of a particular node. Perhaps it could be made available somewhere on every node (inconspicuously, of course... maybe only readable if you select it?).

    • Parent Linking: This is just generally good practice when appropriate, but I don't think it solves the underlying problem. There are times when you want to refer to a particular reply, and other, more exact methods would be useful.

    • Title Dirtying: Ack, this reminds me of the garbage C compilers tack on to variables before linking ;-). This may work, but it requires consistency and a fair amount of work. Node id's are a better source of uniqueness, IMO.

    • Search Linking: I think this would reduce, rather than eliminate, the existing problem we have with name conflicts where the engine returns too many results to be useful. You would get fewer hits than by specifying title alone, but in the worst case, still more than the one you wanted. Besides, a real search facility like you mention and I've lobbied for before would be much more useful generally.

    • Special Thread Linking: This is fairly interesting, though 8 digits might be a bit of overkill: I doubt any node will ever have a hundred million replies :-). The packing could work on both server- and user-sides, but it might confuse some people. But then, Monks willing to go that far out of their way for a link could undoubtedly figure it out. Alternatively, each reply could be given a (possibly sequential) identifier unique only within the thread. So, the first reply would have a thread_id of 1, the second 2, etc., regardless of depth. Of course, these numbers wouldn't be in order when displayed, but as long as they were on the node somewhere that wouldn't matter. This, too, would need a special link mechanism like the thread:n:m:... system you propose.
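The per-thread sequential identifier from the last point could look something like this. The `Thread` class and its fields are hypothetical names used only for illustration, and using 0 to mean "the original post" is an assumption of this sketch.

```python
from itertools import count


class Thread:
    """Hands out ids that are unique only within one discussion thread."""

    def __init__(self):
        self._next_id = count(1)  # first reply gets 1, second gets 2, and so on
        self.replies = {}  # thread_id -> (parent thread_id, title)

    def add_reply(self, title, parent=0):
        """Record a reply in posting order; parent=0 means the original post."""
        thread_id = next(self._next_id)
        self.replies[thread_id] = (parent, title)
        return thread_id


thread = Thread()
first = thread.add_reply("RE: The Threading Dilemma")
nested = thread.add_reply("Re: RE: The Threading Dilemma", parent=first)
third = thread.add_reply("Re: The Threading Dilemma")
print(first, nested, third)  # prints: 1 2 3
```

The ids follow posting order regardless of depth, so they will not appear in sequence in a threaded display, but each one still identifies exactly one reply within its thread.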

    When the day is over, I would prefer to just extend the existing id system, which guarantees uniqueness but doesn't expose itself sufficiently to users wanting to fully utilize it. And like I've said here and before, a good searching facility would be immensely helpful for this purpose and others.

RE: The Threading Dilemma
by tye (Cardinal) on Jul 23, 2000 at 08:35 UTC

    I really like the idea of having [title] resolved to ID number at "post" time and ambiguities prompting a list, though I'm sure this would need to be optional to avoid becoming annoying for some situations and some people. My idea of this is having "Submit" and "Preview" do this, placing the modified HTMLish into the text box, so a subsequent "Submit" or "New" will have the ID numbers. Ambiguities would show as radio buttons under the text area. So if you don't want to use this feature, just ignore it and don't resubmit. But I understand this could be some work so I won't expect it soon. A radio button for "quote the [ and ]" (making it not a link), would be nice, especially when no match is found.

    But even more than that, I really, really like replacing the current "Re: Re: Re: Re:" with "Re02.01.03: " for the third reply to the first reply to the second reply to the original question. In the rare cases of more than 99 replies, a third digit would be prepended which would mess up the sorting order unless extra work was done (which I wouldn't consider worth it).

    Then, once we have that, the "In reply to:" links would be done so that "Re02.01.03: Hi" is "In reply to: [Re][02.][01: Hi]" where "01: Hi" is the regular "parent" link, "02." is a link to "Re02: Hi" ("grandparent"), and "Re" is a link to the original question ("Hi"). I'm constantly jumping to a new node and deciding I'd like to jump up the hierarchy three steps but dislike the extra clicks and pauses required.
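The title scheme and the ancestor links could be generated from a reply's position in the thread, roughly as below. The representation of a position as a list of 1-based reply indexes, and the function names, are assumptions made for this sketch.

```python
def reply_title(path, subject):
    """Build a title such as 'Re02.01.03: Hi' from a reply's position in the thread."""
    if not path:
        return subject  # the original question keeps its plain title
    # Two digits per level; a count above 99 simply grows to three digits.
    return "Re" + ".".join(f"{n:02d}" for n in path) + ": " + subject


def ancestor_titles(path, subject):
    """Titles of every ancestor, from the original question down to the parent."""
    return [reply_title(path[:depth], subject) for depth in range(len(path))]


# Third reply to the first reply to the second reply to the question "Hi".
path = [2, 1, 3]
print(reply_title(path, "Hi"))      # prints: Re02.01.03: Hi
print(ancestor_titles(path, "Hi"))  # prints: ['Hi', 'Re02: Hi', 'Re02.01: Hi']
```

Each entry in the ancestor list corresponds to one of the bracketed pieces in the "In reply to:" line: the first is the target for "Re", the second for "02.", and the last for the regular parent link.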

    The rest of the options I either didn't like, didn't understand, or didn't really care one way or the other about.

by gryng (Hermit) on Jul 23, 2000 at 08:42 UTC
    Just CTFT (Change the, er, fine title). Not to be too blunt, but I try to do this with all my posts: I pick a title that is relevant to what I'm saying inside the post.

    I like some of the other ideas to help out, and I also realize that a lot of people don't change the title themselves, from laziness or forgetfulness (a few of my posts slip out with me forgetting to change the title).

    As a final note, one can remove the RE: clutter by simply not providing a default title for a post, and also not allowing a blank title. This, combined with id number referencing and some of the other ideas mentioned, should be a simple and effective solution for 95% of the problem, no?


    RE: CTFT
    by tye (Cardinal) on Jul 23, 2000 at 08:51 UTC

      I like CTFT, at least when you are moving the discussion in a new direction. But the problem with it is that it usually means (based on the little experience I've had with perlmonks) that the title doesn't reflect what it was in response to. But this is only a problem in places that don't show an "In reply to" item, such as in "Newest node" (where I most dislike CTFT).

      So I support CTFT (in many cases) but heavier use of it would make me push for addition of "in reply to" columns in several places.

        Well, I agree, tye! I haven't been able to check the site as often as I want to, so I haven't been paying attention to the Newest Nodes section. But yeah, if you CTFT then you will probably throw people off concerning that section. We should definitely consider fixing this problem.

        My first thought on the matter of fixing Newest Nodes, though, was that we wouldn't have room to put both the title of the post and the referring post's title on the same line. However, it would be just as useful to mention the ultimate parent's node, rather than the immediate parent's. If this were the case, then we could sort replies based on that field (the ultimate parent), and then simply note that information once for each offending reply:

        Replies: [Deep Linkage]: [Yeah but I think...] [Happy birthday!] [Arrays in Hashes]: [Use it like this:] [No no no...]

        Which would be much more sensible (you could even space the replies and build a tree if you wanted).
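Grouping the newest replies under their ultimate parent could be sketched like this. The input format, pairs of (top-level title, reply title), and the function names are assumptions made for illustration; the sample titles are the ones from the example above.

```python
def group_by_root(replies):
    """Group reply titles under their thread's top-level title, in first-seen order."""
    grouped = {}
    for root_title, reply_title in replies:
        grouped.setdefault(root_title, []).append(reply_title)
    return grouped


def render(grouped):
    """Format the groups as one line, naming each top-level node only once."""
    parts = []
    for root_title, titles in grouped.items():
        parts.append(f"[{root_title}]: " + " ".join(f"[{t}]" for t in titles))
    return "Replies: " + " ".join(parts)


# Newest replies in arrival order, each tagged with its thread's top-level title.
replies = [
    ("Deep Linkage", "Yeah but I think..."),
    ("Arrays in Hashes", "Use it like this:"),
    ("Deep Linkage", "Happy birthday!"),
    ("Arrays in Hashes", "No no no..."),
]
print(render(group_by_root(replies)))
```

The top-level title is stated once per group, which keeps each line short even when the reply titles have been changed.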

