Duplicate of:nnnnnn field in the Consideration Nodelet?

by BrowserUk (Pope)
on Feb 17, 2003 at 22:09 UTC ( #236130=monkdiscuss )

There was a recent node from dws regarding accidental duplicates, and the most frequent reason for consideration is Duplicate. Unfortunately, the usual result is that both copies of the duplicate end up with replies, and both copies get considered for deletion. And tonight we had the situation of one node that had been considered, approved, and front-paged all simultaneously.

Isn't it possible to add a few interlocks somewhere?

First, a node would not reach approved status until it had received some number of approval clicks greater than one. Once a node has received a certain number of approvals, it would become immune to consideration except by an editor/pmdev/god.

Then, as duplicates are so common, add a dup_id field to the consideration nodelet, so that the 'other' node_id would have to be supplied when considering a node for reasons of duplication. Once one of the pair had been considered as a duplicate, the other would be locked out from being considered on the same basis. The check would be done upon submit rather than when the nodelet is built: submitting a node for reason of duplication when the corresponding other node had already been so considered would reject the attempt. It should not be hard to at least verify that the 'other' node_id supplied belonged to the same author, which would prevent the situation I have seen a couple of times where nodes with similar titles but different authors were considered as duplicates.
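A minimal sketch of those submit-time checks, assuming a simple hashref node store. The store layout and every name here (validate_dup_consideration, considered_dup) are invented for illustration; nothing like this exists in the actual PerlMonks code.

```perl
use strict;
use warnings;

# Sketch of the proposed submit-time checks.  $db is a stand-in for
# the node database: { node_id => { author => ..., considered_dup => ... } }.
sub validate_dup_consideration {
    my ($db, $node_id, $other_id) = @_;

    # The 'other' node must actually exist.
    return (0, "no such node $other_id")
        unless exists $db->{$other_id};

    # Both halves must share an author, catching the
    # similar-title-but-different-author mistake.
    return (0, "authors differ")
        unless $db->{$node_id}{author} eq $db->{$other_id}{author};

    # Lock out the second half of the pair once one is considered.
    return (0, "pair already under consideration")
        if $db->{$other_id}{considered_dup};

    # Record the lock and accept the consideration.
    $db->{$node_id}{considered_dup} = 1;
    return (1, "ok");
}
```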

It might also make sense to withhold a Frontpage request from being actioned while a node is under consideration, or while it is mentioned as the 'other' node in a duplicate consideration.


Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Duplicate of:nnnnnn field in the Consideration Nodelet?
by VSarkiss (Monsignor) on Feb 17, 2003 at 23:33 UTC

    All of your suggestions sound quite reasonable, but they also demand CPU cycles and human patience ("Where's my node? Why isn't it showing up?"). Many of the algorithm changes you mention have been discussed in the past, and the current scheme appears to lie somewhere between "most desirable" and "least objectionable".

    Frankly, I don't think the dups are that big of a deal. If one gets by the approval/moderation process, we have the janitors as a second line of defense. I can't remember it being necessary very often, but the gods can intervene too.

    ...

    Hmmm. I've noticed that every few weeks or so, the requests in Perl Monks Discussion swing from "make it faster" to "make it more functional", then back again.

    Remember when you had to go to your PHB and tell him that the new feature won't fit in the existing application, that it takes more than just coding to make it work, that you only had so many resources to dedicate to the project? Well, it's sort of like that. Adding more features to prevent dupes sounds great, but it needs somebody to write it, someone else to test and apply it, somebody to write a node saying it's changed, and someone else to hang out in the CB saying "yes, we know" ;-), all on a live system. Even if we doubled all the monk salaries, I don't think they could go any faster.

    Update
    That wisecrack about "monk salaries" is a joke, OK?

      Okay, no biggie; it was just a thought. It just seems that, given the volume of talented people around the Monastery, it might be possible to enlist a little help with these things.

      Most of the bottleneck would seem to be caused by the lack of a development server where patches and updates could be safely tried out. That could be addressed, but I'll keep any further thoughts I have to myself if such thoughts are unwelcome.


        I didn't mean to sound like I was getting on your case in particular. Comments and discussion are always welcome in the monastery: they are a big part of what drives changes -- hopefully for the better -- around here. It's just that sometimes people have unrealistic expectations of what can be done by a group of volunteers (albeit talented and dedicated), and how long it takes.

Re: Duplicate of:nnnnnn field in the Consideration Nodelet?
by extremely (Priest) on Feb 17, 2003 at 23:35 UTC
    Actually, what is needed is "merge comments to..." that can pick up and move the replies.

    Or maybe we should implement something like /. (Slashdot) has, where you can't start multiple nodes within so many minutes of posting. (For Anonymous Monk, the IP might need to be added to the key, to avoid blocking several people posting near each other.) Maybe even a short notice: "Are you posting twice? Check Newest Nodes. If you are Anonymous, reply to your own post with updates; if logged in, click the post to edit it. Wait 2 minutes if you really want to start a second discussion."
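    A minimal sketch of such a throttle, assuming an in-memory timestamp store keyed on the poster's name (or name plus IP for Anonymous Monk); the two-minute window is the one suggested above, and all names here are invented.

```perl
use strict;
use warnings;

# Per-poster throttle sketch: refuse a second root post inside the
# window.  %last_post stands in for whatever persistent store the
# site would really use.
my %last_post;          # poster key => epoch seconds of last root post
my $WINDOW = 120;       # two minutes

sub may_post {
    my ($key, $now) = @_;
    $now = time unless defined $now;
    if (exists $last_post{$key} && $now - $last_post{$key} < $WINDOW) {
        return 0;       # too soon: show the "are you posting twice?" notice
    }
    $last_post{$key} = $now;
    return 1;
}
```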

    --
    $you = new YOU;
    honk() if $you->love(perl)

Re: Duplicate of:nnnnnn field in the Consideration Nodelet?
by BigLug (Chaplain) on Feb 18, 2003 at 06:16 UTC
    I like the suggestions above, but offer (in the spirit of TIMTOWTDI) two alternative solutions:

    When someone posts, an MD5 checksum of the post is created. We then compare MD5 sums across the database; if a match comes up, we ask "is this a duplicate?".
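    The exact-duplicate half of that is cheap, since Digest::MD5 ships with Perl. A sketch, with a plain hash standing in for the database table of stored digests:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);   # core module

# Exact-duplicate check: hash the post body and look the digest up
# against previously seen ones.  %seen stands in for a DB table.
my %seen;

sub is_exact_dup {
    my ($body) = @_;
    my $digest = md5_hex($body);
    return $seen{$digest}++ ? 1 : 0;   # true if this digest was seen before
}
```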

    Dipping further into the bucket of infinite cycles, how about using String::Approx to check for similarities? This could be compared just against your own posts. On the other hand, maybe the allowable fuzziness should be tighter for one's own posts.
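    String::Approx (a CPAN module, not core) would do this with amatch() or adist(); to keep the sketch self-contained, here is a toy equivalent using a plain Levenshtein edit distance. The 10% threshold is an arbitrary illustration, and the O(n*m) cost is exactly the cycle-chewing being discussed.

```perl
use strict;
use warnings;
use List::Util qw(min);   # core module

# Toy fuzzy-duplicate check: classic dynamic-programming edit
# distance, one row at a time.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# "Near duplicate" here means the edit distance is within 10% of the
# longer post's length; the threshold is illustrative, not tuned.
sub looks_like_dup {
    my ($x, $y) = @_;
    my $longer = length($x) > length($y) ? length($x) : length($y);
    return edit_distance($x, $y) <= 0.10 * $longer ? 1 : 0;
}
```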

    Gee, I love having infinite cycles. <grin>

      I was more concerned with finding a cheap mechanism for dealing with dups when they occur as I think any mechanism for preventing them is doomed to fail and would be prohibitively expensive.

      I think the main reason unintentional dups occur (from my own personal experience) is that I click submit, and then either a) my connection drops, and when it is restored I am not sure whether the submit was transmitted before the drop; or b) I click, immediately notice a typo, hit stop, edit, then submit again. From my perspective, as I haven't seen the screen refresh, it looks as if I was in time, but there is a window in which this can cause a dup. In this case, the MD5 wouldn't work, as the text would have changed.

      As for String::Approx: have you timed it? Quick it ain't. Even if you only used it to compare against the previous post by the submitting author, it is going to chew lots of cycles (as you indicated :). If you were trying to catch Anonymonk posts, the most frequent offenders I think, then you would have to cross-check against all Anonymonk posts for, say, the last 10 or 15 minutes, which at busy times would bring the server to its knees.


Node Type: monkdiscuss [id://236130]
Approved by Mr. Muskrat
Front-paged by Acolyte