PerlMonks  

Re: Duplicate of:nnnnnn field in the Consideration Nodelet?

by BigLug (Chaplain)
on Feb 18, 2003 at 06:16 UTC ( [id://236195]=note )


in reply to Duplicate of:nnnnnn field in the Consideration Nodelet?

I like the suggestions above, but offer (in the spirit of TIMTOWTDI) two alternative solutions:

When someone posts, an MD5 checksum is created. We then compare MD5 sums across the database. If something comes up, we ask "is this a duplicate?".
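
For what it's worth, the checksum idea might look roughly like the sketch below. It uses Digest::MD5 (a core module); the `%seen` hash only stands in for whatever database lookup the site would really use, and `normalize`, `is_duplicate`, and the node ids are made up for illustration. Collapsing whitespace before hashing is an extra assumption, not part of the suggestion itself.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical in-memory stand-in for a checksum column in the node table.
my %seen;    # md5 hex digest => id of the first node with that text

# Collapse runs of whitespace and trim the ends so trivial reformatting
# does not change the checksum (an assumption, not part of the original idea).
sub normalize {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Return the id of an earlier node with identical text, or undef if none.
# The new node is recorded when no match is found.
sub is_duplicate {
    my ($node_id, $text) = @_;
    my $sum = md5_hex(normalize($text));
    return $seen{$sum} if exists $seen{$sum};
    $seen{$sum} = $node_id;
    return;
}

# Two submissions that differ only in whitespace.
for my $post ([101, "Hello  world\n"], [102, " Hello world"]) {
    my ($id, $text) = @$post;
    my $dup = is_duplicate($id, $text);
    print defined $dup
        ? "$id looks like a duplicate of $dup\n"
        : "$id is new\n";
}
```

A hit would only trigger the "is this a duplicate?" question; the checksum matches exact (normalized) text, so any real edit between two submissions produces a different sum.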

Dipping further into the bucket of infinite cycles, how about using String::Approx to check for similarities? This could be limited to comparing against your own posts. On the other hand, maybe the allowable amount of approximation should be tighter for one's own posts.
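
A rough sketch of the fuzzy comparison follows. String::Approx is a CPAN module, and rather than misquote its interface this uses a plain-Perl Levenshtein edit distance as a stand-in (it is not what String::Approx does internally). The names `edit_distance` and `looks_similar` and the 10% threshold are hypothetical choices for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance, keeping one row at a time.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Treat two posts as probable duplicates when the edit distance is at most
# $max_ratio of the longer text's length (default 10%, an arbitrary choice).
sub looks_similar {
    my ($old, $new, $max_ratio) = @_;
    $max_ratio = 0.10 unless defined $max_ratio;
    my $longest = length($old) > length($new) ? length($old) : length($new);
    return 1 if $longest == 0;
    return edit_distance($old, $new) <= $max_ratio * $longest ? 1 : 0;
}
```

A tighter threshold for one's own posts would just mean passing a smaller `$max_ratio`. The work grows with the product of the two text lengths, so this is far from free on long posts.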

Gee I love having infinite cycles. <grin>


Re: Re: Duplicate of:nnnnnn field in the Consideration Nodelet?
by BrowserUk (Patriarch) on Feb 18, 2003 at 07:09 UTC

    I was more concerned with finding a cheap mechanism for dealing with dups when they occur as I think any mechanism for preventing them is doomed to fail and would be prohibitively expensive.

    I think the main reason unintentional dups occur (from my own personal experience) is that I click submit, and then either a) my connection drops, and when it is restored I am not sure whether the submit was transmitted before the drop, or b) I click, immediately notice a typo, hit stop, edit, then submit again. From my perspective, as I haven't seen the screen refresh, it looks as if I was in time, but there is a window in which this can cause a dup. In this case, the MD5 wouldn't work, as the text would have changed.

    As for String::Approx: have you timed it? Quick it ain't. Even if you only used it to compare against the previous post by the submitting author, it is going to chew lots of cycles (as you indicated :). If you were trying to catch Anonymonk posts, the most frequent offenders I think, then you would have to cross-check against all Anonymonk posts for, say, the last 10 or 15 minutes, which at busy times would bring the server to its knees.


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
