The stupid question is the question not asked.
Assumption: the data you are de-duping is downloaded fresh each day from TwitFace.
If you are at all concerned with speed, the idea of loading 180 million records into an on-disk DB just to de-dup them is ridiculous.
A 10-line Perl script de-duped a 200-million-line, 2.8 GB file of 12-digit numbers in a little over 2 1/2 minutes, using less than 30 MB of RAM.
That's processing the 12-digit numbers at well over 1 million per second: 200 million numbers in a little over 150 seconds works out to roughly 1.3 million per second.
You cannot even load the data into the DB at 1/100th of that rate, never mind get the de-duped data back out.
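The script itself isn't reproduced here. As a hypothetical illustration of how a streaming de-dup can run in a small, fixed amount of memory, here is a Python sketch that keeps one bit per possible key. It is not the original Perl. It assumes every key is an integer in a known range [lo, hi), and `dedup_bitmap`, `lo` and `hi` are illustrative names. A bitmap covering every possible 12-digit value would need 10^12 bits (about 125 GB). This approach therefore only fits in ~30 MB when the keys fall within a much narrower range, roughly 240 million possible values.

```python
from typing import Iterable, Iterator


def dedup_bitmap(lines: Iterable[str], lo: int, hi: int) -> Iterator[str]:
    """Yield each distinct integer key once, in first-seen order.

    ASSUMPTION (hypothetical): every key is an integer in [lo, hi).
    Memory is one bit per possible key, (hi - lo) / 8 bytes, no matter
    how many lines are read.
    """
    span = hi - lo
    seen = bytearray((span + 7) // 8)  # zero-filled bit vector, one bit per key
    for line in lines:
        s = line.strip()
        if not s:
            continue  # ignore blank lines
        offset = int(s) - lo
        if not 0 <= offset < span:
            raise ValueError(f"key {s!r} outside assumed range [{lo}, {hi})")
        byte, bit = divmod(offset, 8)  # locate this key's bit
        mask = 1 << bit
        if not seen[byte] & mask:  # first time we have seen this key
            seen[byte] |= mask
            yield s
```

Used as a filter, e.g. `sys.stdout.writelines(k + "\n" for k in dedup_bitmap(sys.stdin, LO, HI))`, it handles one line at a time. Memory therefore stays at the size of the bitmap however large the input file is, and there is no load step at all.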