PerlMonks  

Re^4: numeric representation of string

by mhearse (Hermit)
on Aug 16, 2013 at 17:47 UTC (#1049763)


in reply to Re^3: numeric representation of string
in thread numeric representation of string

I agree. My current code inserts email bodies into a compressed table, and that's it. Another simple idea I had was to split the body on word boundaries, store the words in an array, and then do a bulk INSERT IGNORE into a table with a unique column. It might look something like this, although this is probably a pipe dream. But it seems logical, at least based on my stunted, repetitive vocabulary. It would have the benefit of being fast due to the lack of compression.

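-- MySQL requires an AUTO_INCREMENT column to be indexed, so rowid is the primary key.
CREATE TABLE words (
    rowid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    word VARCHAR(255) NOT NULL UNIQUE
) ENGINE=InnoDB CHARACTER SET=utf8;

-- The foreign key column needs its own type; the constraint is declared separately.
CREATE TABLE body (
    rowid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    word_order_num INT UNSIGNED NOT NULL,
    word_rowid INT UNSIGNED NOT NULL,
    FOREIGN KEY (word_rowid) REFERENCES words(rowid)
) ENGINE=InnoDB CHARACTER SET=utf8;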
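
To make the idea concrete, here is a rough sketch of the split, dedupe, and reassemble flow against the two tables above. It is only an illustration under several assumptions: it uses Python's built-in sqlite3 with an in-memory database so that it runs without a server, SQLite's `INSERT OR IGNORE` stands in for MySQL's `INSERT IGNORE`, and the function names (`make_db`, `store_body`, `load_body`) are made up for this example. As in the schema above, there is no message id column, so the sketch stores a single body.

```python
import re
import sqlite3


def make_db():
    """Create an in-memory SQLite stand-in for the two MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words ("
        " rowid INTEGER PRIMARY KEY,"
        " word TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE body ("
        " rowid INTEGER PRIMARY KEY,"
        " word_order_num INTEGER NOT NULL,"
        " word_rowid INTEGER NOT NULL REFERENCES words(rowid))"
    )
    return conn


def store_body(conn, text):
    """Split text into words, dedupe them into words, record order in body."""
    tokens = re.findall(r"\w+", text)  # assumption: a "word" is a run of \w chars
    unique = sorted(set(tokens))
    with conn:  # one transaction for the whole body
        # Bulk insert; words already present are skipped by the UNIQUE constraint.
        conn.executemany(
            "INSERT OR IGNORE INTO words (word) VALUES (?)",
            [(w,) for w in unique],
        )
        # Look up the id of every word used in this body.
        ids = {}
        for w in unique:
            row = conn.execute(
                "SELECT rowid FROM words WHERE word = ?", (w,)
            ).fetchone()
            ids[w] = row[0]
        # One row per word occurrence, keeping the original position.
        conn.executemany(
            "INSERT INTO body (word_order_num, word_rowid) VALUES (?, ?)",
            [(pos, ids[w]) for pos, w in enumerate(tokens)],
        )


def load_body(conn):
    """Rebuild the stored body as space-separated words, in original order."""
    rows = conn.execute(
        "SELECT w.word FROM body b"
        " JOIN words w ON w.rowid = b.word_rowid"
        " ORDER BY b.word_order_num"
    )
    return " ".join(r[0] for r in rows)
```

One thing the sketch makes obvious: splitting on word boundaries throws away punctuation and whitespace, so `load_body` only gives back the words joined by single spaces, not the original text. Whether that matters depends on what the stored bodies are used for.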

