Re: Alternatives to DB for comparable lists

by Perlbotics (Chancellor)
on May 16, 2018 at 18:32 UTC (#1214680)

in reply to Alternatives to DB for comparable lists

One approach might be:

  • set up a DB server on your collection host
  • run your MD5 tool on each host and, depending on network availability:
    • with networking: connect to the DB and INSERT the new data on the fly (via the internal network or an SSH/VPN tunnel)
    • without networking: write the data line by line in a format that your DB supports for batch loading (store it in a file for offline transport)
  • run your tasks on the DB

Perhaps the easiest approach is to send the batch lines to STDOUT, so that the tool can even be invoked via an ssh command issued from the collection host. That also removes the need for DB drivers on the hosts being scanned.

Use a header/trailer or a checksum to verify the completeness and integrity of each transmitted chunk of lines, and perhaps add some useful metadata (creation time, IP, etc.).
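To make the header/trailer idea a bit more concrete, here is a minimal sketch. The framing format (#BEGIN/#END lines, tab-separated "md5, path" records) and the names frame_chunk and verify_chunk are my own assumptions for illustration, not an established format. The trailer carries both a line count and an MD5 over the data lines, so a truncated or altered chunk is rejected on the collection host.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical framing format (an assumption for illustration):
#   #BEGIN host=<name> created=<epoch>
#   <md5>\t<path>                      one data line per scanned file
#   #END lines=<count> md5=<checksum over all data lines>

# Build one framed chunk from a list of [md5, path] records.
sub frame_chunk {
    my ($host, $created, @records) = @_;
    my @lines = map { "$_->[0]\t$_->[1]" } @records;
    my $sum   = md5_hex(join("\n", @lines));
    return join("\n",
        "#BEGIN host=$host created=$created",
        @lines,
        sprintf("#END lines=%d md5=%s", scalar(@lines), $sum),
        '');
}

# Return a hashref with the metadata and data lines if the chunk is
# complete and unmodified; return nothing (undef in scalar context) otherwise.
sub verify_chunk {
    my ($text) = @_;
    my @all = split /\n/, $text;
    return unless @all >= 2;
    my $header  = shift @all;
    my $trailer = pop @all;
    return unless $header =~ /^#BEGIN host=(\S+) created=(\d+)$/;
    my ($host, $created) = ($1, $2);
    return unless $trailer =~ /^#END lines=(\d+) md5=([0-9a-f]{32})$/;
    my ($count, $sum) = ($1, $2);
    return unless $count == @all;                       # completeness
    return unless $sum eq md5_hex(join("\n", @all));    # integrity
    return { host => $host, created => $created, lines => \@all };
}

# Example values only; on a scanned host the chunk would go to STDOUT.
my $chunk = frame_chunk('host-a', 1526495520,
    [ md5_hex('hosts v1'),  '/etc/hosts'  ],
    [ md5_hex('passwd v1'), '/etc/passwd' ],
);
print $chunk;
print verify_chunk($chunk) ? "chunk ok\n" : "chunk damaged\n";
```

Only core modules are used, which fits the goal of keeping the scanned hosts free of DB drivers. A real format would also have to deal with tabs or newlines in file names.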


Oh, you asked for DB alternatives... Rough estimate: 750k entries with a mean entry size of ca. 500 bytes give a total of approx. 375 MB. My experiment with Storable produced a 415 MB file. Reading/writing took ca. 2.0/3.5 s on a moderate PC (3 GHz, SSD).

Merging all data into a native Perl data structure and using Storable for persistence looks feasible. PRO: fast analytics; CON: none of the conveniences that come with a DB.
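A rough sketch of the Storable route follows. The layout $db{host}{path} = md5, the helper differing_paths and the file name are assumptions for illustration, not the code from my experiment. I use nstore (network byte order) here so the file stays readable on a different architecture; plain store would work as well on a single machine.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp  qw(tempdir);
use Storable    qw(nstore retrieve);

# Assumed layout: $db{$host}{$path} = $md5 (tiny example data).
my %db = (
    'host-a' => {
        '/etc/hosts'  => md5_hex('hosts v1'),
        '/etc/passwd' => md5_hex('passwd v1'),
        '/only/on/a'  => md5_hex('local file'),
    },
    'host-b' => {
        '/etc/hosts'  => md5_hex('hosts v1'),
        '/etc/passwd' => md5_hex('passwd v2'),
    },
);

# Paths present on both hosts whose checksums differ, sorted by path.
sub differing_paths {
    my ($db, $host_a, $host_b) = @_;
    my ($left, $right) = @{$db}{ $host_a, $host_b };
    my @diff = grep {
        exists $right->{$_} && $left->{$_} ne $right->{$_}
    } sort keys %$left;
    return @diff;
}

# Persist the merged structure and load it again for analysis.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/md5lists.sto";
nstore(\%db, $file);
my $loaded = retrieve($file);

print "$_\n" for differing_paths($loaded, 'host-a', 'host-b');
```

Once the structure is loaded, every comparison is an in-memory hash lookup, which is where the speed comes from. The price is that the whole file has to be read and written in one piece.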
