Re^5: how to merge many files of sorted hashes?

by GrandFather (Sage)
on Feb 03, 2012 at 00:26 UTC


in reply to Re^4: how to merge many files of sorted hashes?
in thread how to merge many files of sorted hashes?

We are about 5% closer to understanding the bigger picture, so at this point I'll give up trying to figure out how best to help you and simply toss a little database code in your direction instead:

#!/usr/bin/env perl
use strict;
use warnings;

use DBI;

# Create (or open) an SQLite database file in the current directory.
my $dbh = DBI->connect('dbi:SQLite:dbname=delme.sqlite', '');

$dbh->do(
    'CREATE TABLE Bins (Xk INTEGER, Yk INTEGER, Zk INTEGER, Data TEXT)');

# Bin each point on its integer coordinates and store the full record.
my $sql = 'INSERT INTO Bins (Xk, Yk, Zk, Data) VALUES (?, ?, ?, ?)';
my $sth = $dbh->prepare($sql);

while (defined(my $data = <DATA>)) {
    my ($xKey, $yKey, $zKey) = split ' ', $data;
    chomp $data;
    $sth->execute((map {int} $xKey, $yKey, $zKey), $data);
}

# Let the database do the sorting.
$sql = 'SELECT * FROM Bins ORDER BY Xk, Yk, Zk';
$sth = $dbh->prepare($sql);
$sth->execute();

while (my $row = $sth->fetchrow_hashref()) {
    print "$row->{Xk}, $row->{Yk}, $row->{Zk} => $row->{Data}\n";
}

__DATA__
4.941 32.586 -1.772 -44.368_23.583_-218.345_0.983_-0.012_0.005_-0.382_0.041_0.205
15.354 22.823 10.556 -56.368_2.583_-28.745_0.883_-0.012_0.005_-0.382_0.041_0.205
-0.495 12.345 98.234 -0.382_0.041_0.205_-28.745_0.883_-0.012_0.005_-0.382_0.041

Prints:

0, 12, 98 => -0.495 12.345 98.234 -0.382_0.041_0.205_-28.745_0.883_-0.012_0.005_-0.382_0.041
4, 32, -1 => 4.941 32.586 -1.772 -44.368_23.583_-218.345_0.983_-0.012_0.005_-0.382_0.041_0.205
15, 22, 10 => 15.354 22.823 10.556 -56.368_2.583_-28.745_0.883_-0.012_0.005_-0.382_0.041_0.205
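
If the point of the exercise is to pull back everything that landed in a particular bin rather than to dump the whole table, the lookup is a plain WHERE query. A minimal sketch, reusing the $dbh handle and the Bins table created by the script above (the bin values are just the first sample row):

# Fetch every record that fell in bin (4, 32, -1).
my $lookup = $dbh->prepare(
    'SELECT Data FROM Bins WHERE Xk = ? AND Yk = ? AND Zk = ?');
$lookup->execute(4, 32, -1);
while (my ($data) = $lookup->fetchrow_array()) {
    print "$data\n";
}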
True laziness is hard work


Re^6: how to merge many files of sorted hashes?
by andromedia33 (Novice) on Feb 03, 2012 at 16:05 UTC
    Thank you very much for your help, GrandFather. I apologize for missing the point of your question. Indeed, building a database seems a plausible thing to do given the large quantity of data I have.
    I have about 10,000 such input files, each consisting of a point cloud. I am constructing a hash table for each input file, so in the end I have about 10,000 hashes. (Not all the hash tables are huge: most files only have about 20 points, as opposed to the 100 points that cause the problem I mentioned here.)
    Eventually I will compare these hashes pairwise and look for common keys between each pair. That information will be used to compute a distance/dissimilarity measure between the two point clouds the hashes were built from (roughly as sketched below). In the very end I hope to cluster the 10,000 point clouds.
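
    A minimal sketch of that pairwise step, assuming each hash table is an in-memory Perl hash keyed on the binned-coordinate string. The Jaccard-style measure, the dissimilarity() name, and the sample keys are illustrative placeholders, not from the thread; substitute whatever metric you actually intend to use:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Jaccard-style dissimilarity: 1 minus the fraction of bin keys the
    # two point clouds share (0 = identical key sets, 1 = disjoint).
    sub dissimilarity {
        my ($ha, $hb) = @_;
        my $common = grep {exists $hb->{$_}} keys %$ha;
        my $union  = keys(%$ha) + keys(%$hb) - $common;
        return $union ? 1 - $common / $union : 0;
    }

    my %cloudA = map {$_ => 1} ('4 32 -1', '15 22 10');
    my %cloudB = map {$_ => 1} ('4 32 -1', '0 12 98');
    printf "%.3f\n", dissimilarity(\%cloudA, \%cloudB);    # prints 0.667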
