Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine

Re^5: how to merge many files of sorted hashes?

by GrandFather (Sage)
on Feb 03, 2012 at 00:26 UTC ( #951563=note: print w/replies, xml ) Need Help??

in reply to Re^4: how to merge many files of sorted hashes?
in thread how to merge many files of sorted hashes?

We are about 5% closer to understanding the bigger picture so at this point I'll give up trying to figure out how best to help you and simply toss a little database code in your direction instead:

#!/usr/bin/env perl use strict; use warnings; use DBI; my $dbh = DBI->connect('dbi:SQLite:dbname=delme.sqlite', ''); $dbh->do('CREATE TABLE Bins (Xk INTEGER, Yk INTEGER, Zk INTEGER, Data +TEXT)'); my $sql = 'INSERT INTO Bins (Xk, Yk, Zk, Data) VALUES (?, ?, ?, ?)'; my $sth = $dbh->prepare ($sql); while (defined (my $data = <DATA>)) { my ($xKey, $yKey, $zKey) = split ' ', $data; chomp $data; $sth->execute((map {int} $xKey, $yKey, $zKey), $data); } $sql = 'SELECT * FROM Bins ORDER BY Xk, Yk, Zk'; $sth = $dbh->prepare($sql); $sth->execute(); while (my $row = $sth->fetchrow_hashref()) { print "$row->{Xk}, $row->{Yk}, $row->{Zk} => $row->{Data}\n"; } __DATA__ 4.941 32.586 -1.772 -44.368_23.583_-218.345_0.983_-0.012_0.005_-0.382_ +0.041_0.205 15.354 22.823 10.556 -56.368_2.583_-28.745_0.883_-0.012_0.005_-0.382_0 +.041_0.205 -0.495 12.345 98.234 -0.382_0.041_0.205_-28.745_0.883_-0.012_0.005_-0. +382_0.041


0, 12, 98 => -0.495 12.345 98.234 -0.382_0.041_0.205_-28.745_0.883_-0. +012_0.005_-0.382_0.041 4, 32, -1 => 4.941 32.586 -1.772 -44.368_23.583_-218.345_0.983_-0.012_ +0.005_-0.382_0.041_0.205 15, 22, 10 => 15.354 22.823 10.556 -56.368_2.583_-28.745_0.883_-0.012_ +0.005_-0.382_0.041_0.205
True laziness is hard work

Replies are listed 'Best First'.
Re^6: how to merge many files of sorted hashes?
by andromedia33 (Novice) on Feb 03, 2012 at 16:05 UTC
    Thank you very much for your help, GrandFather. I apologize for missing the point of your question. indeed building a database seems a plausible thing to do given the large quantity of data i have.
    i have about 10,000 such input files, each consisting of a point cloud. i am constructing a hash table for each input file, so in the end i have about 10,000 hashes. (not all hash tables are huge, as most files only have about 20 points as opposed to the 100 points that cause the problem i mentioned here)
    eventually what i'd like to do with these hashes is that i will do pairwise comparison and look for common keys between each pair. that information will be used to compute a distance/dissimilarity measure between the pair of point clouds from which the pair of hash tables being compared come from. in the very end i hope to perform clustering on the 10,000 sets of point clouds.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://951563]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (6)
As of 2018-06-20 12:24 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (116 votes). Check out past polls.