go ahead... be a heretic PerlMonks

Re^4: how to merge many files of sorted hashes?

by andromedia33 (Novice) on Feb 02, 2012 at 22:54 UTC (node #951549)

Sorry I didn't make this clearer. My actual hash looks like this:
```
3_2_-1 => -44.368_23.583_-218.345_0.983_-0.012_0.005_-0.382_0.041_0.205_-0.538_-0.876_0.100      -56.368_2.583_-28.745_0.883_-0.012_0.005_-0.382_0.041_0.205_-0.538_-0.876_0.100 ...
```
Each element in the array of values holds the 12 entries of a 3x4 matrix, and each key can point to a few hundred such 'matrices'. This is how I construct a key-value pair: given a 3x4 transformation matrix (translation + rotation), I apply it to the coordinates of each of the 100 points and obtain the new coordinates of that point, say (3.01, 1.98, -0.87), which are discretized to (3, 2, -1). Then (3, 2, -1) is used as a key that points to that transformation matrix.
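The discretization step described above can be sketched like this. It is only an illustration: `bin_key` and its `$grid` (bin width) argument are hypothetical names, and rounding to the nearest bin with `POSIX::floor` is an assumption, chosen because it reproduces the (3.01, 1.98, -0.87) to (3, 2, -1) example. The simplified script below uses `int($_/$grid)` instead, which truncates toward zero and would map -0.87 to 0 rather than -1.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: divide each coordinate by $grid (the bin width),
# round to the nearest integer, and join the bin indices with "_" as a key.
sub bin_key {
    my ($grid, @xyz) = @_;
    return join "_", map { floor($_ / $grid + 0.5) } @xyz;
}

print bin_key(1, 3.01, 1.98, -0.87), "\n";    # prints 3_2_-1
```

Rounding (rather than truncating) keeps every bin the same width; with `int`, the bin around 0 spans two bin widths, from -$grid to +$grid.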
Here's an example script with simplified calculations:
```perl
#!/usr/bin/env perl
use strict;
use warnings;

my ($input, $output) = @ARGV;
open(my $in_fh, '<', $input) or die "cannot open file $input: $!\n";

my %total_hash_keys = ();       # every key seen, across all blocks
my %tri_hash        = ();       # key => list of values for the current block
my $grid            = 2;        # bin width
my $step            = 0;        # number of point triples processed
my $block_size      = 10000;    # triples per temporary file
my $block_no        = 0;        # number of temporary files written

# read the points: one per line, tab-separated x, y, z
my @points;
while (my $line = <$in_fh>) {
    chomp $line;
    push @points, [ split /\t/, $line ];
}
close($in_fh);

# write the current block to the next temporary file, then free its memory
sub flush_block {
    $block_no++;
    my $tmp_hash_file = "tmp_hash" . $block_no;
    open(my $out_fh, '>', $tmp_hash_file)
        or die "cannot write to file $tmp_hash_file: $!\n";
    foreach my $key (keys %tri_hash) {
        print $out_fh "$key\t@{$tri_hash{$key}}\n";
    }
    close($out_fh);
    %tri_hash = ();
}

# construct hash table: one transformation per triple of points
for (my $i = 0; $i < @points; $i++) {
    for (my $j = $i + 1; $j < @points; $j++) {
        for (my $k = $j + 1; $k < @points; $k++) {
            $step++;
            my @pt1 = @{ $points[$i] }[0 .. 2];
            my @pt2 = @{ $points[$j] }[0 .. 2];
            my @pt3 = @{ $points[$k] }[0 .. 2];

            # simplified calculation for the value of the hash
            my @matrix = (@pt1, @pt2, @pt3);
            my $value  = join "_", map { sprintf "%.3f", $_ } @matrix;

            for (my $res = 0; $res < @points; $res++) {
                # transform coordinates, and bin the new coordinates as a
                # generated key (int() truncates toward zero)
                my @old_xyz = @{ $points[$res] }[0 .. 2];
                my @new_xyz = map { int($_ / $grid) } transform(@old_xyz, @matrix);
                my $key     = join "_", @new_xyz;
                $total_hash_keys{$key} = 0;
                push @{ $tri_hash{$key} }, $value;
            }
            flush_block() if $step % $block_size == 0;    # write to disk file
        }    # for k
    }    # for j
}    # for i
flush_block() if %tri_hash;    # write the last, partial block

# merge: for every key, collect its values from each temporary file
open(my $out_fh, '>', $output) or die "cannot write to file $output: $!\n";
foreach my $my_key (keys %total_hash_keys) {
    my @values;
    for my $n (1 .. $block_no) {
        my $hash_file = "tmp_hash" . $n;
        open(my $tmp_fh, '<', $hash_file) or die "cannot open file $hash_file: $!\n";
        while (my $line = <$tmp_fh>) {
            chomp $line;
            my ($file_key, $file_values) = split /\t/, $line, 2;
            if ($file_key eq $my_key) {
                push @values, $file_values;
                last;
            }
        }
        close($tmp_fh);
    }
    print $out_fh "$my_key => @values\n";
}
close($out_fh);

# multiply the point (a row vector) by the 3x3 matrix stored row by row in @t
sub transform {
    my ($x, $y, $z, @t) = @_;
    my $new_x = $x * $t[0] + $y * $t[3] + $z * $t[6];
    my $new_y = $x * $t[1] + $y * $t[4] + $z * $t[7];
    my $new_z = $x * $t[2] + $y * $t[5] + $z * $t[8];
    return ($new_x, $new_y, $new_z);
}
```
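Since the thread title asks about merging sorted files: if each temporary file were written with its keys in sorted order (`sort keys %tri_hash` when flushing a block), all files could be merged in a single pass instead of rescanning every file for every key. The sketch below is a plain k-way merge under that assumption. `merge_sorted_files` and `read_record` are hypothetical names, and the tab-separated `key`, `values` line format follows the script's temporary files. It finds the smallest pending key by a linear scan, which is fine for a few dozen files; with many more files a heap would be the usual choice.

```perl
use strict;
use warnings;

# Hypothetical helper: read the next "key<TAB>values" line from $fh.
# Returns [key, values], or undef at end of file.
sub read_record {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return [ split /\t/, $line, 2 ];
}

# Hypothetical k-way merge: every input file holds "key<TAB>values" lines
# sorted by key in string order (as produced by `sort keys %hash`).
# Writes one "key => values..." line per distinct key to $out_fh.
sub merge_sorted_files {
    my ($out_fh, @files) = @_;
    my (@fhs, @heads);    # per file: open handle and next unread record
    for my $file (@files) {
        open(my $fh, '<', $file) or die "cannot open file $file: $!\n";
        push @fhs,   $fh;
        push @heads, scalar read_record($fh);
    }
    while (1) {
        # smallest key among the records still pending
        my $min;
        for my $head (@heads) {
            next unless defined $head;
            $min = $head->[0] if !defined $min || $head->[0] lt $min;
        }
        last unless defined $min;    # every file is exhausted

        # collect that key's values from every file that has it
        my @values;
        for my $n (0 .. $#heads) {
            while (defined $heads[$n] && $heads[$n][0] eq $min) {
                push @values, $heads[$n][1];
                $heads[$n] = read_record($fhs[$n]);
            }
        }
        print $out_fh "$min => @values\n";
    }
    close($_) for @fhs;
}
```

Each temporary file is read exactly once, and at any moment only one pending line per file is held in memory, plus the values of the key currently being written. The `lt`/`eq` comparisons must use the same (string) order as the `sort` that wrote the files.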

Re^5: how to merge many files of sorted hashes?
by GrandFather (Sage) on Feb 03, 2012 at 00:26 UTC

We are about 5% closer to understanding the bigger picture, so at this point I'll give up trying to figure out how best to help you and simply toss a little database code in your direction instead:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use DBI;

# open (or create) an SQLite database file
my $dbh = DBI->connect('dbi:SQLite:dbname=delme.sqlite', '');

# one row per input line: integer bin indices plus the original line
$dbh->do('CREATE TABLE Bins (Xk INTEGER, Yk INTEGER, Zk INTEGER, Data TEXT)');

my $sql = 'INSERT INTO Bins (Xk, Yk, Zk, Data) VALUES (?, ?, ?, ?)';
my $sth = $dbh->prepare ($sql);

while (defined (my $data = <DATA>)) {
    # the first three fields are the x, y and z coordinates
    my ($xKey, $yKey, $zKey) = split ' ', $data;

    chomp $data;
    $sth->execute((map {int} $xKey, $yKey, $zKey), $data);
}

# read the rows back, sorted by bin
$sql = 'SELECT * FROM Bins ORDER BY Xk, Yk, Zk';
$sth = $dbh->prepare($sql);
$sth->execute();

while (my $row = $sth->fetchrow_hashref()) {
    print "$row->{Xk}, $row->{Yk}, $row->{Zk} => $row->{Data}\n";
}

__DATA__
4.941 32.586 -1.772 -44.368_23.583_-218.345_0.983_-0.012_0.005_-0.382_0.041_0.205
15.354 22.823 10.556 -56.368_2.583_-28.745_0.883_-0.012_0.005_-0.382_0.041_0.205
-0.495 12.345 98.234 -0.382_0.041_0.205_-28.745_0.883_-0.012_0.005_-0.382_0.041
```

Prints:

```
0, 12, 98 => -0.495 12.345 98.234 -0.382_0.041_0.205_-28.745_0.883_-0.012_0.005_-0.382_0.041
4, 32, -1 => 4.941 32.586 -1.772 -44.368_23.583_-218.345_0.983_-0.012_0.005_-0.382_0.041_0.205
15, 22, 10 => 15.354 22.823 10.556 -56.368_2.583_-28.745_0.883_-0.012_0.005_-0.382_0.041_0.205
```
True laziness is hard work
Thank you very much for your help, GrandFather. I apologize for missing the point of your question. Indeed, building a database seems a plausible thing to do given the large quantity of data I have.
I have about 10,000 such input files, each containing a point cloud. I am constructing a hash table for each input file, so in the end I have about 10,000 hashes. (Not all hash tables are huge: most files have only about 20 points, as opposed to the 100 points that cause the problem I mentioned here.)
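To put numbers on the 20-versus-100-point difference: assuming the triple loop of the example script above (every unordered triple of points, and for each triple all n points transformed), the number of stored values grows roughly as n^4/6. A quick sketch; `triples` and `entries` are just illustrative helper names:

```perl
use strict;
use warnings;

# Number of point triples (i < j < k) visited by the nested loops for n points.
sub triples { my ($n) = @_; return $n * ($n - 1) * ($n - 2) / 6; }

# Each triple transforms all n points, adding one value per point.
sub entries { my ($n) = @_; return triples($n) * $n; }

for my $n (20, 100) {
    printf "%3d points: %6d triples, %8d values\n", $n, triples($n), entries($n);
}
```

For 20 points this gives 1,140 triples and 22,800 values; for 100 points, 161,700 triples and 16,170,000 values. With a `$block_size` of 10,000 triples, the 100-point case splits into 16 full blocks plus a partial block of 1,700 triples.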
Eventually I'd like to compare these hashes pairwise, looking for the keys each pair has in common. That information will be used to compute a distance/dissimilarity measure between the two point clouds the hash tables came from. In the very end I hope to perform clustering on the 10,000 point clouds.
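One simple way to turn shared keys into a dissimilarity is the Jaccard distance between the two key sets. This is a suggestion, not something from the thread: it ignores the values (the matrices) entirely, and `jaccard_distance` is a hypothetical helper. Note also that 10,000 clouds mean 10,000 × 9,999 / 2 = 49,995,000 pairs to compare.

```perl
use strict;
use warnings;

# Jaccard distance between the key sets of two hashes:
# 1 - |common keys| / |all distinct keys| (0 if both are empty).
sub jaccard_distance {
    my ($h1, $h2) = @_;
    my $common = grep { exists $h2->{$_} } keys %$h1;
    my $union  = keys(%$h1) + keys(%$h2) - $common;
    return $union ? 1 - $common / $union : 0;
}

my %cloud_a = ('3_2_-1' => [], '1_0_0' => [], '0_0_0' => []);
my %cloud_b = ('3_2_-1' => [], '5_5_5' => []);
printf "%.2f\n", jaccard_distance(\%cloud_a, \%cloud_b);    # prints 0.75
```

Because `exists` is a hash lookup, each comparison costs time proportional to the size of the first key set, so passing the smaller hash first is cheaper.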
