Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re-indexing a SQL resultset by id

by metaperl (Curate)
on Jul 08, 2011 at 17:48 UTC ( #913396=perlquestion: print w/ replies, xml ) Need Help??
metaperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a query which returns a set of rows. Each row has an id field:
$VAR1 = [ { 'amount' => '58.80', 'quantity' => '1', 'product_listid' => '800000FA-1450218051', 'id' => '3938716373870073', 'customer_listid' => '8000024C-1450196328', 'datetime' => '1309966331' }, { 'amount' => '1.34', 'quantity' => '1', 'product_listid' => '800000FC-1450218057', 'id' => '3938716373870073', 'customer_listid' => '8000024C-1450196328', 'datetime' => '1309966331' }, { 'amount' => '2.14', 'quantity' => '1', 'product_listid' => '8000010C-1450218105', 'id' => '2446438717782054', 'customer_listid' => '80000245-1450196318', 'datetime' => '1310146917' }, { 'amount' => '1.34', 'quantity' => '1', 'product_listid' => '800000FC-1450218057', 'id' => '2446438717782054', 'customer_listid' => '80000245-1450196318', 'datetime' => '1310146917' } ];
Now, I want to re-index this flat set of rows by that id field as follows:
[ $id1 => [ { $hashrefs_with_id1 } ... ] $id2 => [ { $hashrefs_with_id2 } ... ] ]
I tried the map_hashes method of my favorite DBI wrapper, DBIx::Simple, but it only returns one hashref for each mapping key:
$VAR1 = { '2446438717782054' => { 'amount' => '1.34', 'quantity' => '1', 'product_listid' => '800000FC-145021 +8057', 'customer_listid' => '80000245-14501 +96318', 'datetime' => '1310146917' }, '3938716373870073' => { 'amount' => '1.34', 'quantity' => '1', 'product_listid' => '800000FC-145021 +8057', 'customer_listid' => '8000024C-14501 +96328', 'datetime' => '1309966331' } };
and I want an arrayref of them all. It's easy enough for me to write something to do this:
my @row = $r->hashes; # produce flat format of arrays of hashrefs my %row; for my $row (@row) { push @{$row{$row->{id}}}, $row; } warn Dumper(\%row);
but someone has to have needed this and written it already... maybe there's a data structure utility that does this sort of reworking of an array? UPDATEDBIx::SQLCrosstab does exactly what I want, except that it aggregates the non-keyed columns instead of making an arrayref of hashrefs out of them. For now, I will just code it.



The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

-- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"

Comment on Re-indexing a SQL resultset by id
Select or Download Code
Re: Re-indexing a SQL resultset by id
by sundialsvc4 (Monsignor) on Jul 08, 2011 at 22:17 UTC

    If you could arrange things so that the query specifies ORDER BY id, then your logic here would become very simple.   Every occurrence of any given id value would become, “by definition, adjacent,” and your logic could simply remember what was the value from the preceding row and notice if the value had changed.

    Perl’s “auto-vivification” features can come in very handy here.   If we have a hashref $foo which is to contain an arrayref of values found for each key, we can write very “streamlined” logic such as the following:

    while (my $row = $query->fetchrow_hashref) { my $key = $$row{'key'}; # PEDANTIC FOR CLARITY ... my $val = $$row{'value'}; push @{ $$foo{$key} }, $val; }
    (caution... extemporaneous code ... Your Mileage May Vary™)

    The “clever time-saving goodness” here being that, if $foo does not yet contain a hash-bucket for a given $key, then ... lo! ... magically, it does!   And, it is understood to be an (empty) arrayref!   And, the desired value magically gets pushed onto it!!   (How convenient can you get?)

    (In the example above,   $$foo{'bletch'}   is simply a shorthand for:   $foo->{'bletch'}  .)

    If you combine this trick with the previously mentioned ORDER BY, the process actually becomes efficient, because the $key values will be arriving in an entirely predictable sequence.   Page-faults will therefore be minimized.   Even (especially...!) if you have millions of rows in your SQL table, this will make a very noticeable difference in speed.

Re: Re-indexing a SQL resultset by id
by Juerd (Abbot) on Jul 09, 2011 at 22:28 UTC

    It seems to me that your database design (or the name of the 'id' column) is broken if you have non-unique ID's.

    As for supporting this in the module: I'm not inclined to support something in DBIx::Simple that makes the value type returned dependent on the contents of the result set. That results in rather fragile code. You could just add a method, though:

    # untested package DBIx::Simple::Result::Metaperl; our @ISA = qw(DBIx::Simple::Result); sub map_arrays_of_hashes { my ($self, $column_name) = @_; my %arrays; for my $row ($self->hashes) { push @{ $arrays{$row->{id}} }, $row; } return wantarray ? %arrays : \%arrays; } ... $db->result_class = 'DBIx::Simple::Result::Metaperl'; my %arrays_of_hashes = $db->query(...)->map_arrays_of_hashes('id');

    To keep DBIx::Simple simple, I'm not adding a proliferation of variants; a vast number map_foos_of_bars permutations could be thought of and I'm sure there's they could all be useful in someone's code somewhere, but let's draw a line... :)

    Juerd

      It seems to me that your database design (or the name of the 'id' column) is broken if you have non-unique ID's.
      FALSE:
      CREATE TABLE students id PRIMARY KEY SERIAL firstname TEXT lastname TEXT ; CREATE TABLE classes id PRIMARY KEY SERIAL name text ; CREATE TABLE student_classes studentid INT classid INT ;
      and now:
      SELECT * FROM student_classes;
      Will you get several instances of each id? Yes? Is the database design un-normalized? No. Conclusion. There are reasonable queries leading to multiple instances of certain fields by which one might want to collate results. In this case, I might want a hash keyed by student id and an array ref of all results with that student id. Call it "collation".



      The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

      -- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"

        SELECT * FROM student_classes;
        Will you get several instances of each id? Yes?

        Yes, but these fields are not called 'id'; here, it's just a suffix. A join, however, could result in multiple rows of the result set with the same id, even in a nice database design, so I do retract my observation about the database perhaps not being well-designed.

        Juerd

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://913396]
Approved by herveus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (7)
As of 2014-08-29 17:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (283 votes), past polls