Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

SQL like query over iterable in perl

by pwagyi (Monk)
on Mar 02, 2017 at 07:00 UTC ( #1183352=perlquestion: print w/replies, xml ) Need Help??

pwagyi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

Let's say I have an array of object. and also assume that there are some attributes (attr1, attr2, ...attrn) in each object.

I would like to have function/class that can count over any attributes' values. e.g if object is person, and person has attributes, sex ('Male','Female'), ethnic ('a','b','c',...), occupation ('programmer','hacker',...).

If I want to find count# on sex =>
count( array_of_objects, 'sex') => returns ( 'Male' => 45 #male count, 'Female' => 54)
But in some cases, I would also need to group,
count(array, ['sex','occupation']) => ( 'Male' => ( 'programmer' => 10, 'blah' => 34), 'Female' => ('blah' => 3, 'programmer'=>4)
How could I have generic function/class to group and count over collection?

Replies are listed 'Best First'.
Re: SQL like query over iterable in perl (updated x2)
by haukex (Bishop) on Mar 02, 2017 at 07:25 UTC

    Sounds to me like foreach (and perhaps map) and hashes will help you:

    my @array_of_objects = ( { sex=>'Male', occupation=>'foo' }, { sex=>'Female', occupation=>'foo' }, { sex=>'Male', occupation=>'bar' }, { sex=>'Female', occupation=>'bar' }, { sex=>'Female', occupation=>'foo' }, ); my %counts; $counts{ $_->{sex} }++ for @array_of_objects; my %grouped; $grouped{ $_->{sex} }{ $_->{occupation} }++ for @array_of_objects; use Data::Dumper; print Dumper( \%counts, \%grouped ); __END__ $VAR1 = { 'Female' => 3, 'Male' => 2 }; $VAR2 = { 'Female' => { 'bar' => 1, 'foo' => 2 }, 'Male' => { 'foo' => 1, 'bar' => 1 } };

    Update: Personally, I'd prefer the above, but if you really want a generic function, here's one option. Probably not the most efficient solution because it's recursive, I don't think the morning caffeine has fully kicked in yet ;-) Input and output is the same as above. (Update: huck's solution, posted before the below, is the non-recursive variation of this.)

    sub count { my ($data, $fields) = @_; $fields = [$fields] unless ref $fields; my $count = {}; _dive( $count, $_, @$fields ) for @$data; return $count; } sub _dive { my ($ref, $obj, @path) = @_; my $targ = \$ref->{ $obj->{ shift @path } }; if (!@path) { $$targ++; return $ref } $$targ = _dive( $$targ, $obj, @path ); } print Dumper( count(\@array_of_objects, 'sex') ); print Dumper( count(\@array_of_objects, ['sex','occupation']) );

    Update 2: In the above, I'm working with hash references instead of objects. If you want to use real objects and method calls, then in the first example, replace { $_->{sex} }{ $_->{occupation} } with { $_->sex }{ $_->occupation }, and in the second example, replace $obj->{ shift @path } with $obj->${\shift @path}.

Re: SQL like query over iterable in perl
by Athanasius (Archbishop) on Mar 02, 2017 at 07:33 UTC

    Hello pwagyi, and welcome to the Monastery!

    Although this is no doubt doable in OO, why reinvent the wheel? If you want SQL-like queries, use a real relational database that supports real SQL! For example, the CPAN module DBD::SQLite:

    ...includes the entire thing in the distribution. So in order to get a fast transaction capable RDBMS working for your perl project you simply have to install this module, and nothing else.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: SQL like query over iterable in perl
by huck (Prior) on Mar 02, 2017 at 07:45 UTC

    More generic than haukex's, more pure-perl than Athanasius's, more functional than OO

    #!/usr/bin/perl -w use strict; use warnings; my $ar=[ {sex=>'m',occupation=>'programmer',state=>'CA'}, {sex=>'m',occupation=>'blah1',state=>'CA'}, {sex=>'m',occupation=>'blah1',state=>'NY'}, {sex=>'m',occupation=>'blah2',state=>'NY'}, {sex=>'f',occupation=>'blah1',state=>'NY'}, {sex=>'f',occupation=>'blah2',state=>'NJ'}, ]; printer(counter($ar,['sex'])); printer(counter($ar,['sex','state'])); printer(counter($ar,['sex','occupation'])); printer(counter($ar,[])); exit; sub printer{ use Data::Dumper; my $ref=shift; local $Data::Dumper::Deepcopy=1; local $Data::Dumper::Purity=1; local $Data::Dumper::Sortkeys=1; local $Data::Dumper::Indent=2; print Dumper($ref)."\n"; } # printer sub counter { my $arin=shift; my $arparts=shift; my $res={}; my $size1=scalar(@$arparts)-1; return $res unless ($size1 >=0); my $size2=$size1-1; for my $hash (@$arin){ my @parts=(); for my $key (@$arparts){ my $val=$hash->{$key}; if (!defined ($val)) {$val='.'} push @parts,$val; } my $reshead=$res; for my $n (0..$size2) { my $val=$parts[$n]; unless ($reshead->{$val}) {$reshead->{$val}={};} $reshead=$reshead->{$val}; } $reshead->{$parts[$size1]}++; } # hash return $res; } # counter
    Result
    $VAR1 = { 'f' => 2, 'm' => 4 }; $VAR1 = { 'f' => { 'NJ' => 1, 'NY' => 1 }, 'm' => { 'CA' => 2, 'NY' => 2 } }; $VAR1 = { 'f' => { 'blah1' => 1, 'blah2' => 1 }, 'm' => { 'blah1' => 2, 'blah2' => 1, 'programmer' => 1 } }; $VAR1 = {};

    Added:

    printer(counter($ar,['state','sex','occupation']));
    Result
    $VAR1 = { 'CA' => { 'm' => { 'blah1' => 1, 'programmer' => 1 } }, 'NJ' => { 'f' => { 'blah2' => 1 } }, 'NY' => { 'f' => { 'blah1' => 1 }, 'm' => { 'blah1' => 1, 'blah2' => 1 } } };

      More generic than haukex's, more pure-perl than Athanasius's, more functional than OO

      Shorter than huck's ;-P

      sub count { my ($data, $fields) = @_; $fields = [$fields] unless ref $fields; my $count = {}; for my $obj (@$data) { my $ref = $count; $ref = $ref->{ $obj->{ $fields->[$_] } } //= {} for 0..$#$fields-1; $ref->{ $obj->{ $fields->[-1] } }++; } return $count; }

      (For input, output and other context see my other node)

        Cute

        Made me think $fields = [$fields,@_] unless ref fields;

        and not that it matters here, and i would have to look up funny debug commands to tell for sure, but are $#$fields-1 and $fields->[-1] optimized or does the subtraction happen every loop of $obj? Im never sure.

        and in mine, besides $data,$fields,and $count as nicer variable names, i had already decided i should have $size1 be named as $nterminal and $size2 named as $nsubparts to make it clearer what they did. wasnt gun'a post as a "fix" but this gave me an excuse to mention it.

        Edited to add: What about missing fields? $ref = $ref->{ $obj->{ $fields->[$_] // '.' } } //= {} where '.' is used like the SAS missing variable.
        That wasnt right, instead $ref = $ref->{ $obj->{ $fields->[$_] } // '.'  } //= {}

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1183352]
Approved by Athanasius
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (3)
As of 2021-09-16 16:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?