Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re: SQL like query over iterable in perl

by huck (Prior)
on Mar 02, 2017 at 07:45 UTC ( #1183361=note: print w/replies, xml ) Need Help??


in reply to SQL like query over iterable in perl

More generic than haukex's, more pure-perl than Athanasius's, more functional than OO

#!/usr/bin/perl -w use strict; use warnings; my $ar=[ {sex=>'m',occupation=>'programmer',state=>'CA'}, {sex=>'m',occupation=>'blah1',state=>'CA'}, {sex=>'m',occupation=>'blah1',state=>'NY'}, {sex=>'m',occupation=>'blah2',state=>'NY'}, {sex=>'f',occupation=>'blah1',state=>'NY'}, {sex=>'f',occupation=>'blah2',state=>'NJ'}, ]; printer(counter($ar,['sex'])); printer(counter($ar,['sex','state'])); printer(counter($ar,['sex','occupation'])); printer(counter($ar,[])); exit; sub printer{ use Data::Dumper; my $ref=shift; local $Data::Dumper::Deepcopy=1; local $Data::Dumper::Purity=1; local $Data::Dumper::Sortkeys=1; local $Data::Dumper::Indent=2; print Dumper($ref)."\n"; } # printer sub counter { my $arin=shift; my $arparts=shift; my $res={}; my $size1=scalar(@$arparts)-1; return $res unless ($size1 >=0); my $size2=$size1-1; for my $hash (@$arin){ my @parts=(); for my $key (@$arparts){ my $val=$hash->{$key}; if (!defined ($val)) {$val='.'} push @parts,$val; } my $reshead=$res; for my $n (0..$size2) { my $val=$parts[$n]; unless ($reshead->{$val}) {$reshead->{$val}={};} $reshead=$reshead->{$val}; } $reshead->{$parts[$size1]}++; } # hash return $res; } # counter
Result
$VAR1 = { 'f' => 2, 'm' => 4 }; $VAR1 = { 'f' => { 'NJ' => 1, 'NY' => 1 }, 'm' => { 'CA' => 2, 'NY' => 2 } }; $VAR1 = { 'f' => { 'blah1' => 1, 'blah2' => 1 }, 'm' => { 'blah1' => 2, 'blah2' => 1, 'programmer' => 1 } }; $VAR1 = {};

Added:

printer(counter($ar,['state','sex','occupation']));
Result
$VAR1 = { 'CA' => { 'm' => { 'blah1' => 1, 'programmer' => 1 } }, 'NJ' => { 'f' => { 'blah2' => 1 } }, 'NY' => { 'f' => { 'blah1' => 1 }, 'm' => { 'blah1' => 1, 'blah2' => 1 } } };

Replies are listed 'Best First'.
Re^2: SQL like query over iterable in perl
by haukex (Bishop) on Mar 02, 2017 at 08:55 UTC
    More generic than haukex's, more pure-perl than Athanasius's, more functional than OO

    Shorter than huck's ;-P

    sub count { my ($data, $fields) = @_; $fields = [$fields] unless ref $fields; my $count = {}; for my $obj (@$data) { my $ref = $count; $ref = $ref->{ $obj->{ $fields->[$_] } } //= {} for 0..$#$fields-1; $ref->{ $obj->{ $fields->[-1] } }++; } return $count; }

    (For input, output and other context see my other node)

      Cute

      Made me think $fields = [$fields,@_] unless ref fields;

      and not that it matters here, and i would have to look up funny debug commands to tell for sure, but are $#$fields-1 and $fields->[-1] optimized or does the subtraction happen every loop of $obj? Im never sure.

      and in mine, besides $data,$fields,and $count as nicer variable names, i had already decided i should have $size1 be named as $nterminal and $size2 named as $nsubparts to make it clearer what they did. wasnt gun'a post as a "fix" but this gave me an excuse to mention it.

      Edited to add: What about missing fields? $ref = $ref->{ $obj->{ $fields->[$_] // '.' } } //= {} where '.' is used like the SAS missing variable.
      That wasnt right, instead $ref = $ref->{ $obj->{ $fields->[$_] } // '.'  } //= {}

        I hope it's clear I wasn't being entirely serious or saying that my code was "better", it was just a lighthearted retort :-)

        i would have to look up funny debug commands to tell for sure, but are $#$fields-1 and $fields->[-1] optimized or does the subtraction happen every loop

        I'm not sure either, but I'd have to guess no, since the optimizer would have to know for certain that @fields won't change, and in a language as dynamic as Perl, that's probably very rarely the case, if ever. As for the debug command, I assume you're thinking of B::Deparse, invoked as perl -MO=Deparse script.pl, which would show, for example, constant folding.

        I did a quick test with Benchmark and refactored out $fields->[-1] and $#$fields-1 to before the loop, and that gave a consistent but small speed increase of ~5% (although IIRC, in terms of Benchmark results that's still in the margin of error of being insignificant).

        I didn't take missing fields into account because the OP was asking about objects, but you have a good point there, my code doesn't properly handle undef values (or an empty @$fields for that matter).

        Actually I have quite similar function that dynamically creates nested hash counter. That gives confirmation that I'm in the right direction. :) Thanks.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1183361]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (2)
As of 2021-09-25 05:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?