Get top N rows in each group with DBIx::Class

1nickt has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Get top N rows in each group with DBIx::Class by NetWallah (Canon) on Nov 26, 2017 at 01:35 UTC
Try using the 'rows' attribute, like this: `my $rs = $schema->resultset('Artist')->search( undef, { rows => 10, } );` [download] The Documentation says: rows Value: $rows Specifies the maximum number of rows for direct retrieval or the number of rows per page if the page attribute or method is used. All power corrupts, but we need electricity.	[reply] [d/l]
Re^2: Get top N rows in each group with DBIx::Class by 1nickt (Canon) on Nov 26, 2017 at 03:18 UTC
Hi NetWallah, thanks for your reply. If I understand correctly `rows` basically implements a `LIMIT` on the query, which isn't quite what I need here. I'm looking for a limit per artist, or, maybe a better example, I want to query for all tracks in the library but limit the results returned to N per CD. On a related note, can you say when it's best to use the `rows` attribute on a `search()` call, and when it's best to use a `slice()` call instead? The way forward always starts with a minimal test.	[reply] [d/l] [select]
Re: Get top N rows in each group with DBIx::Class by shadowsong (Pilgrim) on Nov 26, 2017 at 18:30 UTC
Hi 1nickt, Until one of the of the other monks finds us a more elegant solution – try Arbitrary SQL through a custom ResultSource. See e.g. below My::Schema `package My::Schema; use base qw ( DBIx::Class::Schema ); # load My::Schema::Result[Set]::* classes __PACKAGE__->load_namespaces(); 1;` [download] My::Schema::Result::CD package My::Schema::Result::CD; use strict; use warnings; use base qw ( DBIx::Class::Core ); __PACKAGE__->table_class( 'DBIx::Class::ResultSource::View' ); # specify components __PACKAGE__->load_components( qw/ Ordered / ); __PACKAGE__->position_column( 'title' ); # specify table for Result class __PACKAGE__->table( 'cd_view' ); # add columns to 'this' class __PACKAGE__->add_columns( qw/ RN cdid artistid title year / ); # specify primary key(s) __PACKAGE__->set_primary_key( qw/ cdid artistid/ ); # # specify relationships etc.. # # do not attempt to deploy() this view __PACKAGE__->result_source_instance->is_virtual(1); __PACKAGE__->result_source_instance ->view_definition(q[ SELECT ROW_NUMBER() OVER (PARTITION BY artistid ORDER BY cdid) AS RN ,cdid ,artistid ,title ,year FROM cd ]); 1; [download] My::Schema::ResultSet::CD `package My::Schema::ResultSet::CD; use strict; use warnings; use base qw ( DBIx::Class::ResultSet ); 1;` [download] Demo Notice the `{ 'RN' => { '<=', 2 } }` parameter in the search call below? You would change this to 5… `#!perl -sl use lib qw ( c:\code\perl ); use strict; use warnings; use My::Schema; use constant { DBI_DSN => 'dbi:ODBC:driver={SQL Server};Server=(local);database +=example;Trusted Connection=yes', DBI_USER => undef, DBI_PASS => undef, }; my $eg_schema = My::Schema->connect(DBI_DSN,DBI_USER,DBI_PASS, { RaiseError=>1, AutoCommit=> 1 }); my $cd_rs = $eg_schema->resultset('CD') ->search({ 'RN' => { '<=', 2 } }); print $_->RN.",".$_->cdid.",".$_->title while $_ = $cd_rs->next; __END__` [download] It'll get the job done but I wasn't able to get too creative and start using CTEs and such; your mileage may vary... Cheers, Shadowsong	[reply] [d/l] [select]
Re^2: Get top N rows in each group with DBIx::Class by 1nickt (Canon) on Nov 26, 2017 at 20:22 UTC
Hi Shadowsong, thanks very much for your reply and demo. I had seen that part of the doc and it looks promising, but the problem I am facing is that this filter ('N per "group"') must be applied after some other search calls on the RS. Continuing the metaphor, this would be something like: Restrict Artists to given musical genre Restrict Albums to those release since given date Restrict results to maximum N Albums per Artist In my app I have the first two steps handled with methods in `My::Schema::ResultSet::CD`: `sub by_genre { $_[0]->search({ 'me.genre' => $_[1] }); sub since { $_[0]->search({ 'me.released' => { '>=', $_[1] } });` [download] So the consuming code can do: `my $rs = $schema->resultset('CD'); if ( my $genre = param->{'genre'} ) { $rs = $rs->by_genre( $genre ); } if ( my $since = param->{'since'} ) { $rs = $rs->since( $since ); }` [download] I need to add a method that can act upon the existing ResultSet, `if ( my $limit = param->{'limit'} ) { $rs = $rs->limit_per_artist( $limit ); }` [download] It's not clear to me that the doc you linked to and demoed can be adapted to search an existing RS, but I'll keep experimenting. Thanks again. The way forward always starts with a minimal test.	[reply] [d/l] [select]
Re: Get top N rows in each group with DBIx::Class by chacham (Prior) on Nov 27, 2017 at 12:40 UTC
I'm not understanding what you are trying to do. Why not use SQL? If you do not want to use a fancy SQL statement in your perl, why not create a view, and just query that instead. BTW, another way to do it, is with UNION ALL and subqueries. That is (simplified): `select group, (select member where order = 1) union all select group, (select member where order = 2) union all ...` [download] For only 5 of each, this is a pretty simple way that should work in any database. It can be done without UNION ALL by using sub-queries for your tables, and so on.	[reply] [d/l]
Re^2: Get top N rows in each group with DBIx::Class by 1nickt (Canon) on Nov 27, 2017 at 13:07 UTC
Hi chacham, thank you for your reply. I'm not understanding what you are trying to do I'm trying to query a ResultSet (i.e. a table) that contains many rows for each "group": `id \| artist \| album -------------------- 01 \| AA \| aa 02 \| AA \| ab 03 \| AA \| ac 04 \| BB \| ba 05 \| BB \| bb 06 \| BB \| bc 07 \| CC \| ca 08 \| DD \| da 09 \| DD \| db --------------------` [download] I want to be able to pass a value for "limit_per_artist" and get as a new ResultSet, e.g. with value "2": `id \| artist \| album -------------------- 01 \| AA \| aa 02 \| AA \| ab 04 \| BB \| ba 05 \| BB \| bb 07 \| CC \| ca 08 \| DD \| da 09 \| DD \| db --------------------` [download] This is needed because in the real DB there are up to hundreds of rows per "artist" and therefore the vast majority of rows are not wanted. Why not use SQL? The application (CPAN Testers API) is already well established as a set of DBIC classes, and the preference is to stick to that if possible. I showed a MySQL solution in my OP, and Shadowsong showed another SQL technique in his/her reply. why not create a view, and just query that instead? That seems to be the most promising path, although as I explained, this query is to take place on a RS after a number of other queries. So I think I would have to create a view for each scenario: i.e. one view for the case where the user wants the RS to be limited to a certain Artist, another when the user wants the RS limited to a certain Genre, etc. (If it were me I would implement all that directly in SQL::Abstract, but as I said this project is built in DBIC.) another way to do it, is with UNION ALL and subqueries I have seen this technique, but I don't believe it scales to a large number of "groups", and also, it's unknown what the "groups" are and I am trying to avoid doing multiple passes over the table. This seems like it would be a good way to go, constructing the UNION query in Perl, if it was acceptable to first get a list of "groups" with `select distinct name from artists` or similar. I'm coming to think that this situation is an excellent example of the Corion Doctrine ("ORMs are great until you need to do something complicated with them, then they are just an obstacle "), and that I'll have to implement something "clunky" like multiple views, or a multiple-pass solution. It doesn't seem that DBIC is going to be able handle it (and there has been resounding silence on the IRC channel where I've asked for help twice so far). I'd be glad to find I've missed something or misunderstood something, please let me know if so. Thanks! The way forward always starts with a minimal test.	[reply] [d/l] [select]
Re^3: Get top N rows in each group with DBIx::Class by chacham (Prior) on Nov 27, 2017 at 13:20 UTC
I think I would have to create a view for each scenario: i.e. one view for the case where the user wants the RS to be limited to a certain Artist, another when the user wants the RS limited to a certain Genre, etc. Being a view is not generated, there is no problem in writing a complex view that would have some insane query plan if run without a where clause, and then querying for each small case needed, resulting in a very quick runtime. If the groups are small (artist, genre, etc), just create one view to order them all complete with row number, a column to identify which group it is, and perhaps what ordering is being used. Then, querying it for a specific group, ordering, or limit is simple. `select Lucutus from Borg where grouping in ('artist', 'genre') and ordering = 'popularity' and row_number <= 5;`	[reply] [d/l]
Re^4: Get top N rows in each group with DBIx::Class by 1nickt (Canon) on Nov 27, 2017 at 13:26 UTC
Re^5: Get top N rows in each group with DBIx::Class by chacham (Prior) on Nov 27, 2017 at 13:31 UTC


We don't bite newbies here... much
	PerlMonks