Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Get top N rows in each group with DBIx::Class

by chacham (Prior)
on Nov 27, 2017 at 12:40 UTC ( #1204318=note: print w/replies, xml ) Need Help??


in reply to Get top N rows in each group with DBIx::Class

I'm not understanding what you are trying to do. Why not use SQL? If you do not want to use a fancy SQL statement in your perl, why not create a view, and just query that instead. BTW, another way to do it, is with UNION ALL and subqueries. That is (simplified):

select group, (select member where order = 1) union all select group, (select member where order = 2) union all ...

For only 5 of each, this is a pretty simple way that should work in any database. It can be done without UNION ALL by using sub-queries for your tables, and so on.

Replies are listed 'Best First'.
Re^2: Get top N rows in each group with DBIx::Class
by 1nickt (Abbot) on Nov 27, 2017 at 13:07 UTC

    Hi chacham, thank you for your reply.

    I'm not understanding what you are trying to do

    I'm trying to query a ResultSet (i.e. a table) that contains many rows for each "group":

    id | artist | album -------------------- 01 | AA | aa 02 | AA | ab 03 | AA | ac 04 | BB | ba 05 | BB | bb 06 | BB | bc 07 | CC | ca 08 | DD | da 09 | DD | db --------------------
    I want to be able to pass a value for "limit_per_artist" and get as a new ResultSet, e.g. with value "2":
    id | artist | album -------------------- 01 | AA | aa 02 | AA | ab 04 | BB | ba 05 | BB | bb 07 | CC | ca 08 | DD | da 09 | DD | db --------------------

    This is needed because in the real DB there are up to hundreds of rows per "artist" and therefore the vast majority of rows are not wanted.

    Why not use SQL?

    The application (CPAN Testers API) is already well established as a set of DBIC classes, and the preference is to stick to that if possible. I showed a MySQL solution in my OP, and Shadowsong showed another SQL technique in his/her reply.

    why not create a view, and just query that instead?

    That seems to be the most promising path, although as I explained, this query is to take place on a RS after a number of other queries. So I think I would have to create a view for each scenario: i.e. one view for the case where the user wants the RS to be limited to a certain Artist, another when the user wants the RS limited to a certain Genre, etc. (If it were me I would implement all that directly in SQL::Abstract, but as I said this project is built in DBIC.)

    another way to do it, is with UNION ALL and subqueries

    I have seen this technique, but I don't believe it scales to a large number of "groups", and also, it's unknown what the "groups" are and I am trying to avoid doing multiple passes over the table. This seems like it would be a good way to go, constructing the UNION query in Perl, if it was acceptable to first get a list of "groups" with select distinct name from artists or similar.

    I'm coming to think that this situation is an excellent example of the Corion Doctrine ("ORMs are great until you need to do something complicated with them, then they are just an obstacle "), and that I'll have to implement something "clunky" like multiple views, or a multiple-pass solution. It doesn't seem that DBIC is going to be able handle it (and there has been resounding silence on the IRC channel where I've asked for help twice so far).

    I'd be glad to find I've missed something or misunderstood something, please let me know if so. Thanks!


    The way forward always starts with a minimal test.

      I think I would have to create a view for each scenario: i.e. one view for the case where the user wants the RS to be limited to a certain Artist, another when the user wants the RS limited to a certain Genre, etc.

      Being a view is not generated, there is no problem in writing a complex view that would have some insane query plan if run without a where clause, and then querying for each small case needed, resulting in a very quick runtime.

      If the groups are small (artist, genre, etc), just create one view to order them all complete with row number, a column to identify which group it is, and perhaps what ordering is being used. Then, querying it for a specific group, ordering, or limit is simple.

      select Lucutus from Borg where grouping in ('artist', 'genre') and ordering = 'popularity' and row_number <= 5;

        Hm, that sounds like a good idea, to assign intra-group row numbers to all records first, then constrain in all other ways, then constrain to N per group, knowing that the row numbers may not be sequential but will be ordered. Thanks, I'll try it!


        The way forward always starts with a minimal test.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1204318]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2020-12-01 09:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?