PerlMonks  

Re: Find similar records based on multiple column with multiple criteria

by erix (Vicar)
on Dec 13, 2013 at 21:27 UTC (node #1067072)


in reply to Find similar records based on multiple column with multiple criteria

It's not clear what you want done: you do not define similarity.

FWIW, the data you provided can be grouped using column k alone, with a simple floor( k / 100.0 ) * 100 expression (k rounded down to the nearest hundred). What more can be deduced from your data?

    select mm, m, p, k, y, my, r, s, o, c, whatness
      from t
     order by floor( k / 100.0 ) * 100, k;

     mm |  m  |   p   |   k    | y  | my |   r    | s  |   o    | c | whatness
    ----+-----+-------+--------+----+----+--------+----+--------+---+-----------
     84 | 250 | 16700 |   4900 | 13 |  0 | 102124 | 23 |      0 | 0 | * similar
     84 | 250 | 16700 |   4900 | 13 |  6 | 102124 |  3 |      0 | 5 | * similar
     84 | 250 | 17290 |   4905 | 13 |  6 | 102124 |  1 |   3687 | 0 | * similar
     84 | 250 | 17290 |   4905 | 13 |  6 | 102124 | 22 |   3687 | 2 | * similar
     84 | 250 | 17290 |   4905 | 13 |  6 | 102124 |  4 |   3687 | 2 | * similar
     84 | 250 | 17290 |   4910 | 13 |  6 | 102124 |  3 |   3687 | 2 | * similar
     84 | 250 | 10900 |  46423 | 11 |  5 |  52012 |  3 |    485 | 1 | # similar
     84 | 250 | 10200 |  46423 | 11 |  5 |  52012 | 23 |    485 | 1 | # similar
     84 | 250 | 10900 |  46423 | 11 |  5 |  52012 |  8 |    485 | 0 | # similar
     84 | 250 |  9900 |  46423 | 11 |  5 |  52012 | 22 |    485 | 1 | # similar
     84 | 250 | 14380 |  49501 | 11 |  5 |      0 |  1 | 140427 | 0 |   similar
     84 | 250 | 13980 |  49501 | 11 |  5 |  31751 |  6 | 140427 | 0 |   similar
     84 | 250 | 13980 |  49501 | 11 |  5 |  31751 |  3 | 140427 | 1 |   similar
     84 | 250 | 14380 |  49501 | 11 |  5 |      0 | 23 | 140427 | 1 |   similar
     84 | 250 | 14390 |  49501 | 11 |  5 |      0 | 22 | 140427 | 1 |   similar
     84 | 250 |  5490 | 150000 |  7 |  0 |      0 | 23 |  54964 | 0 | & similar
     84 | 250 |  5300 | 150000 |  7 | 11 |  31609 |  6 |  54964 | 0 | & similar
     84 | 250 |  5200 | 150000 |  7 | 11 |  31609 |  8 |  54964 | 3 | & similar
    (18 rows)
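The same bucketing can be reproduced client-side in Perl. This is a minimal sketch, not code from the original post: it rounds k down to the nearest hundred with POSIX::floor, mirroring floor( k / 100.0 ) * 100, and groups rows (assumed to be hashrefs with a k key) by that value. The helper names k_bucket and group_by_k_bucket are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: bucket k exactly like the SQL expression
# floor( k / 100.0 ) * 100, i.e. round k down to the nearest hundred.
sub k_bucket {
    my ($k) = @_;
    return floor( $k / 100 ) * 100;
}

# Hypothetical helper: group rows (hashrefs with at least a 'k' key) by bucket.
# Returns a hashref mapping bucket => [ rows, in their original order ].
sub group_by_k_bucket {
    my @rows = @_;
    my %groups;
    push @{ $groups{ k_bucket( $_->{k} ) } }, $_ for @rows;
    return \%groups;
}

# The distinct k values from the sample output above.
my @rows = map { +{ k => $_ } } ( 4900, 4905, 4910, 46423, 49501, 150000 );
my $groups = group_by_k_bucket(@rows);

# Print each bucket with the k values that fell into it.
for my $bucket ( sort { $a <=> $b } keys %$groups ) {
    printf "%6d: %s\n", $bucket, join ', ', map { $_->{k} } @{ $groups->{$bucket} };
}
```

With the sample k values this yields four buckets (4900, 46400, 49500 and 150000), which line up with the four groups marked in the whatness column.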


Re^2: Find similar records based on multiple column with multiple criteria
by ssc37 (Acolyte) on Dec 15, 2013 at 02:07 UTC
    It is not as simple as a MySQL query...
    Whether records are similar depends on many factors.
    So I need to process the data to decide whether or not several rows should be considered similar.
    I do not want to overload the server with thousands of queries, or to create tables with the MEMORY storage engine.
    Thinking about it, what I actually need is to fetch the data I have to work with from the MySQL server, keep it in memory on the machine that runs the script, and access it as simply as with MySQL: a kind of NoSQL inside Perl, to manipulate my data on the fly, create temporary tables, etc.
    The only thing I found that looks like this is DBIx::DataLookup:
    http://search.cpan.org/~vladb/DBIx-DataLookup-0.03/DataLookup/DataLookup.pm, but it no longer seems to be maintained.
    Any suggestions?
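Since the thread never defines similarity, code can only illustrate the shape of the approach described above: fetch the rows once, keep them in plain Perl data structures, and query them in memory. This is a minimal sketch under stated assumptions. The rows are hard-coded from the sample output rather than fetched; the commented DBI call shows where selectall_arrayref with { Slice => {} } would return the same kind of array of hashrefs. is_similar and cluster_rows are hypothetical helpers, and the rule inside is_similar (same y, k within 100) is an invented placeholder for the real criteria.

```perl
use strict;
use warnings;

# In real use the rows would be fetched once, e.g. with DBI:
#   my $rows = $dbh->selectall_arrayref(
#       'SELECT mm, m, p, k, y, my, r, s, o, c FROM t', { Slice => {} } );
# Here a few rows from the sample output are hard-coded so the sketch runs standalone.
my $rows = [
    { p => 16700, k => 4900,   y => 13, r => 102124, o => 0 },
    { p => 17290, k => 4905,   y => 13, r => 102124, o => 3687 },
    { p => 10900, k => 46423,  y => 11, r => 52012,  o => 485 },
    { p => 9900,  k => 46423,  y => 11, r => 52012,  o => 485 },
    { p => 5490,  k => 150000, y => 7,  r => 0,      o => 54964 },
];

# HYPOTHETICAL similarity rule: same y, and k values within 100 of each other.
# The thread does not define similarity; replace this with the real criteria.
sub is_similar {
    my ( $x, $z ) = @_;
    return $x->{y} == $z->{y} && abs( $x->{k} - $z->{k} ) <= 100;
}

# Greedy in-memory grouping: each row joins the first existing cluster that
# already holds a row similar to it; otherwise it starts a new cluster.
sub cluster_rows {
    my @input = @_;
    my @clusters;
  ROW: for my $row (@input) {
        for my $cluster (@clusters) {
            if ( grep { is_similar( $row, $_ ) } @$cluster ) {
                push @$cluster, $row;
                next ROW;
            }
        }
        push @clusters, [$row];
    }
    return @clusters;
}

# Ad-hoc "queries" are plain Perl: grep plays the role of WHERE.
my @y11      = grep { $_->{y} == 11 } @$rows;
my @clusters = cluster_rows(@$rows);
printf "%d rows with y = 11, %d clusters\n", scalar @y11, scalar @clusters;
```

The greedy pass is simple but order-dependent, and it does not merge two clusters that a later row would bridge; if real SQL syntax on the in-memory copy is preferred, an in-memory SQLite database (DBD::SQLite with dbname=:memory:) is another option worth evaluating.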
