PerlMonks  

Re: Find similar records based on multiple column with multiple criteria

by erix (Vicar)
on Dec 13, 2013 at 21:27 UTC ( #1067072=note )


in reply to Find similar records based on multiple column with multiple criteria

It's not clear what you want done: you do not define similarity.

FWIW, the provided data can be grouped using only column k, by a simple floor( k / 100.0 ) * 100 expression. How can more be deduced from your data?

select mm, m, p, k, y, my, r, s, o, c, whatness
from t
order by floor( k / 100.0 ) * 100, k;

 mm |  m  |   p   |   k    | y  | my |   r    | s  |   o    | c | whatness
----+-----+-------+--------+----+----+--------+----+--------+---+-----------
 84 | 250 | 16700 |   4900 | 13 |  0 | 102124 | 23 |      0 | 0 | * similar
 84 | 250 | 16700 |   4900 | 13 |  6 | 102124 |  3 |      0 | 5 | * similar
 84 | 250 | 17290 |   4905 | 13 |  6 | 102124 |  1 |   3687 | 0 | * similar
 84 | 250 | 17290 |   4905 | 13 |  6 | 102124 | 22 |   3687 | 2 | * similar
 84 | 250 | 17290 |   4905 | 13 |  6 | 102124 |  4 |   3687 | 2 | * similar
 84 | 250 | 17290 |   4910 | 13 |  6 | 102124 |  3 |   3687 | 2 | * similar
 84 | 250 | 10900 |  46423 | 11 |  5 |  52012 |  3 |    485 | 1 | # similar
 84 | 250 | 10200 |  46423 | 11 |  5 |  52012 | 23 |    485 | 1 | # similar
 84 | 250 | 10900 |  46423 | 11 |  5 |  52012 |  8 |    485 | 0 | # similar
 84 | 250 |  9900 |  46423 | 11 |  5 |  52012 | 22 |    485 | 1 | # similar
 84 | 250 | 14380 |  49501 | 11 |  5 |      0 |  1 | 140427 | 0 |   similar
 84 | 250 | 13980 |  49501 | 11 |  5 |  31751 |  6 | 140427 | 0 |   similar
 84 | 250 | 13980 |  49501 | 11 |  5 |  31751 |  3 | 140427 | 1 |   similar
 84 | 250 | 14380 |  49501 | 11 |  5 |      0 | 23 | 140427 | 1 |   similar
 84 | 250 | 14390 |  49501 | 11 |  5 |      0 | 22 | 140427 | 1 |   similar
 84 | 250 |  5490 | 150000 |  7 |  0 |      0 | 23 |  54964 | 0 | & similar
 84 | 250 |  5300 | 150000 |  7 | 11 |  31609 |  6 |  54964 | 0 | & similar
 84 | 250 |  5200 | 150000 |  7 | 11 |  31609 |  8 |  54964 | 3 | & similar
(18 rows)
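For anyone who would rather do the same bucketing in Perl than in SQL, here is a minimal sketch. It applies the floor( k / 100.0 ) * 100 expression to the k values from the result set; the names bucket_of, @k_values and %groups are made up for this illustration and are not part of the original post.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# k values copied from the 18-row result set in this post.
my @k_values = (
    4900,   4900,   4905,  4905,  4905, 4910,
    46423,  46423,  46423, 46423,
    49501,  49501,  49501, 49501, 49501,
    150000, 150000, 150000,
);

# Same expression as the ORDER BY clause: round k down to a multiple of 100.
# (bucket_of is a hypothetical helper name.)
sub bucket_of {
    my ($k) = @_;
    return floor( $k / 100.0 ) * 100;
}

# Collect the k values that land in each bucket.
my %groups;
push @{ $groups{ bucket_of($_) } }, $_ for @k_values;

# Report the buckets in numeric order with their row counts.
for my $bucket ( sort { $a <=> $b } keys %groups ) {
    printf "%6d: %d rows\n", $bucket, scalar @{ $groups{$bucket} };
}
```

With this data the hash ends up with one entry per marker group (*, #, unmarked, &), which is the point of the reply: the sample alone does not show that anything beyond k is needed.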


Re^2: Find similar records based on multiple column with multiple criteria
by ssc37 (Acolyte) on Dec 15, 2013 at 02:07 UTC
    This is not as simple as querying MySQL ...
    Whether records are similar is determined by many factors.
    So I need to run the data through checks that confirm or reject that several rows should be considered similar.
    I do not want to overload the server with thousands of queries, or create tables with the MEMORY storage engine.
    Thinking about it, what I actually need is to fetch the data I have to process from the MySQL server, keep it in memory on the machine that runs the script, and access it as simply as I would in MySQL. A kind of NoSQL inside Perl, really, to manipulate my data on the fly, create temporary tables, etc.
    The only thing I found that looks like this is:
    http://search.cpan.org/~vladb/DBIx-DataLookup-0.03/DataLookup/DataLookup.pm but it no longer seems to be maintained.
    Any suggestions?
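One possible shape for the "fetch once, keep in memory, group on the fly" idea described above, sketched in plain Perl with no extra modules. The rows are a trimmed copy of the sample data. In a real script they could come from a single DBI call such as selectall_arrayref with Slice => {}, which returns an array of hash references. The similarity rule used here (same y and same k bucket) is purely an assumption for illustration, since the thread never defines similarity. The names @criteria, similarity_key and %similar are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# A few rows from the thread's sample data, trimmed to three columns.
# Hard-coded here so the sketch runs without a database.
my @rows = (
    { p => 16700, k => 4900,   y => 13 },
    { p => 17290, k => 4905,   y => 13 },
    { p => 10900, k => 46423,  y => 11 },
    { p => 14380, k => 49501,  y => 11 },
    { p => 5490,  k => 150000, y => 7 },
    { p => 5300,  k => 150000, y => 7 },
);

# ASSUMED similarity rule: one code ref per criterion.
# Add or remove entries to change what "similar" means.
my @criteria = (
    sub { $_[0]{y} },                         # same y
    sub { floor( $_[0]{k} / 100 ) * 100 },    # same k bucket
);

# Build a composite key from all criteria for one row.
sub similarity_key {
    my ($row) = @_;
    return join '|', map { $_->($row) } @criteria;
}

# Group the in-memory rows by their composite key.
my %similar;
push @{ $similar{ similarity_key($_) } }, $_ for @rows;

for my $key ( sort keys %similar ) {
    my @p_values = map { $_->{p} } @{ $similar{$key} };
    print "$key => p: @p_values\n";
}
```

Because each criterion is a separate code ref, a new factor can be added without touching the grouping loop, and everything happens in the script's memory after one query.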
