Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

SQL queries from weird data

by Anonymous Monk
on Feb 19, 2013 at 15:13 UTC ( #1019585=perlquestion: print w/replies, xml ) Need Help??
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all

I'm sorry if this post is long, I'm trying to provide enough data to examplin my problem. I have some database tables which contain some columns, for example:

/----------------------\ | TYPE | STYPE | OTHER | \----------------------/

The data stored within these columns is something I have to use to make a SQL query. The values stored in these tables can be simple, like

/----------------------\ | TYPE | STYPE | OTHER | |----------------------| | AAA | BBB | X=CCC | \----------------------/

The desired output for this would be
select * from targettablename where type='AAA' and stype='BBB' and x=' +CCC';

Col OTHER inclues a key value pair. So far so good? Now the syntax for things like wildcard searches and or in the existing tables is like
/-----------------------------\ | TYPE | STYPE | OTHER | |-----------------------------| | AA|BB | BB% | X=CC%&Y=DDD | |-----------------------------| | AA|BB | | X=CC%&Y=DDD | |-----------------------------| | AA | BB% | X=CC%&Y=DD% | \-----------------------------/

The SQL query these should generate are:
select * from targettablename where type in ('AA','BB') and stype like + 'BB%' and X like 'CC%' and y='DDD'; select * from targettablename where type in ('AA','BB') and X like 'CC +%' and y='DDD'; select * from targettablename where type = 'AA' and stype like 'BB%' a +nd X like 'CC%' and y like 'DD%';

So as you can see this could end up pretty complex, each column value may have multiple likes or 'ors', and the others col contains key value pairs which may also have various types of searching. I'm investigating a generic solution which can generate proper SQL queries for each entry in this table. At the moment I'm a little lost as to how to actually do this. I know that I need to split any entries in the 'other' column and process them. I'm trying to visualize a generic function to which I can pass the contents of any field (and the split up key value pairs in other). Any advice would be great.

Replies are listed 'Best First'.
Re: SQL queries from weird data
by choroba (Bishop) on Feb 19, 2013 at 16:02 UTC
    For the examples you gave, the following works. However, if the actual values are even more complex, you might need to write a real parser for the expressions. For that, I would recommend Parse::RecDescent or Marpa::R2.
    #!/usr/bin/perl use warnings; use strict; use Test::More; sub simple_value { my ($name, $val) = @_; my $return; if (not length $val) { $return = q(); } elsif (1 + index $val, '|') { my @vals = split /\|/, $val; $return = "$name in(" . join(',' => map "'$_'", @vals) . ")"; } elsif (1 + index $val, '%') { $return = "$name like '$val'"; } else { return "$name='$val'"; } } sub complex_value { my $value = shift; my @simple = split /&/, $value; return glue(map { my ($name, $val) = split /=/; simple_value(lc $name, $val); } @simple); } sub glue { join ' and ', grep length, @_; } while(not eof DATA) { chomp(my ($type, $stype, $other, $result) = (map scalar <DATA>, 1 +.. 5)); $type = simple_value('type', $type); $stype = simple_value('stype', $stype); $other = complex_value($other); is(glue($type, $stype, $other), $result, $result); } done_testing(); __DATA__ AAA BBB X=CCC type='AAA' and stype='BBB' and x='CCC' AA|BB BB% X=CC%&Y=DDD type in('AA','BB') and stype like 'BB%' and x like 'CC%' and y='DDD' AA|BB X=CC%&Y=DDD type in('AA','BB') and x like 'CC%' and y='DDD' AA BB% X=CC%&Y=DD% type='AA' and stype like 'BB%' and x like 'CC%' and y like 'DD%'
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thank you for this reply. I will run this with some production data and report back. Thanks again. Have you any experience of the SQL::Abstract module? I'm looking at it right now and it claims to "Generate SQL from Perl data structures". Thank you once more.
Re: SQL queries from weird data
by RichardK (Parson) on Feb 19, 2013 at 15:43 UTC

    To me, the problem looks more like an issue with your database schema design.

    If you normalised out the "other" column into its own key/value table then your problem would go away and your code would be nice and simple :)

      Indeed. And you wouldn't be trying to execute data, which is something to be avoided.
      I should have been more clear, this isn't our schema, these aren't our tables. This is generated by a third party product (VB.Net front end, Oracle back end) over which we have zero control. We're trying to provide more functionality via some scripting. We want to do things this product doesn't do.
Re: SQL queries from weird data
by sundialsvc4 (Abbot) on Feb 19, 2013 at 19:44 UTC

    If the amount of data is at-all reasonable, you could write a script which populates additional SQL tables whose purpose is to index the data that you receive from this most-peculiar application.   Create a suitable primary-key column even if you have to do it by hand, then scan through the data inserting rows in other tables that consist of break-downs of what is in the original data.   For example, two entries in an index-table for row #1 would specify that it contains 'AA' and 'BB'.   Once you have done this, now you can begin to do sensible queries against the data that you get.

    A script to parse the original data stem-to-stern, inserting descriptive records in other tables, would be fairly easy to write and could accomplish its work with a single pass through the data.   Now, you no longer have to depend upon its weirdnesses.   I truly believe that you will never get a truly-satisfactory program from the approach that you are pursuing right now.   I daresay that the incoming data is full of special-cases and exceptions that would vex you endlessly.

Re: SQL queries from weird data
by Anonymous Monk on Feb 19, 2013 at 15:36 UTC
    Welcome to SQL monks!
      Not a SQL question, see for more detail.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1019585]
Front-paged by Corion
[Discipulus]: good morning community!
[Corion]: Good morning Discipulus!

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (6)
As of 2018-06-19 07:12 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (111 votes). Check out past polls.