The fine art of database programming

by gmax (Abbot)
on Apr 29, 2002 at 05:17 UTC

Database programming blues


Dear fellow monks,
Every now and then, somebody asks for advice about a problem involving database programming. Sometimes it is a clear-cut question, with the right balance between programming language (Perl) and database issues, and a clearly defined example. In those cases the question gets answered, and everybody goes home happy.

Sometimes, however, the poster doesn't have the foggiest idea of what is wrong, and submits an example with a mix of basic language, DBI, CGI and SQL problems. Many people answer such questions, each giving specific advice on the part they found wrong, and the underlying problem stays unsolved. Worse, when the right answer comes (the one that identifies the inner problem and gives the soundest piece of advice), it may be lost in a sea of comments that correct macroscopic but unrelated stylistic blunders.

In such cases, I realize that the basic problem afflicting the poster - and some of the willing helpers - is a misunderstanding of database principles.
This article is for both groups: the ones who want to ask for advice and the ones who are offering their help. An outline of database programming, with particular emphasis on Perl, is necessary both for asking the right question and for recognizing the right answer.
This article is neither a database nor a Perl tutorial. It is just a categorization of the problems we may face when developing database applications. Although some thoughts can be generalized to any high-level language, most of this meditation is specific to Perl.

Before you go on


I assume that you know at least what the DBI is. If you don't, please have a look at the tutorials on database programming, whose contents I don't want to duplicate here.

Programming principles


The soundest approach to programming I have ever come across is the one that describes computer programs as being composed of three parts:
  1. input
  2. processing
  3. output
No matter how complex your program is, it can always be reduced to these three components. Sometimes one of them is hard to identify (in embedded systems, for example), but they are always there.

The best organization we can give our programs is to keep those three elements as separate as possible. This way, if our input device (or protocol, interface, or whatever) changes, we have to modify only one element of the program. Moreover, if something is wrong, we can more easily identify which piece is faulty.
We can follow this approach no matter which technology we are using. In both structured and object-oriented programming, there are resources and idioms to isolate those elements.

When dealing with databases, each of these parts could be replaced by a database interaction, sometimes even all three. Here's an example. The first input comes from a database table holding the official holidays for the organization. If today is a holiday, then the second input is the list of employees working today, for whom we'll calculate the appropriate wages, depending on their job and seniority (which we also get from the database). The program's output is a database update of their payroll records.
The above example is not a standalone program; it is part of a more complex application dealing with human resources management. Depending on the business rules, that small program could be just a periodic job with no human intervention, or it could be part of a stricter work-flow, where each input and output requires a supervisor's approval. Either way, keeping the three components separated will help integrate this program into the rest of the application. A sketch of such a program follows.
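Here is a minimal sketch of that payroll job, with the three parts kept apart. The table and column names (holidays, employees, payroll) and the wage rule are invented for illustration:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:hr', 'hr_user', 'secret',
                       { RaiseError => 1 });

# --- input: is today a holiday, and who is working? ---
my ($is_holiday) = $dbh->selectrow_array(
    q{SELECT COUNT(*) FROM holidays WHERE day = CURRENT_DATE});
exit 0 unless $is_holiday;

my $working = $dbh->selectall_arrayref(
    q{SELECT id, job, seniority FROM employees WHERE on_duty = 1});

# --- processing: business rules only, no I/O here ---
my @wages = map { [ $_->[0], holiday_wage($_->[1], $_->[2]) ] } @$working;

# --- output: update the payroll records ---
my $upd = $dbh->prepare(q{UPDATE payroll SET due = due + ? WHERE emp_id = ?});
$upd->execute($_->[1], $_->[0]) for @wages;

$dbh->disconnect();

# a stand-in for the real business rule
sub holiday_wage {
    my ($job, $seniority) = @_;
    return 100 + 10 * $seniority;
}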

Source code intricacy


One additional burden comes from the different kinds of source code that you might need to use at once in your program. A typical case is a Perl script getting information from a database using SQL queries, and sending output to a web server, thus generating a markup language such as HTML or XML.
There are several ways to avoid cluttering your script with many types of code at once. Query generators can hide the SQL code. Templates and wrappers can generate the markup language for you. In these cases, you trade code simplicity for debugging complexity.
This problem is especially felt when you are creating SQL statements through wrappers that replace the SQL code with some other, "simpler" instructions. The more you trust a tool (especially if you don't fully understand what it is doing), the more you are exposed to side effects that you might be unable to investigate properly when all hell breaks loose.
About wrappers, there are two articles on www.perl.com: the first advocates the virtues of a wrapper that simplifies your coding even at the price of a huge efficiency hit; the other, brilliantly expressed by chromatic, shows how the DBI, assisted by some useful Perl idioms, can deal with any database problem.

In this subject, I am biased toward using the most powerful tool (the DBI), even at the price of some additional typing. It is true that, when you are a beginner, you feel intimidated by the DBI, and you'd like some simplifying solution that will save you some typing and will hide those hideous database details. However, you don't want to be a beginner forever. Or do you?

Some reasons for complexity in database programming

The different languages in your script, which we have seen above, are only one symptom of the complexity of database programming. The real complexity lies in some often overlooked properties of databases.

Client/server


Modern databases work with some sort of client/server protocol. The Perl DBI takes this behaviour as the common ground for all databases and builds its whole structure around the concept.
It is important to grasp the general idea of client/server, and even better to understand how your specific database implements it.
A database server is a program waiting for a request, which could arrive through the network, via a local socket, via a file, or whatever. The basic principle is that it waits. Then the client sends a request.
The DBI supports the idea of a request being prepared, i.e. analyzed for correctness and possibly optimized, and then reused several times with either the same or different parameters. If your database doesn't support preparation, the DBI conveniently simulates it.
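As a small sketch of that reuse (assuming an open $dbh and a hypothetical products table): prepare once, then execute with different parameters.

# Prepare once: the server (or the DBI, when the engine can't do it)
# analyzes the statement a single time.
my $sth = $dbh->prepare(q{SELECT price FROM products WHERE ID = ?});

# Execute many times, each run with a different parameter.
for my $id ('P01', 'P02', 'P03') {
    $sth->execute($id);
    my ($price) = $sth->fetchrow_array();
    print "$id: $price\n";
}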
After a request has been successfully prepared, the client can ask for its execution. Notice that the database server assumes nothing about your intentions. Whatever you want, it's your call. The server doesn't go on unless you ask.
So you have asked the server to execute the request. What does that mean? It means that the server will run your query and save the results into a temporary memory buffer, waiting for more requests.
Yes. This is the part that's difficult to come to terms with if you are used to desktop database applications, where your request is the last word, meaning that you have triggered an irrevocable chain of events that will result in a list of records on your screen. A client/server database waits for your request before doing anything, and after each step it waits for your next request.
So, you have asked for the query to be executed. The server complies, and now it's waiting for your next wish. How long is it going to wait? It depends on the database, but it's a safe assumption that its patience will not exceed the life of your connection. When you disconnect, or your session is timed out due to inactivity, all the temporary space allocated to your request is freed.
It's time for you to get the results from the server. You may ask to have all your results at once or one by one, depending on your estimate of their size and on the availability of such services in your database. The methods $sth->fetchrow_array, $sth->fetchrow_arrayref and $sth->fetchrow_hashref get you the next available record from your temporary buffer in the server. When the last record has been claimed (or when you call the $sth->finish method), the server considers your request satisfied and frees the memory occupied by your records. It will still retain the information about your query preparation until you disconnect.

You should bear these steps in mind when planning a database program, especially if you know that many clients could request the same service at the same time. Then, if you hit a performance problem, you can better guess where it lies.

I'll give you one example where a misunderstanding of the client/server principles wastes memory and thus reduces overall performance.
my $dbh = DBI->connect($dsn, $user, $password, { RaiseError => 1 });

my $sth = $dbh->prepare(qq{
    SELECT product, price, quantity
    FROM products
    WHERE quantity < ?
});
$sth->execute(100);

sub ask_customer {
    # display product details
    # ask customer input
    # do something smart with customer answer
}

while (my ($product, $price, $quantity) = $sth->fetchrow_array()) {
    ask_customer($product, $price, $quantity);
}

$dbh->disconnect();
This script keeps the results locked in the database buffer until the customer has entered input for every record. If many clients use the same approach, the database will soon drain its memory.
The right approach is to fetch all the records into an array first, and only then ask for the customer's input. Even better, this situation calls for one of the fetchall_* methods, asking the server to send you all the results at once, as in the sketch below.
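A sketch of that correction, against the same (hypothetical) products table:

$sth->execute(100);

# Drain the server-side buffer in one go ...
my $rows = $sth->fetchall_arrayref();

# ... and only then start the slow conversation with the customer.
for my $row (@$rows) {
    ask_customer(@$row);
}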

Data structure


If I had to name the single item where database programming creates the most confusion for the beginner, the data structure would be the first candidate.
When dealing with a database, your program has an internal data representation that usually doesn't reflect the external data organization. Most databases store data according to the relational model, where information is organized in tables, without redundancy. The links between data in different tables are called relations and are implemented through key values.
A typical, albeit simplified, database representation of a sales system includes information about the customers and the products. Each customer record has a unique field, also known as a "primary key", which identifies it. The same is true for the products.
Thus the order records, instead of repeating all the values from customers and products, store just a reference to them: their identifiers. A field that refers to a primary key in another table is known as a "foreign key".
+----------------------+
| customer             |
+-----+--------+-------+
| ID  | name   | state |
+-----+--------+-------+
| C01 | Joe    | NY    |
| C02 | Frank  | NY    |
| C03 | Bill   | TX    |
| C04 | Moe    | MA    |
| C05 | Sue    | CA    |
+-----+--------+-------+

+-----------------------+
| order                 |
+--------+--------+-----+
| custID | prodID | qty |
+--------+--------+-----+
| C02    | P03    |   9 |
| C02    | P01    |   2 |
| C02    | P02    | 200 |
| C05    | P01    |   3 |
| C05    | P02    | 450 |
| C01    | P04    |   5 |
| C01    | P03    |   1 |
+--------+--------+-----+

+----------------------+
| product              |
+-----+-------+--------+
| ID  | price | name   |
+-----+-------+--------+
| P01 |  1.90 | hammer |
| P02 |  0.05 | nail   |
| P03 |  5.50 | pliers |
| P04 |  4.00 | cutter |
+-----+-------+--------+
This is a rather simplified view, which you should integrate with some more complete explanation of the relational model. (See some references at this address.) Here I just want to point out the shock effect of having this data structure in your program.
If I want to get the above data into my program, I have to decide how to represent the records. Should I blindly copy the same data structure?
my @customers = (
    [ 'C01', 'Joe',   'NY' ],
    [ 'C02', 'Frank', 'NY' ],
    [ 'C03', 'Bill',  'TX' ],
    [ 'C04', 'Moe',   'MA' ],
    [ 'C05', 'Sue',   'CA' ],
);

my @products = (
    [ 'P01', 'hammer', 1.90 ],
    [ 'P02', 'nail',   0.05 ],
    [ 'P03', 'pliers', 5.50 ],
    [ 'P04', 'cutter', 4.00 ],
);

my @orders = (
    [ 'C02', 'P03',   9 ],
    [ 'C02', 'P01',   2 ],
    [ 'C02', 'P02', 200 ],
    [ 'C05', 'P01',   3 ],
    [ 'C05', 'P02', 450 ],
    [ 'C01', 'P04',   5 ],
    [ 'C01', 'P03',   1 ],
);
These arrays are a mirror image of what I can get from the database. Why is it not a good idea to do such a thing? For at least two reasons:
  1. Because these records are in the right format for storage, but they are not suitable for being shown to a human being. When I look at a sales report, I want to see that Frank bought two hammers, not that 'C02' bought 2 'P01'. Therefore, I should get such data re-assembled to be human-readable. Every database engine can do that easily, while my program would do quite a lousy job of it.
  2. Also because the number of raw records involved could be really high, and could needlessly occupy all the available memory of my client application, while the database could be asked to return just the pieces of information that I want.
The database engine can return results in a friendlier format, without any additional programming effort on my side. Just ask with a polite SQL statement, and the database will combine the tables to return the list of orders or the total amount for each customer.
SELECT cust.name AS customer, prod.name AS product,
       price, qty, qty*price AS total
FROM order
INNER JOIN customer cust ON (cust.ID = custID)
INNER JOIN product prod ON (prod.ID = prodID)

+------------------------------------------+
|               query results              |
+----------+---------+-------+-----+-------+
| customer | product | price | qty | total |
+----------+---------+-------+-----+-------+
| Frank    | pliers  |  5.50 |   9 | 49.50 |
| Frank    | hammer  |  1.90 |   2 |  3.80 |
| Frank    | nail    |  0.05 | 200 | 10.00 |
| Sue      | hammer  |  1.90 |   3 |  5.70 |
| Sue      | nail    |  0.05 | 450 | 22.50 |
| Joe      | cutter  |  4.00 |   5 | 20.00 |
| Joe      | pliers  |  5.50 |   1 |  5.50 |
+----------+---------+-------+-----+-------+

SELECT cust.name AS customer, SUM(qty*price) AS total
FROM order
INNER JOIN customer cust ON (cust.ID = custID)
INNER JOIN product prod ON (prod.ID = prodID)
GROUP BY customer

+------------------+
|  query results   |
+----------+-------+
| customer | total |
+----------+-------+
| Frank    | 63.30 |
| Sue      | 28.20 |
| Joe      | 25.50 |
+----------+-------+
The above data could be represented in your program by a hash of lists:
my %orders_by_customer = (
    'Frank' => [
        { product => 'pliers', qty =>   9 },
        { product => 'hammer', qty =>   2 },
        { product => 'nail',   qty => 200 },
    ],
    'Sue' => [
        { product => 'hammer', qty =>   3 },
        { product => 'nail',   qty => 450 },
    ],
    'Joe' => [
        { product => 'cutter', qty => 5 },
        { product => 'pliers', qty => 1 },
    ],
);
Transforming the data from the database into this common Perl structure is fairly simple, thanks to a well-known idiom:
my %orders_by_customer = ();
while (my $href = $sth->fetchrow_hashref()) {
    my %order = (
        'product' => $href->{'product'},
        'qty'     => $href->{'qty'},
    );
    push @{ $orders_by_customer{ $href->{'customer'} } }, \%order;
}
Another cause of distress is the programmer who understands some of the problems coming from the relational model, but fails to understand that each result from the database is just one table, which we can see as a list of lists or a list of hashes, but which is a single entity. Whether it comes from one table or from the joined result of multiple tables, the result of a query is always one two-dimensional array.
Anyone used to desktop database applications, perhaps the ones with a colorful GUI, has been exposed to the idea that tables can be "opened" and "edited", so that each record seen on the screen is the actual value in the database. Not so with client/server databases. Even if you are using an application that says "open table", what you get on your screen is the result of a query, just a copy of the actual data, and to modify it you have to issue an SQL command.
The beginner may suffer from the apparent dichotomy of a permanent external data structure that often does not match the one used for internal purposes. However, bearing in mind that the external data structure is not our business, since we can ask the database engine to return the most suitable result, we can concentrate on the internal part, treating the database interaction as just a method to fill in our variables.

Number


Databases can hold huge numbers of records. That should not come as a surprise. However, we sometimes forget that database queries, too, can give back a large number of records.
This fact is easy to neglect when we build an application against a test database that is likely to be empty, or to hold just a handful of records. What seems to work flawlessly in a database with just 100 records and three users might become disastrously inefficient when loaded with ten million records and thirty thousand users.
Database engines can filter your requests and give you an answer surprisingly quickly, provided that you ask for the right thing.
Unless you have a very good reason, asking for one million records at once is unlikely to be "the right thing". Often you can ask the server to do the summarizing for you, as in the sketch below.
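For instance, if all the application needs is a count of customers per state, a sketch like this (using the customer table from the next section) lets the engine do the filtering and aggregation, so only a few summary rows travel back:

# A few summary rows come back, not a million raw records.
my $per_state = $dbh->selectall_arrayref(
    q{SELECT state, COUNT(*) FROM customer GROUP BY state}
);
printf "%s: %d customers\n", @$_ for @$per_state;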

More database trouble


SQL dialects


SQL is an abstract concept. You can't learn practical SQL without associating it with a specific database. And then you realize that what you are using is a database-specific dialect, which looks like the mythical ANSI SQL, but is very rarely implemented as such. It is something like learning English. There is no "standard English" anywhere, only dominant dialects, spread throughout the world by the force of Hollywood, the BBC and CNN. Depending on which stream you were exposed to, you will say "pavement" or "sidewalk", and you'll wear a "vest" under or over your shirt.
Similarly, if you have learned the Oracle syntax for a left join, you might look suspiciously at the MySQL equivalent.
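For instance, here is a sketch of the same outer join written both ways (the order table is renamed orders here, because ORDER is a reserved word - itself a small portability lesson):

-- ANSI syntax, understood by MySQL and most modern engines:
SELECT c.name, o.qty
FROM customer c
LEFT JOIN orders o ON (c.ID = o.custID)

-- the same join in Oracle's older, proprietary notation:
SELECT c.name, o.qty
FROM customer c, orders o
WHERE c.ID = o.custID(+)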
Standard ANSI SQL provides many common chunks of the language, covering most of the day-by-day cases. Knowing which parts of the SQL implemented in your database of choice comply with the standard is not just academic knowledge. It is a good investment in portability and simplicity. If you use only standard constructs, you are more likely to find an answer when you get into trouble. If you use only database-specific idioms, then either you are a wizard or you'll have to find one when things go wrong.

Wrapper modules


The DBI is the official interface between Perl and databases. It comes with the assumption that you know what you want, and that you also know the underlying database, its design and how to deal with it. That may seem harsh, but such is life. Database programming has no shortcuts. You may find some module that helps you save some keystrokes. Some modules will handle many aspects of the database complexity for you. Some will even promise to make database programming easier to understand, or to free you from the burden of database design. Don't fall into that trap. A DBI wrapper can be useful to speed up your development process or to help you handle the complexity of your database schemas, but none of them will replace your understanding of the subject. If you use a wrapper that promises to deal with the INNER, LEFT and RIGHT JOINs you can't fully understand, then you are trusting an unknown piece of code to do something of which you have a limited command.
There are way too many database modules on CPAN. Some of them are extremely valuable, provided that you understand what they are doing. Some are just a smart engine to save you some typing, at the price of crippling the performance of your application and your flexibility in choosing the most appropriate solution.
Like many other technological gadgets, DBI wrappers can help the beginner, but they will also keep the beginner at the lower end of professional growth. Many people say that the DBI has a steep learning curve. That is not true. If you are not familiar with any relational client/server database, then the DBI will be as difficult as learning your first foreign language. However, if you know how a relational database works, especially how a client/server engine reacts to your requests, then you can learn the DBI by evaluating similarities and differences with the general concepts you already have in your bag of tricks.
The DBI is a tool. You can use it well if you know your trade. If you lack general database understanding, using the DBI will be like driving in the dark with your headlights off, and no cute module can protect you from the incoming disaster.

Note: Don't get me wrong. I am not against DBI wrappers. Actually, I appreciate some of them very much, like DBIx::DWIW. I just prefer to be in control. Others may feel differently. I don't impose my view on anybody and I keep an open mind about which wrappers could be useful for my needs.

Who's the client?


I said that client/server database applications have their specific annoyances, compared to stand-alone or desktop databases. One more source of confusion arises when we are dealing with both a database server and a web server. Who is the client here?
Although the ultimate client of the web server is the user sitting in front of a browser, the client of the database is always a web server process. This is true even when the web user had to provide a username and password for database access. The web server collects the access information, does the talking with the database, and returns an answer to the client.
It's important to remember this fact whenever there is a connection problem with the database. Unless the username or password is wrong, the problem doesn't lie with the ultimate client (the browser), but with the web server. Testing the database connection with the appropriate user information outside the web server will usually tell you whether you were providing your script with the right parameters.
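A minimal sketch of such a stand-alone test, to be run from the command line with the same DSN, username and password your web application uses (all three invented here):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# If this fails, the web server is not your problem.
my $dbh = DBI->connect('dbi:mysql:shop', 'webuser', 'secret',
                       { PrintError => 0 })
    or die "Can't connect: $DBI::errstr\n";
print "connection OK\n";
$dbh->disconnect();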

Final advice


It's said that the only good advice you can get comes from your own experience, especially your own mistakes. While it is true that I cannot magically transfer to you my database programming experience, I can at least give you my summary of what I feel is its positive outcome. You may decide whether it is useful and perhaps give it a try.

Enhance your skills

  • Know your database. First and foremost, you need to know what is behind the scenes. The database you are interacting with should hold no mysteries for you. At the very least, you must be sure that what you want to achieve through your script is exactly what you would do in a direct interaction with the database.
  • Know Perl idioms. Most of your dialogs with databases go through intermediate data structures, which you use to send data to the database and to get results from it. You should feel confident in the usage of any idiom involved in your tasks. Avoid the cargo cult temptation of using something you have seen but don't understand. Read the documentation about the particular idiom you need, and test it with a script that uses only that idiom and nothing else. Search for examples in The Monastery and read the explanations.
  • Know the intermediate tools (e.g. DBI, CGI, HTML::Template, TT, Mason, etc.). If these tools are your choice for developing your applications, you should reach a deep knowledge of their interfaces. You must be confident in using them, and be sure that you are either using the most recommended methods or have a very good reason if you don't.
For all of the above, you may safely ignore the features you are sure you'll never need, but you must not neglect any of the features you do need. For those, you must know everything the docs have to offer. The time you spend learning how those features work will translate into less time spent debugging your scripts later.

Separation


If you consider all the elements involved in database programming, the best piece of advice I can give you is to test those elements separately. When you face a problem, you should at least be able to attribute it to the appropriate layer: the script body, the web server interface, the DBI interaction, or whatever other module you may be using.
  • Test your subs in a vacuum. It is good practice to isolate each function of your scripts or modules and test it with a stand-alone script, to be sure that the function does exactly what you want. See chromatic's article on testing and The Joy of Test for more on this point. A tour of Super Search on this subject would also be a good investment. update See also this node (thanks, grinder) for a practical implementation of a testing strategy.
  • Test your DB queries outside Perl. Before blaming your Perl scripts, be sure that the SQL query you want to run is correct. Test it from a stand-alone database client and see if you get the expected result. If your query is broken, there is no point in persisting with the script.
  • Test your web server without Perl scripts. Just as with the database exchange, the services of your web server may not be what you expect. Before trying your complex CGI/DBI application, be sure that your Apache can serve static HTML pages and simple CGI programs, such as a trivial shell script. When all of the above works, try a very basic Perl/CGI script. Once you are sure that the basic functionality is in order, it's time to try your application, again testing the different subs separately whenever possible.

Database related advice


The soundest advice I can give you is "design before using". While this is true for all computer programming, it has a special meaning for databases, because it implies some knowledge of the relational model.1
  1. Normalizing. This is the most important factor. It has been discussed here before (Migrating a 1NF table to multiple 2NF tables and Database normalization the easier way, for instance) and there are places where you can look it up. I won't go into details here. But let me tell you that using a relational database without normalizing is like towing a truck with a Ferrari.
  2. Primary key. Every table in your database must have a primary key. If you have a performance problem that you can't identify, check for primary keys first. If one or more of your tables lack a primary key, the problem is (at least partially) there.
  3. Indexes on foreign keys. All foreign keys should be indexed. Your JOINs will work much faster if you remember and apply this simple rule. (Points 2 and 3 are sketched right after this list.) If you find that you have too many foreign keys in one table, then perhaps you should go back to the normalization process.
1 Or the object-oriented model, if your database of choice supports it. Just don't throw everything into a table without careful analysis.
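A minimal sketch of points 2 and 3, reusing the order table from the sales example above (renamed orders here, since ORDER is a reserved word in SQL):

CREATE TABLE orders (
    custID CHAR(3) NOT NULL,    -- foreign key to customer.ID
    prodID CHAR(3) NOT NULL,    -- foreign key to product.ID
    qty    INT,
    PRIMARY KEY (custID, prodID)
);

-- index the foreign keys, so the JOINs can use them
CREATE INDEX orders_cust_idx ON orders (custID);
CREATE INDEX orders_prod_idx ON orders (prodID);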

Common pitfalls


In database programming, you can shoot yourself in the foot in several ways. Even after taking in all the above wisdom and applying all the advice, your application can collapse because of some silly oversight. I will list the most common ones here, because I have seen them in too many SOPW requests.
  • Lists of hashes are a commonly needed structure, either for your own purposes or to accommodate the requests of other modules. Recent versions of the DBI can return such a structure directly (for instance, fetchall_arrayref({}) returns a reference to an array of hash references). However, if for any reason you have to use an older release, or other application needs force you to use another method, you can use the idiom mentioned above, in the section about data structure.
    The difficult part is remembering that, in order to push a new hash reference onto the array at each iteration, the hash must be lexically scoped inside the loop block, while the array must be either global or lexically scoped outside the block. If we don't create a new hash for each iteration, we end up with an array where all the records are the same: the values of the last hash. The sketch below shows both the wrong and the right way.
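    A minimal sketch of the pitfall and its fix (assuming an already executed statement handle $sth):
    # Wrong: %order is declared once; every iteration reuses the same
    # hash, so every stored reference points to the same (last) record.
    my %order;
    my @records;
    while (my $href = $sth->fetchrow_hashref()) {
        %order = ( product => $href->{product}, qty => $href->{qty} );
        push @records, \%order;    # the same reference, every time
    }

    # Right: a fresh lexical %order is created at each iteration,
    # so each pushed reference points to a distinct hash.
    my @records_ok;
    while (my $href = $sth->fetchrow_hashref()) {
        my %order = ( product => $href->{product}, qty => $href->{qty} );
        push @records_ok, \%order;
    }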
  • Quoting. A more insidious problem comes from quoting. The typical pitfall goes like this: you have tested your SQL query with a stand-alone client, and it works.
    my $surname = 'Jones';
    my $query   = qq{
        SELECT name, surname
        FROM employees
        WHERE surname = $surname
    };
    my $sth = $dbh->prepare($query);
    $sth->execute();
    When you execute that query through the DBI, you get an error saying that the 'Jones' column was not found, or a statement to that effect. What happened? You forgot that text columns must be quoted. The value of your $surname variable was passed to the SQL query as a bareword.
    You may be tempted to solve the problem by changing your query snippet into WHERE surname = '$surname', but it will break when the surname is "O'Reilly".
    The DBI offers two solutions to this problem. One is the $dbh->quote($string) method, which quotes your string according to your database's rules. The second, which I heartily recommend, is the binding mechanism through placeholders. Your query becomes:
    my $query = qq{
        SELECT name, surname
        FROM employees
        WHERE surname = ?
    };
    my $sth = $dbh->prepare($query);
    $sth->execute('Jones');
    The "?" is the placeholder, and the DBI knows that it should be replaced by the value passed to the execute method. The DBI will automatically take care of quoting the variables passed through placeholders, thus relieving you from an implementation headache.

That's all, folks!


That was all I could do without crossing the boundary between article and book.
I hope I managed to shed some light on a few dark, neglected aspects of database programming.
The bottom line is:
To make a better application, plan in advance, know your tools, and let the database engine do its job.
Any further contribution will be welcome.

update 4-May-2002. Fixed some typos.

Replies are listed 'Best First'.
Re: The fine art of database programming
by mpeppler (Vicar) on Apr 30, 2002 at 00:57 UTC
    One additional pitfall for most people using a relational database for the first time: there are no record numbers!

    An RDBMS is based on set theory, and a particular row has no inherent position in its table (although some database systems fake this).

    Hence the very common question: how do I fetch rows 100-120? And the database-independent answer is always: use an appropriate WHERE clause.

    Michael

      And there was me thinking it was:
      SELECT * FROM table LIMIT 99, 21;
      No WHERE clause in sight. If you want to be a bit more specific about how things are ordered (rather than insertion order, which I believe is defined as the standard ordering), you do
      SELECT * FROM table ORDER BY col LIMIT 99, 21;
      Still no WHERE clause.
        I am sorry, pdcawley, but I have to disagree.
        One of the foundations of the relational model is that you make no assumptions about the physical storage of the records. What you are suggesting is quite dangerous, as the following example shows.
        mysql> describe test_order;
        +-------+----------+------+-----+---------+-------+
        | Field | Type     | Null | Key | Default | Extra |
        +-------+----------+------+-----+---------+-------+
        | id    | int(11)  |      | PRI | 0       |       |
        | name  | char(10) | YES  |     | NULL    |       |
        +-------+----------+------+-----+---------+-------+
        2 rows in set (0.00 sec)

        mysql> select * from test_order; # all the records
        +----+------+
        | id | name |
        +----+------+
        |  1 | aaaa |
        |  2 | bbbb |
        |  3 | cccc |
        |  4 | dddd |
        |  5 | eeee |
        |  6 | ffff |
        |  7 | gggg |
        |  8 | hhhh |
        |  9 | iiii |
        | 10 | jjjj |
        +----+------+
        10 rows in set (0.01 sec)

        mysql> select * from test_order limit 4,3; # 1st request. OK so far
        +----+------+
        | id | name |
        +----+------+
        |  5 | eeee |
        |  6 | ffff |
        |  7 | gggg |
        +----+------+
        3 rows in set (0.00 sec)

        # now we remove one of the records we were selecting through LIMIT
        mysql> delete from test_order where id = 5;
        Query OK, 1 row affected (0.00 sec)

        # and we insert another record
        mysql> insert into test_order values (11, "kkkk");
        Query OK, 1 row affected (0.00 sec)

        # Then we insert the deleted record again
        mysql> insert into test_order values (5, "eeee");
        Query OK, 1 row affected (0.00 sec)

        mysql> select * from test_order limit 4,3;
        +----+------+
        | id | name |
        +----+------+
        | 11 | kkkk |
        |  6 | ffff |
        |  7 | gggg |
        +----+------+
        3 rows in set (0.00 sec)
        The same query, after a couple of delete/insert statements, gives you different results.
        It is true that you can correct the result using the ORDER BY clause, but even then you are not dealing with "record numbers".
        The LIMIT clause, for those databases that support it, guarantees that N records from the result set, starting at a given position, are returned. There is no assumption about their position in the table.
        If you want to return the correct records, use the WHERE clause.
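        For instance, a sketch with the test_order table above: selecting by value survives any physical reshuffling of the rows.
        SELECT id, name
        FROM test_order
        WHERE id BETWEEN 5 AND 7
        ORDER BY id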
        
        I believe that LIMIT is a DBMS-specific extension. For example...
        in MS SQL Server:
        SELECT TOP 10 col FROM table
        in IBM DB2:
        SELECT col FROM table FETCH FIRST 10 ROWS ONLY
        I think the only platform independent way to do this is:
        SELECT col FROM table A
        WHERE 10 > (SELECT COUNT(*) FROM table B WHERE A.col < B.col)
        ORDER BY col DESC
Re: The fine art of database programming
by Matts (Deacon) on May 01, 2002 at 11:22 UTC
    Great post.

    One module I'd like to point out that can come in really useful for cross-database applications is DBIx::AnyDBD. This follows the principle laid out above - knowing your database is extremely important, and so you should have access to its features.

    But beyond that, it ensures that you can always port your application to a new database in a clean and consistent manner, and in a way that your users won't have to care about (apart from changing the DBI connect string). It does this simply by providing a subclass for each database. So you write your default query stuff in a Default.pm file, and put Oracle-specific stuff in Oracle.pm, Sybase-specific stuff in Sybase.pm, and PostgreSQL-specific stuff in Pg.pm. It then uses simple rules at connect time to give you the right class, and it does this correctly with the Windows abstraction layers ODBC and ADO too.
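    As a hand-rolled sketch of the same principle (this is not DBIx::AnyDBD's actual interface - see its documentation for that), one can pick a per-driver class from the DBI driver name at connect time. The packages, DSN and limit_clause method are all invented:
    use strict;
    use warnings;
    use DBI;

    # Default behaviour, overridden per driver below.
    package MyApp::DB::Default;
    sub limit_clause { die "no row-limit syntax known for this engine\n" }

    package MyApp::DB::mysql;
    our @ISA = ('MyApp::DB::Default');
    sub limit_clause { my ($class, $n) = @_; return "LIMIT $n" }

    package MyApp::DB::Oracle;
    our @ISA = ('MyApp::DB::Default');
    sub limit_clause { my ($class, $n) = @_; return "WHERE ROWNUM <= $n" }

    package main;

    my $dbh = DBI->connect('dbi:mysql:shop', 'user', 'secret',
                           { RaiseError => 1 });

    # The driver name picks the class; fall back to Default when there
    # is no specific subclass for this engine.
    my $class = 'MyApp::DB::' . $dbh->{Driver}{Name};  # e.g. MyApp::DB::mysql
    $class    = 'MyApp::DB::Default' unless $class->can('limit_clause');

    my $sql = 'SELECT name FROM product ' . $class->limit_clause(10);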

    Another point in its favour is that DBIx::AnyDBD is going to be incorporated into DBI core in the next major DBI revision.

Re: The fine art of database programming
by pdcawley (Hermit) on Apr 30, 2002 at 12:59 UTC
    My top tip for database programming is don't do it until you absolutely have to. Hopefully by then you'll have a good idea of the 'shape' of your data and you'll end up with a reasonably stable database schema.

    Despite what gmax says about not being afraid of writing raw SQL and using the DBI directly, I'd still say that you should try and route all your database access through one or two modules (either tools from CPAN, or home rolled) as a way of limiting your dependencies and avoiding the problem of a single change in the database leading to a death of a thousand cuts as you trawl your codebase hunting down everything that deals with the affected tables.

    The 'performance hit' you'll see as a result of doing this will be nothing compared to the increase in 'cost of change' you'll see when you scatter SQL throughout your code like an incontinent puppy. (Guess who's had to work with code like that recently. It's painful.) And then there's testability. When you access the database only through your 'mediator' modules, it becomes possible to drop in a replacement that only pretends to talk to a 'real' database. This can be a remarkably powerful testing technique...

      don't do it until you absolutely have to
      Especially if you don't feel confident about database programming. ;)

      update. You don't decide to use a database for your own pleasure, but because the application requires it. A recent node by ChOas explained very well that good programming is data analysis in the first place. If your analysis calls for a database backend, then you must implement it.

      you should try and route all your database access through one or two modules (either tools from CPAN, or home rolled) as a way of limiting your dependencies
      Using the DBI doesn't mean that you can't program in a modular way. My point is that, if you are skilled enough, you should call the database without intermediate modules.
      You can accommodate your programming logic inside one or more modules. By "programming logic" I mean what your application should do, not some set of rules to make the database interface more user-friendly. You can build your logic either using the DBI directly or through some wrapper. I prefer the direct way, but that doesn't mean that using the DBI directly will turn my scripts into spaghetti code.
      
      > don't do it until you absolutely have to

      I disagree with this. When designing a database application, the structure and relationships of the data are fundamental. Making them a development afterthought can lead to some big-time structural problems as you get hit with new demands on data tooled for code with completely different goals. This would be a lot like advising someone new to object-oriented programming not to worry about the relationships between objects until after they've got all of the functionality of the methods worked out, so that they can get a clearer picture of how they need to work together.

      It's usually not hard to tell which applications have been designed in this manner: flat-file-like table designs, lots of duplication of data to meet specific code needs, no abstraction in the data models. Most of the time you can tell the difficulty of refactoring an application simply by looking at its database schema.

      When designing any database application, the first question I ask is: what is the information that I need, and what is the best way to store it? Once I have those relational models in place, things like classes and their relationships just seem to flow out of them. This approach may be slower in terms of initial development, but you'll avoid a lot of major headaches down the road. It's a lot easier to fix bad code with good data than it is to fix bad data with good code.

      People interested in database design should check out the writings of Ralph Kimball.

      
        I think you might be misunderstanding me. I'm not saying that, when the time comes that you absolutely can't avoid using a database, you should slap it on the side and hope. Codd's rules for normalization and all that jazz are just as important if you add a database late in the day as they are if you're going down the 'Big Design Up Front' route.

        This is just the XP "You Aren't Gonna Need It" principle in action. Until the code tells me I need a database I won't worry about it. Once I know one is needed I'll take the time to make sure it's well factored and well normalized because that's the only way to keep my cost of change down.

        The problem I have with RDBMSes is that the relational model doesn't really map onto objects and back again all that well. What you want is some kind of general solution that you can chuck an arbitrary object at and have it automagically stored, giving you back a magic cookie that you can use to retrieve it later. Given careful design, it's possible to work things so that the day-to-day running of your code doesn't require complex queries (because complex queries are *nasty* with the kind of database that'll stash arbitrary objects).

        So, the approach I'm taking with the code I'm working on at the moment is to have two different databases. One is an object database (not yet written, we don't actually need it yet) that doesn't mess around trying to do cunning queries or relational magic, and will probably use a combination of BerkeleyDB and Storable; the other is a reporting/logging database which will be properly relational, allowing the kind of reporting and queries that are needed for invoice generation, user reporting and all that jazz.

        Speaking of which, the customer is pushing that to the front of the story queue, time to sit down with a bunch of index cards, a handy whiteboard, and the office's token big iron database programmer.

Re: The fine art of database programming
by tmiklas (Hermit) on Apr 30, 2002 at 22:23 UTC
    Great post, ++

    I'd like to add one more thing to the common pitfall list: the rule of exponential costs.

    It really works... I 'tried' it while writing my first database program. Discovering fundamental mistakes and their consequences (like changing the database layout when the program is almost finished) leads you to the point where you spend much more time changing your code (changing != improving) than writing new code. So costs (not only time costs) rise exponentially with time. Now I know that spending even a few days normalizing your database decreases the total cost of the development process; after that, writing the program is just a matter of coding the rules and actions planned during the normalization.

    Greetz, Tom.
Re: The fine art of database programming
by artist (Parson) on Dec 16, 2004 at 16:50 UTC
    Great post.

    While it is true that I cannot magically transfer to you my database programming experience.

    I wish for the intelligence that can do the same.

Re: The fine art of database programming
by Anonymous Monk on Dec 16, 2004 at 17:28 UTC
    In this subject, I am biased toward using the most powerful tool (the DBI), even at the price of some additional typing. It is true that, when you are a beginner, you feel intimidated by the DBI, and you'd like some simplifying solution that will save you some typing and will hide those hideous database details.
    I have a different opinion: I find the DBI limiting, and I don't always use it when accessing databases. The DBI is good if you want to create an application that can be moved from one database to another. I've seldom encountered a need for this. Usually, a shop has settled on a database vendor, and uses Oracle or Sybase or DB2 or something else. And many applications or tools are written specifically for one company, which won't distribute them. Of course, in many cases you could still use the DBI - but since the DBI is platform-independent, it doesn't allow you to tap into the wealth of features Oracle, Sybase or other databases have to offer. If you don't need them, fine. But if you're doing prototyping, if you're replacing tools written in other languages, or if you are writing debugging or admin tools, I'd rather have something that allows me to access everything the database has to offer.

    Now, this post isn't intended to slap down the DBI - far from it. The DBI is great, and I've written many programs using it. But it isn't the end-all, and I've written many programs accessing databases which would have been harder to write using the DBI.

      I would love to see an example of two Perl programs, one with and one without using DBI, where the one without DBI is easier to write.

        u65: I would love to see an example of two Perl programs, one with and one without using DBI, where the one without DBI is easier to write.

        The year 2004 cannot hear you :D
