|Problems? Is your data what you think it is?|
A recent glance at the database related tutorials in the Tutorials section indicated a lack of a very basic introduction to using databases so I decided to have a go at writing something to meet that need.
The following is intended as a very basic introduction to getting up and going with databases using Perl and DBI. SQLite (through DBD::SQLite) is assumed for all the sample code below because it is very easy to get going (just install the module) and for many people will be the only database they need for quite a while.
It is usual in this sort of tutorial to pass over the error handling. Despite that I've generally included error handling in the samples below. I also use transacted processing from the get go. Although this tutorial is intended to introduce the very basics of databases, it is also intended to demonstrate safe and robust technique right from the start. As a relative neophyte in matters database, I expect I'll receive a little feed back relating to some of the code!
Ok, so lets dive straight in by creating a simple database containing a single table of two columns:
Assuming the script ran without errors you may be wondering if anything actually happened? As a first check look in the directory you ran the script from for a file called Test.sqlite. If the file exists the database has been created. If you can't find it most likely the current directory when you ran the script is not what you thought it was - I'll wait a moment while you figure it out.
If an error did occur (maybe you don't have write permission for the
With the file found you can use one of several tools to examine the database. An internet search for "SQLite Browser" should find a copy of SQLite Database Browser which will open the database file and allow you to examine and manipulate it. If you use Firefox then the SQLite Manager add-on may be the thing for you. In either case point the tool at Test.sqlite and confirm that it contains a People table with name and age columns. Such database management tools help enormously in checking that your database software is doing as you expect and in fixing the occasional processing glitch or performing adhoc queries.
The code following #1 connects to the database (creating the database file if it doesn't exist already). The connect string specifies the database driver type and provides information to the driver about the connection. For SQLite the driver uses the third part of the connection string as the file name for the database.
The #2 code then configures some error handling related stuff. Setting AutoCommit to 0 sets the database to a mode where changes to the database are effectively buffered until commit is called. This allows a related set of changes to the database to be effected in a single call or else easily undone by a call to rollback (if something failed during processing for example).
Setting RaiseError to 1 causes errors to be handled by using die when the error happens rather than setting an error flag and hoping that the calling code will notice the issue.
Taken together setting AutoCommit to 0 and RaiseError to 1 help make database processing very robust because errors get handled when they happen and the database doesn't get left in a corrupt state if processing fails in some fashion.
Now take a look at #3. The string is the SQL statement that is used to generate the People table. The table contains two columns of data named name and age. The name column is set to contain variable length strings up to 128 characters long. The age column is set to contain integer values. Note though that SQLite uses the type information provided in the create statement as a hint to how the data is expected to be used, but doesn't feel obliged to actually store it that way. In this regard SQLite is much more forgiving than many other database engines. Note too that SQL is case insensitive for key words, but by convention they are given in upper case.
Tucked away in the safety of an eval following #4 are the two statements that actually result in creating the table and updating the database: the do and commit lines. do is actually short hand for two statements: prepare and execute. It can be used when a statement handle is not returned (more about statement handles later) and where the SQL statement only needs to be executed once.
Let's move on. Time to add some data to our table:
This code starts just the same as the previous script with connection to the database and setting up of the error handling. In later sample scripts this boilerplate code will be omitted, but you'll need to provide it for the scripts to run.
Again you can use your SQLite browser/manager to confirm that the script has done some work. Notice that there is now some data in the People table, in fact the data given in the %people hash in the script.
As in the first script a little SQL and a couple of function calls get the work done. The SQL statement at #1 is used to set up a statement handle that is used to insert data into the table. The VALUES part gives the values to be inserted. Notice that question marks (place holders) are used in place of the actual data. It is possible to supply the data in the SQL statement, but that should be avoided for several reasons (to avoid quoting issues and injection attacks for a start). The data to be inserted is supplied later in the execute (#4).
The prepare call generates a statement handle that is used to insert the data. The while loop (#3) iterates over the key/value pairs in the %people hash. For each iteration of the loop the execute (#4) inserts a name/age pair. Notice that two parameters ($name, $age) are passed in and match the two place holders (?, ?) in the SQL statement. When the loop completes commit (#5) is called to complete the transaction and commit the changes to the database.
Actually the whole insert loop can be replaced with:
which is both more compact and very likely more efficient, but only works if you can efficiently build the parallel arrays of values. Notice that an array reference is passed in for each place holder in the SQL statement. For the moment you can ignore the empty hash reference passed as the first parameter - it's there so you can pass attributes in to execute, but much of the time it's not needed.
Thus far we have created a database, created a table and added some data to it. But what a database is really about is getting the data back out again. Most often we only want to get a subset of the data out of a database by searching through the data using some criteria for selecting the data we want. Say for example we wanted to search the data we've added to the database for people whose name ends in the letter 'e' and with an age less than 25?
This time the script prints the results rather than updating the database. The SQL (#1) is a select statement which will retrieve rows containing name and age columns from the People table where the name ends in the letter 'e' (% matches anything, rather like .* in a Perl regular expression) and age is less than 25. The execute (#3) fetches the data from the database and the while loop (#4) iterates over the rows that were fetched one at a time to print the results.
Ok, that's got a simple single table database created, some data added and a search made on it. Databases provide a vast amount more power than we have looked at so far, and a lot of that comes when several tables are involved in a search, but that is a tutorial for another day. Mean time the documentation for the flavour of SQL understood by SQLite at http://www.sqlite.org/lang.html is a good reference for SQL.
True laziness is hard work