Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

OT: SQL and me

by Lady_Aleena (Curate)
on Sep 27, 2013 at 17:28 UTC ( #1056042=perlmeditation: print w/replies, xml ) Need Help??

I am newish to SQL; I have not chosen a DBMS yet. People here and elsewhere have been beating me over the head with SQL. (GrandFather wrote RFC: Databases made easy over two years ago with me in mind, but it went right over my head.) I currently use .txt files to store my data. Most of the files are pipe separated values. (I do not like using commas since some values I store have commas in them and quoting would take up way too much disk space.) Going from .txt files which I can easily parse with a Perl function I have written specifically for my files to a format which I would have to use Perl modules written for more advanced users is very daunting. Not only would I have to figure out how to get my data into SQL files; I would have to figure out a way to get it back out of them. There is also the fear of typos in the data I currently do not know how to fix. (With my .txt files, fixing a typo means opening the file in my trusty text editor, I can not do that with SQL.)

I first started learning Perl to help get the size of my website under control back when I only had 10MB of space for it. Storing my data in the smallest space possible was a priority (hence pipes instead of commas so I would not have to quote my values). When I finally got more space, I was already neck deep in the way I am currently storing my data. I have a lot of little .txt files with little pieces of related data (across 200+ of directories and 800+ files in one case).

Since I do not know SQL, I don't know how to visualize the data structure it creates. I am familiar with software like Excel and less familiar with Access. (Access is a pain to set up which is why I avoid using it.) So, is SQL a big single table or tables within tables (within tables)? I have data I can experiment with which has a bit of complexity which I am unsure SQL can do, but SQL might be able to handle my eccentricities.

For my music database I am thinking of creating, there will be a table for "artists" with three fields for the name: "first name", "last name", and "band". Only one field will have to have an entry since if the artist is a band, there will be no first or last name. Also there are artists with only one name (usually first). However I already have a problem since some albums have various artists who I do not want to list individually, and with the above fields I do not have a place to put "various artists" really. Soundtracks also break the above. Do I create a fourth field called "special"?

Some artists I like I have not bought any albums, so I won't have a problem there. For the artists for whom I have bought albums, there will be another field which will be a table called "albums". Each album will have "title", "year", and "format". Even here there is a problem, some "albums" are 45s which do not have a singular name just the names of the songs on the 45. Format is an array since I own some albums in multiple media types (lp, cassette, or cd and one day straight digital). There is one case where an album is shared between two artists (The Police and Sting who will have separate entries under artists).

All of my data can be found here and here.

So I am wondering if SQL can do what I want, do I have to change my thinking, how do I set it up if it can, how hard will the code be to write, will it allow me to break up some values which I currently have lumped together, and some thoughts which are ambiguous.

I wish years ago when I was first starting to store my data in separate files, someone would have grabbed me and said "This is SQL, all the big kids use it; and if you want to play with the big kids, you need to use it too." I was a tad more pliant then.

Have a cookie and a very nice day!
Lady Aleena

Replies are listed 'Best First'.
Re: OT: SQL and me
by aaron_baugher (Curate) on Sep 27, 2013 at 20:24 UTC

    In short, yes, a SQL database will do the things you're talking about here. In a typical database, you have a group of tables. Each table is made up of some number of records, each containing the same named fields. So your artists table might look something like this:

    CREATE TABLE artists ( first_name text default NULL, last_name text default NULL, band text default NULL, id int(5) NOT NULL auto_increment );

    That creates a table where each record has four fields, named 'first_name', 'last_name', 'band', and 'id'. The first three contain up to a certain limit of text (depending on your DBMS and OS), and can be empty (NULL). The last one contains an integer of up to 5 digits, cannot be empty, and will automatically be filled with the next highest integer for that field if you don't provide one. So once you create this table, you can start adding artists:

    INSERT INTO artists (first_name, last_name) VALUES ('Billy','Joel'); INSERT INTO artists (band) VALUES ('AC/DC');

    Now, the other important thing is that your tables can (and usually do) have relationships. That's why you'll often see the term RDBMS -- the R stands for relational. One of the most important things in designing your database is determining the relationships between your tables. It may be one-to-many, many-to-one, or many-to-many. Let's say your album table looks like this:

    CREATE TABLE albums ( name text NOT NULL, year int(4) default NULL, format enum('lp','cassette','cd','45'), artist int(5) default NULL, id int(8) NOT NULL auto_increment );

    Now each album must have a name and can also have a publication year. It has a format which must be one of the possibilities allowed by enum(), and it also has an id which is an integer of up to 8 digits that is auto-incremented. It can also have an artist value -- but instead of having the artist's full name and information, this contains an integer which corresponds to an id of one of the records in your artists table. You now have a one-to-many relationship, because one artist can be assigned to many albums. If your artists table grows to contain numerous fields, that one id will connect all that info to each album it's assigned to. Now if you look at your artists table and see that Billy Joel has ID 123 and AC/DC has ID 456, you can add some albums:

    INSERT INTO albums (name,year,format,artist) values ('Glass Houses','1 +980','lp',123); INSERT INTO albums (name,format,artist) VALUES ('Piano Man','45', +123); INSERT INTO albums (name,year,format,artist) VALUES ('Back in Black',' +1980','cd',456);

    Of course, this is very simplified compared to what you'd need for a real music database. You'd want a lot more flexibility in what can be entered for different fields. Format might become its own table, for instance, so you can easily add formats and index them by ID. You'd probably want a many-to-many relationship between artists and albums, so that an artist could have multiple albums and an album could have multiple artists. That requires a third table to track the connections between the two, so that's getting a little more advanced. All doable with some practice, though.

    Aaron B.
    Available for small or large Perl jobs; see my home node.

Re: OT: SQL and me
by BrowserUk (Pope) on Sep 27, 2013 at 21:23 UTC

    The first question that pops into my mind is: Will the kind of access/processing you are doing benefit from beign done through SQL?

    A few (pointers to) examples of the type of processing you are doing would perhaps lead to better answers.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: OT: SQL and me
by wjw (Priest) on Sep 27, 2013 at 21:07 UTC
    Agree with previous poster in that "Yes, this is a very good match for a RDBMS".

    As you are just getting started with the DB stuff, I would start with SQLite which has some real advantages for this kind of a project:

    • DB is a single file
    • Has a good DBD module with reasonable documentation
    • It provides all the functionality you are asking for without having to get into a lot of DB admin like you might with heavier DB's like mysql, postgreql etc....
    • The online docs are not bad

    I would do is look for existing DB implementations (google search) as the album app is commonly used as an example for simple DB development.

    And finally, I would delve into basic DB design by checking out some books like This one.

    Hope that is helpful and encouraging...

    • ...the majority is always wrong, and always the last to know about it...
    • my will, and by will alone.. I set my mind in motion
Re: OT: SQL and me
by trippledubs (Chaplain) on Sep 28, 2013 at 05:37 UTC
    I think there is an easier way to visualize it. Tables don't contain other tables. A table is like a piece of paper. If you were to manually write out what a relational database does, you would be looking at three pieces of college ruled paper side by side.
    band_members ############################################################# ID First Name Last Name ############################################################ 1. Paul McCartney 2. Ringo Starr 3. George Harrison 4. John Lennon band_members_and_Bands_linking_table ########################################################### band_members_id bands_id ########################################################### 1 1 2 1 3 1 4 1 3 2 4 3 1 4 1 5 4 5 3 5 Bands ######################################################### ID Band Name ######################################################## 1. Beatles 2. Traveling Wilburys 3. Plastic Ono Band 4. Wings 5. Solo
    That is pretty much it. If this example is any good, you are already visualizing and tracing the relationships between the data.

    SQL is fun to learn, because there are only three commands. select, delete, and insert. There is 'update' if you really want to get fancy and combine delete and insert into a single statement. I would not try to Perl and SQL at the same time. I recommend the Head First SQL book, that will probably teach you all you want to know, if you want to go further, I like the MySQL certification book. You could combine those three tables into one, but this style is preferred, because it makes it harder to learn and raises the barrier to entry for Database Administrators. enhances performance

      If you are going to invoke "performance", DELETE-INSERT is inefficient, & bug inducing simply by virtue of being two operations, than UPDATE.
Re: OT: SQL and me
by hdb (Monsignor) on Sep 29, 2013 at 15:54 UTC

    You could give it a try using DBD::CSV. This way you can formulate your problem in SQL, all data is stored in CSV files and thus accessible to your text editor and if it turns out to be successful and the way you want to go, you can just go for a "proper" database system by just exchanging the driver.

Re: OT: SQL and me
by pvaldes (Chaplain) on Sep 28, 2013 at 12:07 UTC

    With my .txt files, fixing a typo means opening the file in my trusty text editor, I can not do that with SQL

    This is even easier in sql code:

    update artists set band = "Goldfrapp" where band = "Golf rap";
Re: OT: SQL and me
by talexb (Canon) on Oct 01, 2013 at 18:59 UTC

    Let me just chip in with a suggestion to try out sqlite3 -- it's a very handy, very simple implementation of an SQL engine that you can easily play around with to your heart's content. I'm currently using it to fiddle around with a schema I'm developing for an application of my own.

    It is a bit of a challenge to get familiar with SQL -- that join feature can be tricky to sort out, but once you do, you'll find that the database can do all of the messy work. :)

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      ... using it to fiddle around with a schema...

      You would think, for fiddling around you'd use :-)

      (It's a little slow but otherwise it worked pretty well when I tried it.)

Re: OT: SQL and me
by soonix (Abbot) on Sep 28, 2013 at 21:18 UTC
    Main difference between Excel sheets and SQL tables: in SQL, there is no row number. To distinguish unique rows, you have to rely on the data. Most often, one uses a separate "id" column, as in the examples already given in this thrrad, but if a field (or combination of fields) contains unique data, you can use that as (primary) key. Besides, defining a key for a table is optional, but of course, most often it is better to have one.
Re: OT: SQL and me
by Anonymous Monk on Sep 27, 2013 at 18:36 UTC

    Data storage devices|files will still be fatter than text files with quoting used.

    To correct typos, select the data with whatever search criteria; edit data; commit updated data to database.

    SQL is not data structure. It is a language to manipulate, fetch, store data as set.

Re: OT: SQL and me
by Anonymous Monk on Sep 27, 2013 at 19:26 UTC
    It is never too late and you are never too old to profit from the positive experiences of others. The essential concepts of SQL can be absorbed in a few hours' time.
Re: OT: SQL and me
by Anonymous Monk on Sep 28, 2013 at 01:34 UTC
Re: OT: SQL and me
by gnosti (Hermit) on Oct 04, 2013 at 10:47 UTC
    Not having experience with SQL limits my competence in answering, however I would like to observe that (1) you already appear to have a functional system and (2) at present, your data structures are not especially complicated.

    I would be interested to know why what benefits a move to SQL will bring.

    Right now, your data is stored in files that you read into perl data structures. You can write perl to get the entries that you want, to combine them, to print them, etc.

    In the database approach, you will rewrite your scripts/applications to use SQL instead of perl to access your data. As you observe, you will write extra utilities to enter/update entries, replacing use of your text editor.

    Although your editor may be trusty, every time you use it, you run the risk of editing entries by accident. That becomes a problem when you have many, many entries because an inadvertent mouse-click or keystroke could change an entry without your intention/awareness. A text editor has no protection against such randomness creeping in, although you could finesse the issue by using diff (or git-diff) to inspect your files' changes after each update.

    If you've decided to changeover and managed to create a database with all your entries, the problem becomes how to rewrite your scripts with the least amount of pain. Probably you will modify them so that the data access part is contained in a few subroutines. You could even write tests to verify that these subroutines work properly. Later you can rewrite these subroutines to use the database instead of perl structures. When the tests pass, you know that everything (at least everything you test) performs as expected.

    A large part of programming is changing code that already works into more beautiful/maintainable code that does that same thing. However, it is often practical to keep old, proven code.

Re: OT: SQL and me
by Lady_Aleena (Curate) on Oct 07, 2013 at 17:59 UTC

    Most of my data is processed to be displayed on the web, though in case of catastrophe I can print out the relevant data for the insurance company. My goal is to find an easier way to store, add to, edit, and access my data. My data is currently a mixed hierarchical and tabular design. From what I am seeing here, SQL is just tabular with a lot of repetition. (From the example above, I would be putting all artists' albums in one jumbled table instead of individual artists' albums tables such as "Duran Duran's albums" or "Sting's albums".) My goal is to find a data storage system which has fewer files to maintain like a music file and a movie file (my movie data is ten times more of a headache than my music data).

    If you are very interested, you can crawl all over my data to see how I am storing it now. In some spots, it is a jumbled mess. Do you think my data could ever fit into SQL tables?

    Thank you all for dropping by and giving me something to look at and think about.

    No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
    Lady Aleena
      From the example above, I would be putting all artists' albums in one jumbled table instead of individual artists' albums tables such as "Duran Duran's albums" or "Sting's albums".

      That's true, but you get some flexibility by doing that. To start with, you now have a single, uniform representation of all of the data. You know that every album is in the album table, for example. You also get a query mechanism that lets you perform ad hoc queries you don't have to plan in advance, as long as your data fits the relational model. For example, to get the titles of all albums released in the 90s, you could write:

      SELECT title FROM album WHERE release_year >= 1990 AND release_year <= 1999;

      Or just Sting albums from the '90s:

      SELECT title FROM album JOIN artist USING(album_id) WHERE release_year >= 1990 AND release_year <= 1999 AND = 'Sting';

      Or Sting and the Police albums:

      SELECT title FROM album JOIN artist USING(album_id) WHERE = 'Sting' OR = 'The Police';

      You can certainly store hierarchical data in a database, but it is a bit more complicated.

      A simple example is a family tree -- each person entry would have two other people as their parent. These people would all be in the same table, so you'd have to do a join back on the table that you're selecting from.

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      Frankly, after looking at your data, SQL is exactly the way that I would go.

      If your goal is simply to come up with a solution and not spend much effort at it, then organizing what you have in spread sheets or text files, or in a Google Doc spreadsheet is probably good enough. You can make a nice form for new entries and as a result you will even have a pretty decent GUI to help you maintain your information.

      If, on the other hand, you are looking to learn SQL, then your project is an excellent one to accomplish that with.

      I had mentioned in a previous post that SQLite might be a good option for you, and I stand by that. The process of creating the DB will force you to make decisions about what you really want from this project. Additionally, making a web interface (GUI) for SQLite is not much different than doing so for any other DB. You get the advantage of learning basic SQL, its integration with Perl via DBI/DBD and the UI via CGI etc... .

      Another advantage is that SQLite is a single file DB, which means you can move it around from system to system(Windows, *nix etc...), back everything up in one single step(copy the file to somewhere safe). You mention 'fewer file to move around', there would only be one in this case.

      You also mention movie information. You can decide whether you want two separate DB's, one for music and one for movies, or simply combine them in the same DB, again to ease keeping track of stuff(I would use one DB because there is the possibility of correlating musicians and music to movies which can be done across db's, but it is more to keep track of).

      The project gets broken into a few basic parts:

      • Data storage design
      • Data storage development(create the db)
      • Mass import(of existing data) including cleanup and prep
      • GUI design for maintenance, updated, and display
      • GUI devel

      The design of GUI and DB are fairly closely coupled initially. If you start by listing what you expect to do at the GUI, you will end up with a good rough db design which you can normalize when you focus on the DB design.

      For example:

      • I need to be able to add Band names
      • I want to add members to bands
      • I want albums to associate with Bands and Members
      • I want songs associated with Bands, Members, formats, albums
      • ....

      The requirements you state about what you want imply a DB to me. Whether you use a DB, or use cvs files or a spreadsheet, the functionality you describe is what a DB is really good for.

      As a very rough start, you could take the following and load it into sqlite(free download I am sure you can find for your env) to give yourself a feel for what you might want to do if you go the DB route.

      CREATE TABLE "Groups" ( "eid" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "name" TEXT ); CREATE TABLE songs ( "eid" INTEGER NOT NULL, "title" TEXT, "album_id" INTEGER, "artist_id" INTEGER , "year" TEXT); CREATE TABLE "Albums" ( "eid" INTEGER PRIMARY KEY AUTOINCREMENT, "title" TEXT, "year" TEXT, "format" TEXT ); CREATE TABLE artists ( "eid" INTEGER, "first_name" TEXT , "last_name" TEXT, "moniker" TEXT);

      Best of luck...

      ...the majority is always wrong, and always the last to know about it...
      Insanity: Doing the same thing over and over again and expecting different results.
      I'll just give one warning about editing: unless you write a GUI interface for every type of data you have [1], it's pretty tough to actually edit single entries. It gets really tedious to type out "UPDATE sometable SET somecol = 'something' WHERE uniquelyidentifyingcondition" every time you find a typo.

      You probably want a GUI for adding entries, too. But the initial inserts can be done with a bulk loader written in Perl. The bulk loading is the easiest part.

      [1]: web frameworks with a 'scaffolding' feature help immensely here.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1056042]
Front-paged by Corion
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (4)
As of 2018-12-15 13:18 GMT
Find Nodes?
    Voting Booth?
    How many stories does it take before you've heard them all?

    Results (69 votes). Check out past polls.