Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I have created a simple sql script to write an email address (from form input) to a MySQL database, the database itself is very simple, and only stores just email addresses. Below is my code to read the data, line by line, but this script is expected to store upto 20,000 email addresses (will this cause problems?) so printing them line by line will take a while to load and create an extremely large page, could anyone advise me as to how I could read the data and split it in different pages, say 30 emails per page ?

$dbh = DBI->connect("DBI:mysql:$database:localhost","$username","$pass +word"); my $sth = $dbh->prepare("SELECT * FROM emails"); $sth->execute(); # iterate through resultset while(my $ref = $sth->fetchrow_hashref()) { print "Email: $ref->{'contest'}\n\n"; } # clean up $dbh->disconnect();


Thats how I do it at the moment, but splitting it into pages of 30 emails would make it a lot more organised.
Any help would be great appreciated

Thanks,
John.

Replies are listed 'Best First'.
(Ovid: Death to Select Star!) Re: MySQL DBI Help
by Ovid (Cardinal) on Sep 22, 2001 at 21:51 UTC

    On a note totally unrelated to your question, I couldn't help but notice your SQL:

    SELECT * FROM emails

    I do a lot of heavy database work and I've been bitten by this type of SQL many times. As a general rule, when trying to write maintainable code, it's a good idea to be very specific about what you will allow. For example, consider the following SQL statement:

    SELECT userid, email, contest FROM emails

    That SQL statement has a variety of benefits:

    1. It's immediately obvious to the maintenance programmer exactly what is being requested.
    2. It's more efficient than selecting a bunch of columns you don't need, if you have more columns in the table than you're actually using.
    3. If you do a $sth->fetchrow_arrayref, you don't have to worry about columns in the database being reordered.
    4. If someone renames a column, you will understand faster why your $ref->{'contest'} autovivified a hash entry and produced an undef value.
    5. If someone removes a column from the database, you get the same benefit as mentioned in the fourth point.

    If you're just writing a small program, or use very simple logic as what you have above, these issues may be less likely to impact you. However, as you do more and more work with databases, getting in the habit of stating explicitly what you want will reap tremendous rewards.

    Cheers,
    Ovid

    Vote for paco!

    The title of this node is a reference to Death to Dot Star!, for those who may be confused.

    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Re: MySQL DBI Help
by larsen (Parson) on Sep 22, 2001 at 20:34 UTC
    You could use LIMIT directive in your SQL query. Since you're using Mysql, this is a good place to learn how. I hope this could be useful.
      ORDER BY would also go well with using limit... so you can get them in alpabetical order

                      - Ant
                      - Some of my best work - Fish Dinner

Re: MySQL DBI Help
by Anonymous Monk on Sep 22, 2001 at 23:14 UTC
    Thanks for your suggestions, and your tips Ovid - I will remember that for future use.

    On the suggestion of flat files, I am currently running a duplicate of this script on another server which uses flat files, I was advised to switch to a mysql database solution so that it eased the load on the system and would speed up the script.

    In what instances does mysql really become the beneficial choice if its more suitable to hold 20,000 email addresses in a flat file?

    Thanks

      I understand that MySQL has recently had some improvements to bring it closer to ACID compliance, but I am still leery of it. MySQL, as I have used it, is as best a file system with SQL, not a database. An excellent explanation of the limitations of MySQL is at Why Not MySQL (and explains what "ACID compliant" means). As for Open Source alternatives to MySQL, you can check out Postgresql or Interbase.

      I do not know who recommended MySQL to you or why it was recommended. If the only consideration is performance, I would strongly recommend that you examine this decision more carefully. One thing that many people misunderstand in making technology decisions is that the technology chosen is part of the answer. In order to answer a question properly, you need to get enough information to understand what the question is. With technology choices, we need to identify the problem solved, determine what resources we can apply to these problems and determine how the pros and cons of those resources affect our short and long term goals. Then, and only then, do we begin to have an adequately defined question. Once we have the parameters of the problem and possible solutions clearly defined, then we can begin to decide what technology solutions are an appropriate fit. Here are a few questions off the top of my head regarding that would affect whether or not MySQL is an appropriate solution for your problem:

      • What, exactly, is the problem that you need to solve?
      • Do you already have technologies in-house that may be applicable (this is important because just throwing another technology at a solution means more technologies to support -- too many technologies is like too many cooks!)
      • If you need a database, how many users are likely going to be using the system?
      • Will the database be primarily "read only"? (MySQL can be fine for that purpose)
      • Do you have the in-house capability to adequately support MySQL? Is there only one person who can support it? What happens if they leave?

      The last question is particularly vexing. At work, we primarily use MS SQL Server because that is the database the most of us are familiar with. For some of our projects, we are thinking about switching to Postgres, but there's some resistance due to the lack of in-house knowledge of the product. Unfortunately, that can be a valid complaint if we have acceptable alternative solutions.

      Cheers,
      Ovid

      Vote for paco!

      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

      I agree with CubicSpine and Ovid that you might want to re-evaluate the decision to use mysql. The only advantage to using mysql in these circumstances would be to enhance your perl/database skills. If this were the only consideration then by all means, continue doing so. But, OTOH, if the following are true:
      • each record in the file contains only one field, e-mail address.
      • once you get the file, you are the only person using it.
      • you don't expect that format of the file to change anytime soon (like by adding new fields per record).
      Then there is nothing that mysql can do that you can't do faster working with plain text files.

      I work with a file that has 48,193 94-byte records. On our ancient Sun Sparc 10, it takes 3 whole seconds to load the entire file and store it in an array. On our Sun Enterprise 350, it takes less than a second.

      In my case, however, I have to break each record into 10 fields, be able to sort across multiple non-contiguous fields, perform merges and complex selects. So I should consider using a database implementation even though the file is used only by me and is never updated (except when I replace the entire file weekly).

      "Make everything as simple as possible, but not simpler." -- Albert Einstein

      AM - I think that you will find that using a SQL database will give you a major advantage in situations where you are dealing with:
      • More complex data than just a single field
      • An application that does more than just "get everything in the database".
      In your situation, it might be worth doing some benchmarking. It wouldn't be too hard to set up a time test comparing reading in all the records from the DB and reading in all the records from a file. I think that if you can prove to your coworkers that the load is minimal then you should go with the file. Of course, this does depend on how often you expect to be performing this operation.
Re: MySQL DBI Help
by CubicSpline (Friar) on Sep 22, 2001 at 20:40 UTC
    Why don't you handle this from within your script? If all you want is to print out 30 email addresses per page, just select all the addresses and only print out 30 at a time. The way you've set up your database really limits your options of pulling out data.
      If he really wants to deals with multiple thousands of email addresses, this probably isn't the best way to go. I almost always use the features of the DB if I can, since it tends to be much faster and means my script uses less resources.

                      - Ant
                      - Some of my best work - Fish Dinner

        It sounded to me like he was going to be selecting them all anyway so I don't really see a performance advantage. And the only resources that are being used here is a single value to store the email address. I don't see a lick of difference in whether it's done in the script or not.

        You would definitely want to use the database resources for more sophisticated data retrieval. But this could be accomplished by just a flat file. Unless there's more to the script than just reading out these addresses, i'd actually go with a flat file instead.