http://www.perlmonks.org?node_id=149597

doran has asked for the wisdom of the Perl Monks concerning the following question:

I have a test file I use to test escaping of 'special' characters:
\ = backslash \\ = two backslashes \\\ = three backslashes ' = a single quote '' = two single quotes \' = a backslash and a single quote " = a double quote \" = a backslash and a double quote '"' = a double quote within 2 single quotes Here's a test of formatting and odd characters. This is the second line. Blank line above and some characters below: < = less than > = greater than & = ampersand &amp; = an ampersand, a-m-p and a semicolon ! = exclamation
Let's say I stick that into a variable $var and want to stuff it into a MySQL database table. To escape out these special characters, I run the variable through the DBI's quote() method:
$var = $dbh->quote( $var ); # I could also use placeholders
The problem, if it is one, is that when I look into the database, the single quotes are are properly escaped (ie. \') but the backslashes are not escaped (ie. a single backslash is represented by a single \ in the database).

Reading the DBI (and MySQL) docs, this doesn't seems to be the correct behaviour, since backslashes are supposed to be escaped with another backslash (ie. \\). So far I haven't gotten any errors and everything seems to be producing the expected results. But I'm wondering why this is. Is this correct behaviour? Shouldn't $dbh->quote() put an extra backslash in front of each backslash, or am I mistaken?

Also, please let me know if this is actually a MySQL question. I'm asking here because I'm using DBI's quote() function and I suspect that may be the culprit, for better or worse.

tia

Replies are listed 'Best First'.
Re: Backslashes with DBI quote( ) and MySQL
by gmax (Abbot) on Mar 06, 2002 at 08:04 UTC
    Using placeholders has the same effect as the quote() method.
    #!/usr/bin/perl -w use strict; use DBI; # change the connect statement according to your needs my $dbh = DBI->connect("DBI:mysql:test;host=localhost;" ."mysql_read_default_file=$ENV{HOME}/.my.cnf", undef,undef,{RaiseError => 1}); my @quotes = ("\\", "\\\\", "\\\\\\", "'", "''", "\\".'"', '"', "\\".'"', "'" . '"' . "'"); $dbh->do(qq{drop table if exists test_chars}); $dbh->do(qq{create table test_chars (quote char(5))}); my $sth = $dbh->prepare( qq{insert into test_chars values ( ? ) } ); for (@quotes) { $sth->execute( $_ ); } print_chars(); print "\n"; $dbh->do(qq{delete from test_chars}); for (@quotes) { my $quoted = $dbh->quote($_); $dbh->do( qq{insert into test_chars values ($quoted)}); }; print_chars(); sub print_chars { my $aref = $dbh->selectall_arrayref( qq{select * from test_chars} ); for my $row (@$aref) { for my $field(@$row) { print "<$field>\t"; } print "\n" } } $dbh->disconnect(); __END__ output: <\> <\\> <\\\> <'> <''> <\"> <"> <\"> <'"'> <\> <\\> <\\\> <'> <''> <\"> <"> <\"> <'"'>
    This sample script will insert your values escaping them with placeholders. Then it repeats the same operation using the quote() method.
    As you can see, the result is the same.

    One more possibility would be to use a hexadecimal string, but it might be overkill. However, just in case, the method is as follows:
    my $hexstr = "0x" . unpack("H*", $barestr); $dbh->do(qq{insert into test_chars values( $hexstr )}); # notice that the hexstring has NO QUOTES
    As long as you can manipulate your string with Perl, you can always turn it into an hex string, solving any possible quotes problem. The drawback is that your string in the query will be twice as long as the original. It's important to bear that in mind when you calculate the size of your query.

    update
    Concerning the problem you mention about the special characters not being escaped in the database, it really depends on the client you are using. When you say "look into the database" you are actually asking a client to fetch some info from the server. Therefore all you see depends on the reporting capabilities of the client you are using.
    Here is an example, from the standard mysql client in Linux.
    mysql> select * from test_chars; +-------+ | quote | +-------+ | \ | | \\ | | \\\ | | ' | | '' | | \" | | " | | \" | | '"' | +-------+ 9 rows in set (0.00 sec)
    Another client, mysqldump will escape our data:
    # # Dumping data for table 'test_chars' # INSERT INTO test_chars VALUES ('\\'); INSERT INTO test_chars VALUES ('\\\\'); INSERT INTO test_chars VALUES ('\\\\\\'); INSERT INTO test_chars VALUES ('\''); INSERT INTO test_chars VALUES ('\'\''); INSERT INTO test_chars VALUES ('\\\"'); INSERT INTO test_chars VALUES ('\"'); INSERT INTO test_chars VALUES ('\\\"'); INSERT INTO test_chars VALUES ('\'\"\'');
    HTH
    _ _ _ _ (_|| | |(_|>< _|
      Yea, I'd forgotten to mention the client. I had used MySQLGUI and the command-line client you mention above (though I'm running on Win32). I also did a mysqldump, which did essentially the same thing as yours. It seems everything is behaving as it should. In the meantime, my education continues...

      Thanks to all.

Re: Backslashes with DBI quote( ) and MySQL
by lachoy (Parson) on Mar 06, 2002 at 02:52 UTC

    If the data made it into the database and is the same coming out as you put it in then I don't think you have a problem :-)

    You may be running into how the mysql client represents textual data (itself escaping quotes but not backslashes) or something similar. All the quote() method does is ensure that the data can be put into the database (or passed to a stored procedure, or whatever) -- what the database does with the data after that is its own business.

    Chris
    M-x auto-bs-mode

Re: Backslashes with DBI quote( ) and MySQL
by webadept (Pilgrim) on Mar 06, 2002 at 07:25 UTC
    I would have to agree with lachoy on this, and it bothered me for some time as well. What caught my eye with your problem was you say the "escaped single quotes" and if all went correctly you shouldn't be "seeing" anything except a single quote.

    When I noticed this problem was when I was taking in information from a web source or sending it as a POST to the program. When that happens, the strings are "escaped". Taking those strings in and escaping them again with "quote()" would give the out put you are describing.

    Since using quote() is a good idea, espesally with strings, what I did to solve my problem was to unescape the input strings and then use "quote()" on them, re-escaping them. This insured input and that unwated effects were not taken in without some messure of deturance.

    okay, enough out of me, hope that helped.

    webadept.net
Re: Backslashes with DBI quote( ) and MySQL
by Juerd (Abbot) on Mar 06, 2002 at 17:38 UTC
    Escaping is not done to put lovely backslashes in the database. Escaping is just to make sure the data gets in the database correctly. SQL uses quotes to delimit text, so they need to be escaped when the data contains them, and you of course need to escape the character you use to escape other things with.

    With INSERT INTO foo VALUES ('bar\'baz') the string bar'baz is stored, without the backslash. It's like saying print "bar\"baz", which prints only bar"baz, again without the backslash itself.

    Many, many PHP programmers are confused about escaping and quoting, because PHP quotes strings for you by default. That is, if you have the user input bar'baz, the actual string content will be bar\'baz, which is terribly wrong and confusing. (Blame PHP.) Some then read mysql can't handle single quotes, because they're used as delimiter and start escaping the quotes and backslashes, turning the string into bar\\\'baz, and with 'bar\\\'baz' in an SQL query, bar\'baz is stored in the database. This is where a lot of web programmers using PHP go wrong. They have escaped data in their database, which is very confusing, and a waste of space.

    DBI has a nice escaping mechanism. Instead of escaping everything that could be dangerous, only that what is dangerous is escaped.

    If you have the user input bar'baz in Perl, your string will have those 7 characters, and no disgusting automatic escaping. That means you will have to arrange for the backslash to be there yourself. You could do it the hard and potentially dangerous way, by using quotemeta or some s///-construct, but DBI has a database-specific method called quote, which is handled internally if you use placeholders. If you use that, the data is turned into bar\'baz for usage in an SQL query. In the query, there will be 'bar\'baz', so the data stored in the database is bar'baz. That also means you do not have to un-escape what you get out of the database. (Again PHP confuses people by having a function that un-escapes automatically. If with SQL everthing is done the way it's supposed to be done, you never have to unescape, because there are no escapes in the database!)

    The following script will create a test2 table in the test database, and insert and immediately select the data that follows __DATA__ (through the magic DATA filehandle). It will then report whether the data has changed or not:

    #!/usr/bin/perl -w use strict; use DBI; my $dbh = DBI->connect('DBI:mysql:database=test', 'root', ''); $dbh->do('drop table if exists test2'); $dbh->do('create table test2 (xyzzy blob)'); undef $/; # Slurp mode my $before = <DATA>; $dbh -> prepare('insert into test2 values( ? )') -> execute($before); my $sth = $dbh -> prepare('select * from test2'); $sth -> execute; my ($after) = $sth -> fetchrow_array; print $before eq $after ? "Data has not changed\n" : "Data has changed!!\n"; __DATA__ \ = backslash \\ = two backslashes \\\ = three backslashes ' = a single quote '' = two single quotes \' = a backslash and a single quote " = a double quote \" = a backslash and a double quote '"' = a double quote within 2 single quotes


    (I experimented a little with style in here. When you read everything that is indented by one level, you read only the program flow. I haven't used this style before, but I kind of like it. Please let me know what you think about having the methods indented like this.)

    HTH

    ++ vs lbh qrpbqrq guvf hfvat n ge va Crey :)
    Nabgure bar vs lbh qvq fb jvgubhg ernqvat n znahny svefg.
    -- vs lbh hfrq OFQ pnrfne ;)
        - Whreq
    

      Just a stylistic comment, since you asked.
      Personally, I don't like the visual impact of methods indented as you did, although I have to admit that it is easier to identify them. Maybe it's a matter of getting used to it.

      I have three more stylistic remarks about your script, though.
      1. Your connect statement is not something a sensible administrator would recommend. Leaving the 'root' user without a password is calling for trouble. That's why I always write my examples referring to a configuration script or using a bogus password. The readers can't use your script as it is, unless they have a configuration file. So, since they have to change it anyway, I think it's better to show them the best practice.
      2. You should either set RaiseError in the connect or deal with the errors after each call to a DBI method.
      3. You should call $dbh->disconnect at the end of the script.
      Cheers
      _ _ _ _ (_|| | |(_|>< _|