Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?

Backslash and Underscore problem with DBI and PostgreSQL.

by Seumas (Curate)
on Jun 21, 2004 at 04:16 UTC ( [id://368363] : perlquestion . print w/replies, xml ) Need Help??

Seumas has asked for the wisdom of the Perl Monks concerning the following question:

Using: PostgreSQL 7.4.3 and DBI 1.42

I have a table that contains UID and USERNAME columns. Several users have names like the following:

  • foo_
  • foob
  • fook

    When foo_ tries to authenticate with their credentials, the following is called:
    my $sth = $dbh->prepare("SELECT uid FROM user_accounts WHERE username +~~* ?"); $sth->execute($username); my ($uid) = $sth->fetchrow_array;

    But the UID returned is not for foo_ - it's for one of the other foo* users.

    Now, in PostgreSQL, the underscore character specifies a match on any one character so foo_ would also match foob. However, DBI should escape the underscore in the code above, correct?

    I assumed this would be so and that what DBI would be submitting to PostgreSQL would be foo\_. So I tried this manually in pgsql. It behaved the same way!

    The only solution that worked in pgsql was to do foo\\_ which then returned only the desired result of foo_. But this doesn't make sense to me. The documentation states that a double backslash is translated into a literal backslash. So rather than matching real usernames of foo_ shouldn't the above match a user named foo\_ where the backslash is a literal part of the name and not used to escape the following character?

    This is incredibly confusing and it's preventing a lot of my users from logging in tonight. I really don't want to have to write code in perl to pre-parse what $dbh->quote() or $dbh->execute() should already be parsing sucks.

    Can anyone shed some light on this for me? I tried to seek help in #postgresql but none of the several dozen people in those channels ever says a word for days at a time.

    Thanks for your time.
  • Replies are listed 'Best First'.
    Re: Backslash and Underscore problem with DBI and PostgreSQL.
    by blokhead (Monsignor) on Jun 21, 2004 at 04:25 UTC
      This begs the question, why are you using a pattern match (~~*) instead of an exact match (=) in this query? If you don't want the underscore to act as a wildcard, don't use it in a syntax where it would be interpreted as a wildcard.
      select uid from user_accounts where username = ?
      DBI should escape the underscore in the code above, correct?
      $dbh->quote only escapes data enough to keep it from busting out of its quotes. It doesn't escape pattern-match special characters like underscores, because (a) every underscore in your database would now have a backslash in front of it when you fetched it, (b) you could never use a placeholder to replace a genuine pattern in a query.


    Re: Backslash and Underscore problem with DBI and PostgreSQL.
    by Zaxo (Archbishop) on Jun 21, 2004 at 04:30 UTC

      It's doing what you tell it to. In the postgresql dialect, ~~* is ILIKE, case insensitive LIKE. That introduces pattern matching with underscore as wildcard. You probably should use the = (equals) operator for that comparison.

      After Compline,

        The code I show above is from a couple years ago when I first began using SQL and everything I've found in the postgresql mailing lists and docs indicated that if I wanted to do a case insensitive search, I needed to use the regex operators of ~* or ~~* and this is why you see the instructions above.

        Using an '=' isn't case insensitive but I realize that I could and probably should be using lower() as in SELECT uid FROM accounts WHERE lower(username) = lower(?).

        Is this the correct and fastest way to handle this? I realize my questions in this thread are really more geared toward PostgreSQL than perl, but it involves the perl DBI to some degree. And I know my question is a bit rudimentary to DBAs - but I'm not one and your responses are always greatly appreciated. :)

          Yes, forcing to lowercase and testing equality is much better than pattern matching.

          A username is probably intended to be a unique key, so if you want case insensitive matching, you should take steps to make sure that uniqueness is enforced in a case insensitive way. A rule on insert and update should be a nice way to do that in pg.

          The trouble with pattern matching is that, as you found, it breaks uniqueness of a key.

          After Compline,

          Is this fast:
          select ... from .. where lower(username) = lower(?)
          This may or may not be fast - it depends on how the query engine runs, and whether the optimizer can use an index when you apply a function on a column. It may work fine with Postgres, but I know that Sybase will not generate a good query plan with such a query.

          BTW - If you use "LIKE ..." you can normally escape any potential wildcard characters yourself. Sybase uses square brackets to do the escaping (and can use alternate escape characters as well), so that you could write:

          SELECT ... FROM ... where foo like ? ESCAPE '\'
          and then pass "foo\_" as the search parameter and not get wildcard expansion on the underscore. Check the Postgres docs for similar functionality.


    Re: Backslash and Underscore problem with DBI and PostgreSQL.
    by jZed (Prior) on Jun 21, 2004 at 05:30 UTC
      $sth->execute($username); ... DBI should escape the underscore in the code above, correct?
      No, not correct. Neither DBI nor Pg should automatically escape underscores since there's no way for them to know when you mean a literal underscore and when you mean a wildcard. It sounds like you've had a frustrating time, my condolences, I've been there. But I usually find that when I start thinking "module Foo really sucks" where Foo is a widely used and heavily supported project like DBI or Pg, it's usually turns out to be more a reflection of my frustration level than of the project. Hope it all works out for you.
        No, I definitely wouldn't say DBI:: sucks. It's a great package and makes life tolerable.

        I'm just frustrated that it isn't more simple to find some obvious answers sometimes. It isn't like I didn't search the postgresql and DBI documentation and postgresql mailing list archives for an answer (and ask for help in #postgresql) - yet the only answer I could find was the regex solution!

        In fact, now that I know better, it's clear that the answers to people in the mailing lists that I read themselves were also wrong. Self-education is often frustrating, but places like PM make it much easier... assuming one can overcome the fear of looking stupid for asking stupid or rudimentary questions. :)

        Of course, now if anyone else has such a stupid question, they'll get the right answer here with the keywords "sql dbi insensitive" ;)
          There are NO stupid questions! We've all been there ;-) It takes time to become familiar with the different pieces that make up a Perl Database application.

          Be aware that using '~*' and '~~*' pattern matching in your queries is most probably a PostgreSQL-specific thingy. Just my personal opinion here, but I try to stay away from database specific functions because if at some point in the future you change your database from Postgresql to Oracle (or whatever), you'll have to redo that code entirely.

          I try to stick with "standard" SQL and try to stay away from database specific functions. In this specific instance, the "lower" function will provide a database-independent way of doing a case-insensitive query. IMHO "lower" is a much better solution since it is database-independent.

    Re: Backslash and Underscore problem with DBI and PostgreSQL.
    by etcshadow (Priest) on Jun 21, 2004 at 18:32 UTC
      However, DBI should escape the underscore in the code above, correct?

      No, DBI has no knowledge of the context in which the bind variable sits... it just knows that it is binding in a literal string. Also, the ~~* operator is just acting from string onto another (it's not like perl where a pattern is a slightly different syntactic element than a plain old character string).

      Now, I don't know if the same syntax holds for Postgress as for Oracle... but in oracle SQL, you'd escape an underscore by doing something like:

      my $usernameescaped = $username; $usernameescaped =~ s/([\%_])/\\$1/g; my $sth = $dbh->prepare("SELECT uid FROM user_accounts WHERE username +like ? escape '\\'"); $sth->execute($usernameescaped); my ($uid) = $sth->fetchrow_array;
      That is, you use an escape character, and then you inform the database of what that escape character is (using the "escape" clause to your "like" match). You don't have to use a backslash, but I generally do, because backslash is what the cool kids use. :-D
      ------------ :Wq Not an editor command: Wq