Note: This article applies to InterBase. However, many SQL database servers have a similar limitation.
About Parameterized Queries
The purpose of a parameterized query is to reduce some of the workload
on the server when executing the same query multiple times. The
workload reduction comes from the fact that the query need only be
compiled and optimized once. Subsequent executions can use the same
compiled statement and same optimization PLAN.
All queries must be prepared before they are executed by the server. Note, however, that in BDE applications, doing an explicit MyBDEDataset.Prepare
causes the BDE to read and retain metadata information which would otherwise be refetched each time the query is run. This means that "explicitly" preparing a BDE dataset gives a performance improvement much greater than what is gained from the server-side prepare alone. However, many other data access components, such as InterBase Express, retain this metadata information whether or not you "explicitly" prepare a dataset, so in non-BDE applications there is no reason to do this. When I talk about preparing a statement in this article, I am referring to the server-side prepare which is done implicitly no matter how you execute the statement, not the TQuery.Prepare
method.
Because of this, the server can execute the query faster on the second and all subsequent times the same prepared statement is used, because there is no need to recompile and re-optimize the statement. So if you repeatedly look up single records in a large table, it is faster to write a query such as this:
SELECT
SOME_COLUMN
FROM
SOME_TABLE
WHERE
SOME_TABLE_ID = :Foo;
...than it is to literally specify the value of the "Foo" param in SQL and change the dataset's SQL property each time.
So in general it is a good practice to use parameters instead of literal values in SQL whenever you need to execute a query more than once. Parameters also ensure that you don't have to think of issues such as date formatting; date parameters are sent to the server as binary data, not strings, so there is no possibility of error if the client and server use different regional settings.
Parameters and IN
However, there is one thing you cannot do with a parameter: You can't pass a list of values. So if we change the query above to this:
SELECT
SOME_COLUMN
FROM
SOME_TABLE
WHERE
SOME_TABLE_ID IN (:Foo);
...there is no way to assign '1000, 1001, 1002' to the "Foo" parameter, since it accepts only a single value, not an array or a comma-delimited string
A Deeper Examination
Now, let's look at how InterBase optimizes a couple of queries. I'll
use a simple query against employee.gdb:
SELECT
*
FROM
EMPLOYEE
WHERE
EMP_NO IN (136);
InterBase generates the following PLAN
when the query is prepared:
PLAN (EMPLOYEE INDEX (RDB$PRIMARY7))
A PLAN
is InterBase's syntax for displaying how it has optimized the query. Depending upon the SQL and the database's metadata, there are potentially many different ways that InterBase could optimize a query. For example, for a particular table it can read records in storage order or it can use an index to go directly to particular records. The InterBase optimizer selects a particular plan for each prepared statement at the time the statement is prepared based on what you have asked the server to do and what the server knows about the indices available. Don't worry if you can't read the PLAN
syntax; just take my word for it that a different PLAN
means that the server will be doing someting different internally to run the query, and that not all statements working on the same table can use the same PLAN
, since different statements require the server to do different things.
Here's another query which is the same, except that now there are two
items in the IN conjunct:
SELECT
*
FROM
EMPLOYEE
WHERE
EMP_NO IN (136, 138);
And the PLAN is different:
PLAN (EMPLOYEE INDEX (RDB$PRIMARY7,RDB$PRIMARY7))
If I try to force InterBase to use the PLAN from query 2 with the text
from query one:
SELECT
*
FROM
EMPLOYEE
WHERE
EMP_NO IN (136, 138)
PLAN
(EMPLOYEE INDEX (RDB$PRIMARY7));
...InterBase simply ignores the PLAN bit and uses the PLAN it feels
appropriate.
This is why you can't pass a list to a param. Since it would change
the optimization PLAN, there is not a single compiled statement which
could execute for any arbitrary number of values in the list.
Next question: Why does IB use different PLANs for different values in
the list? The answer is that, to IB, these two statements:
/* Statement 1 -- IN */
SELECT
*
FROM
EMPLOYEE
WHERE
EMP_NO IN (136, 138);
/* Statement 2 -- OR */
SELECT
*
FROM
EMPLOYEE
WHERE
((EMP_NO = 136) OR (EMP_NO = 138));
...are precisely the same. If you look at the generated BLR, you
will find no difference. IN with a list of constants is "syntactic
sugar" for a ORed list of conjuncts testing for equality with the same
field name. If you view the IN as simply shorthand for an ORed list of
conjuncts, then it will make sense as to why a different PLAN is used
-- different numbers of conjuncts mean a different query to execute.
BLR is InterBase's internal representation of how a query is run. BLR, roughly, is to InterBase what ASM is to Delphi or C++ source code. Saying that the BLR is the same means that the server will do exactly the same thing to run these two queries.
IN with a Subquery is Different
Contrast this, on the other hand, with the following:
SELECT
*
FROM
EMPLOYEE
WHERE
EMP_NO IN (SELECT EMP_NO FROM EMPLOYEE_PROJECT);
This query is similar to the problem above in that there is a variable
number of items in the list, but unlike the lists of constants, there
is only one logical way to execute the statement: First run the
subquery, then compare the results with the rows in the result set of
the outer query. Here is the PLAN:
PLAN (EMPLOYEE_PROJECT INDEX (RDB$PRIMARY14))
PLAN (EMPLOYEE NATURAL)
What's interesting about this is that the PK is no longer being used
for the EMPLOYEE table -- InterBase's optimizer has (correctly) decided
that it will be faster to do a storage-order scan of the table. Since
half of the rows will be returned, this is faster than fetching matches
out of storage order.
Now, the big question: Since IN with a list of constants is more
similar to IN with a subquery than an arbitrary list of ORed conjuncts,
why does InterBase treat it more like the arbitrary list than the
subquery WRT optimization?
I'm not sure. But it would make our report generation easier if it
didn't.
By Craig Stuntz
Connect with Us