 |
This is the technical paper from a talk given at the 12th Annual Borland Developer's Conference
By Robert Schieck MER Systems Inc. |
Robert Schieck is the president of MER Systems Inc., a Toronto, Canada-based company that provides custom client/server software development and training. Robert is a Borland Certified Delphi, C++Builder, and JBuilder instructor, a member of TeamB, a Certified Netware Engineer, and founder and past president of the Toronto Delphi Users Group. Robert has a B.S. in mechanical engineering from the University of Toronto in Toronto, Ontario, Canada. The MER Systems Web site (http://www.mers.com) is the host for the InterBase List and News Servers, which contain the largest body of public InterBase information in the world.
Note: No portion of this paper may be distributed or reproduced by any means, or in any form, without the author's and/or Borland's prior written permission.
Introduction
Delphi and CBuilder are wonderful tools. They allow developers with little
or no data base experience to build database applications by bolting components
together. It is very easy for inexperienced Developers to build small database
systems that work reasonably well. Unfortunately as the number of users grow
or the size of the database gets large the developer needs to have a better
understanding of how Delphi/CBuilder works with InterBase to create reasonably
performing systems. This paper will present some tips to help you with getting
better performance from your Delphi/CBuilder/InterBase system.
1) Fully Populate Your Database Before You Build the
Front End
Most developers don't do this. What most developers do is create their meta
data ( create their tables and indexes) , populate each table with six records
and then build their application.
We all make mistakes and with only six records in each table you won't know
during the development process whether you have made an error or not because
no matter how you retrieve six records from the database it is always fast.
If on the other hand, you had put a hundred thousand records into your customer
database and during the development process you found it took 15 seconds to
retrieve a customer, you would know immediately that you have a problem that
needs to be fixed.
It's sort of a pay me now or pay me later scenario. If you fully populate your
database before you start development you'll find your errors and problems during
development or you can put six records in your tables and find your errors during
production when the heat is really on.
2) Use the SQL Monitor to Educate Yourself on What
Your Front End is asking of InterBase
Give a man of fish and feed him for a day, teach him to fish and feed him for
life. The SQL Monitor is your fishing pole for doing database development. It
allows you to see the conversations that go on between the client and the server.
The SQL Monitor allows you to compare how application changes affect the conversation
between your application and the server. It allows you to compare how different
components sets access the server differently.
With the BDE SQL Monitor I have several favorite options I use:
Connect/Disconnect - this option shows when connections are made to the database.
You can use this option to determine if you have more than one connection running
to the database or if you are opening and closing connections to the database
frequently.
Prepare Query Statement - this option allows you to see how often your queries
are being prepared. In a properly built application queries should be prepared
once and use multiple times to minimize traffic between the client and server.
Execute Query Statement - this option allows you to see the SQL query that
is being sent to the server.
Statement Operations - this option allows you to see records being retrieved
from the server. Each "Fetch " statement in the SQL Monitor represents one row
being retrieved from the server.
Transactions - this option allows you to see how often you are committing your
data to the database.
Invest some time in the SQL Monitor and it will give you an excellent education
on how Delphi and CBuilder interact with InterBase, which will allow you to
build highly optimized applications
3) BDE vs Native Access Components like IBX, the Speed
Difference is Only 40%
One way to pick up performance in your application is to use native access
components such as IBX. Over all, what you will see is approximately a 40% increase
in performance. In addition you will pick up better access to some of the internal
features of InterBase.
However, what you gain in speed and accessibility you lose in terms of portability.
If you're planning to make your application work with InterBase and other SQL
servers, then using native data access components will increase your conversion
costs significantly. Please let me remind you that the "S" in SQL stands for
structured not standard. Even if you used the BDE for your application and you
wish to support Oracle , you will most likely have to rewrite all your SQL statements
to get acceptable performance from Oracle.
4) Avoid Holding Transactions Open for Long Periods
of Time.
Quite often in the newsgroups you will hear comments to the effective don't
hold transactions open for a long period of time. Is never qualified as to what
a long period of time is. The duration of the transaction isn't necessarily
as important as to what is done with the transaction. If you're going to hold
a transaction open for a day and do nothing with it, that is not nearly as severe
as holding transactions open all day and running a query against a every minute
and a half. The former will prevent InterBase from doing garbage collection.
The latter will also prevent InterBase from doing garbage collection and in
addition will cause InterBase to consume more and more memory to keep track
of what is going on in the transaction. The latter can cause significant slowdowns
in performance. For example one company would open their application in the
morning and InterBase would be consuming roughly 180 to 200 megs of memory on
the server. Because transactions were being held open all day and select queries
being run against them, by the end of the day, InterBase was consuming 950 megs
of memory. The difference in performance was readily visible when the backups
were run. With InterBase consuming 950 megs of memory the backup would take
an hour to run. If you shut down InterBase and restarted it to free all the
memory, the backup would take six minutes.
So, holding transactions open for several minutes isn't a major problem. Holding
a transaction open all day and running lots of queries against that transaction
can cause problems.
5) Do Not Use Large Varchars
InterBase stores chars and varchars internally essentially the same way. Externally,
chars are returned to your Delphi application padded with blanks to the length
of the field. On the other hand, varchars are presented to your application
with no trailing blanks. What most people don't realize is a InterBase passes
chars and varchars over the network the same way, fully padded out. To help
you visualize this, think of a varchar(32000) with just the letter 'a' in the
field. When this field is passed over the network it will be passed as the letter
'a' and 31,999 filler characters.
When you are designing your database remember that chars and varchars are passed
between the client server fully padded out. If you wish to have a long variable
length string consider using a blob.
6) Always Build Your Front End Application Using a
Remote Connection
There are two types of connections that you can make, local and remote. A local
connection string looks like:
Connect c:pathtomydatabase.gdb
It uses memory mapped files to do the communication between your application
and the server and is deceptively fast. This type of connection is significantly
faster than what you can expect when you use a remote server.
A remote connection looks like:
Connect localhost:c:pathtomydatabase.gdb
Or
Connect myserver:c:pathtomydatabase.gdb
A remote connection goes through the network to gain access to InterBase and
gives performance that one would expect from a client/server application.
The extra speed that a local connection has, easily hides serious performance/design
problems. These problems will surface when you finally connect to a remote server.
On the newsgroups the message that is posted by someone who built there application
with a local connection and has this problem goes something like:
I developed my application using local InterBase and it ran great. When I move
to a remote InterBase Server it runs horribly, what is the problem.
7) Use a 2Kb or 4Kb Page size for Your Database
The default page size in InterBase 5.x is 1Kb which is too small. Memory is
cheap and so is disk space, so you should increase the page size to 2Kb or 4Kb.
I use 4Kb by default and 8Kb on systems with large tables. The larger page size
results in reduced index depth and the ability to hold larger blobs and records
on a single page all of which will increase performance, especially the indexes
if you have long tables.
The only way to change the page size of an existing database is to back it
up and restore it with a new page size.
8) The Only Way to Set Your Database Cache Buffers
is Using Gfix
There are lots of ways to set the number of cache buffers on an InterBase server.
The best way is to set number of cache buffer allocated on a per database basis.
There is no way to set this value using a GUI interface. You must edit the ibconfig
file by hand or use gfix. The command to set the buffers is :
Gfix buffers 10000 user sysdba password masterkey pathtomydatabase.gdb
This will set the cache buffers for mydatabase.gdb to 10,000 database pages.
This means that the first connection to this database will cause InterBase to
add 10,000 database pages of memory to its cache. The second and subsequent
users will add zero buffers.
The range for buffers is between 0 and 64k. Empirical testing that I have done
shows a negligible performance increase going from 10,000 buffers to 20,000
so I typically use 10,000.
9) Do Not Parameterize Queries That Contains the Verb
"Like".
The verb "like" is somewhat tricky to use correctly. Assuming you have an index
on the lastname field, the following SQL will cause InterBase to use the index:
Select * from customer where lastname like 'SCH%'; // find all the customers
who's last name begins with 'SCH'
However, the following SQL will be slow and cause excessive load on the InterBase
server as it will not use an index:
Select * from customer where lastname like '%SCH%; // find all the customers
who's last name contains 'SCH'
The only way InterBase can solve the above SQL statement is to do a table scan.
This means that InterBase will retrieve every record from disk for the customer
table, examine it to see if it the lastname field contains an 'SCH'. Table scans
are slow and should be avoided.!
The following queries will cause table scans:
Select * from customer where lastname like :aparam ;
This causes a table scan because InterBase doesn't know if you are going to
send "SCH%" or a "%SCH%" as the parameter, so it optimized for the lowest common
denominator which results in a table scan.
Select * from customer where lastname like upper('something') ;
It doesn't matter what you submit to the function or what the function is,
InterBase cannot tell what the output of the function is ahead of time so it
optimizes this query down to a table scan
Select * from customer where upper(lastname) like '%SCH';
The only way for InterBase to determine the output of the Upper(lastname) function
is to run the lastname field through the function which means a table scan will
happen.
Generally, table scans are slow, indexes are fast. Use "like" carefully and
you application will fly.
10) Primary Keys and Foreign keys, I Don't Use Them
InterBase has the ability to declare primary keys and foreign key in your table
definitions. While the declarative aspects of these feature make them easy to
use, I don't use either.
For primary keys, I use a unique index. This allows me to name the index so
when I try to post a duplicate value to the table I get an exception with an
Index name I assigned instead of rdb$primary?? which is not intuitive. In addition
unique indexes can be deactivated and reactivated to rebuild the indexes where
primary keys cannot.
Foreign keys are a different story. Lets assume you have a table with a field
called "Address_State" in it and you want to limit what is accepted in that
field to one of the 50 States, so you create a foreign key to a table called
States. When you make the foreign key declaration, InterBase will create an
index on the "Address_State" field. The purpose of this index is for when you
try to delete a row from the States table. It will allow InterBase to quickly
check the "Address_State" field to ensure the value contained in the States
table record that you are trying to delete, is not present in the Address_state
field.
The performance problem is with the in index that is created on the "Address_State"
field. If you have 1,000,000 records in your table and there 50 states then
there are roughly 20,000 records per state in the "Address_State" field index,
making it a very very poor index. If the InterBase optimizer gets a hold of
it this index, there can be problems.
One site that had this problem was running a report that never finished. An
examination of the problem revealed that one column had 95,000 rows with the
number 3 in it and 5,000 nulls. There was an index on this field created by
a foreign key. The InterBase optimizer found this index and the index caused
the optimize to solve the problem backwards. The result was an report/query
that could not finish in 10 hours. By dropping the foreign key and its index
and replacing it with triggers, the optimizer solve the problem in a couple
of minutes and the reports came out.
11) Parameterize and Prepare Your Queries for Maximum
Performance.
Every SQL statement that is sent to InterBase is prepared, either by you or
by the tools you use. Preparing a query is a reasonably expensive process, especially
over slow speed lines or the Internet. For example, in one project which ran
over 28.8kb dial up lines, the first child took 30 seconds to be returned because
of the prepares that had to be done, while subsequent children took 4 seconds
each.
In the case of the BDE, if you don't prepare your statements, the BDE will
prepare it for you when you open your TQuery Unfortunately it will also unprepare
your statement when you close the TQuery, which defeats the purpose of preparing
the query. However, if you prepare your TQuery (TQuery.prepare), it will stay
prepared until you explicitly unprepare it or you change the contents of the
SQL property.
For you IBX fans, IBX will prepare your queries for you automatically and will
keep them prepared so long as you do not change the SQL property.
12) Use Single Processor Machines with InterBase
InterBase (SuperServer Architecture) does not know how to use multiple processors
correctly. On NT, if you have multiple processors, the InterBase process will
get flipped from processor to processor. It will look like InterBase is using
multiple processors but it is not. Empirical evidence points to as much as a
30% decrease in performance due to the flipping of the InterBase process.
If you have a multi processor machine and are running InterBase on it, you
can run InterBase as an application and set its affinity to one processor to
prevent the flipping of InterBase from processor to processor.
13) Rob's Rules of Left Outer Joins
I have never been a fan of left outer joins. They tend to be slow and InterBase
will only use an index on the the first left outer join in a query. I have several
ways of avoiding left outer joins.
For the rest of the discussion, here is some sample data
SQL> select * from children;
CHILD_KEY NAME SCHOOL_KEY
=========== ============================== ===========
1 Robert Schieck 0
2 Jon Schieck 1
3 Diane Schieck 2
4 Megan Schieck 3
5 Robyn Schieck 4
6 Emma Schieck 5
SQL> select * from schools;
SCHOOL_KEY SCHOOL_NAME
=========== ================================
1
2 Sir Winston Churchill
3 Oakridges Public School
4 Lady Churchill Senior Public
I have four rules for left outer joins to share with you :
1) Design out the Left Outer Join
We have a table(children) with children in it and a field called School_key which
is a foreign key (no not declarative) into the Schools table which has two fields
School_Key and School_Name; Since children under 5 do not go to school, they would
have a null value in their School_key field and you would have to do a left outer
join to get all the children with their associated school_name. In my school table
there is a record with the following values:
School_key 0
School_Name ''
Yes the school_name field is blank or an empty string. In the children table,
the School_key field is defaulted to 0. Now all the children have a school. Just
the ones under 5 years of age have a school with no name. When I need to see all
the children with their associated schools I can use an inner join instead of
an outer join.
SQL> select c.name, s.school_name from children c, schools s where c.school_key = s.school_key;
NAME SCHOOL_NAME
============================== ================================
Jon Schieck
Diane Schieck Sir Winston Churchill
Megan Schieck Oakridges Public School
Robyn Schieck Lady Churchill Senior Public
Please notice that Jon Schieck doesn't go to a school.
2) Use a Correlated Sub Query
Not all SQL servers can do this ( I won't mention the MS SQL Server 6.5 can do
one of these but two of them in the same query will cause wrong answers, not an
error message, just wrong answers). Using the above example the query to retrieve
ALL the children and their associate schools would look like:
Select c.Name, (select s.school_name from schools s where s.school_key = c.school_key)
from children c;
This query will run the subquery once for each row returned.
SQL> Select c.Name, (select s.school_name from schools s where s.school_key = c.school_key) from children c;
NAME
============================== ================================
Robert Schieck
Jon Schieck
Diane Schieck Sir Winston Churchill
Megan Schieck Oakridges Public School
Robyn Schieck Lady Churchill Senior Public
Emma Schieck
The two nulls show up because they have a school_key but no corresponding entry
in the schools table.
3) Use a Stored Procedure
Stored procedures that return a result set are very powerful. In this case they
allow you to create a virtual left outer join:
create procedure select_childSchool
returns (name varchar(30), school_name varchar(30))
as
declare variable school_key integer;
begin
for select name, school_key from children into :name, :school_key do
begin
school_name = null;
select school_name from schools where school_key = :school_key into :school_name ;
suspend;
end
end
The output looks like :
SQL> select * from select_childschool;
NAME SCHOOL_NAME
============================== ==============================
Robert Schieck
Jon Schieck
Diane Schieck Sir Winston Churchill
Megan Schieck Oakridges Public School
Robyn Schieck Lady Churchill Senior Public
Emma Schieck
4) Use a left outer join!
Sometimes there is no substitute for a left outer join!
SQL> select c.name, s.school_name from children c left join schools s on C.SCHOOL_KEY = S.SCHOOL_KEY;
NAME SCHOOL_NAME
============================== ================================
Robert Schieck
Jon Schieck
Diane Schieck Sir Winston Churchill
Megan Schieck Oakridges Public School
Robyn Schieck Lady Churchill Senior Public
Emma Schieck
14) Avoid Fetchalls Like the Plague
A fetchall is when you do something in Delphi or CBuilder that causes your
application to fetch all the records for a select statement from InterBase.
For example if you have 100,000 people in a database and do a "select * from
people where lastname like 'A%'" you will be selecting approximately 10,000
of those records.
If you issue the select from a TQuery with nothing connected to it, then only
one row will be returned from the query until you call the next method for the
TQuery.
If you have a grid attached to the TQuery when you open it, then it will fetch
one row for each row that shows up in the grid. If you have 14 rows showing
in the grid, then it will fetch only 14 rows from the server. It will fetch
more rows from the server as need when you scroll past the bottom of the grid.
If you do a TQuery.recordcount, no mater what you have connected to the TQuery,
the BDE will fetch each row from the server and count it. When the last row
has been sent , in this case row 10,000, then the BDE will give you the number
of records that it fetched from the server. Fetching 10,000 records can take
a while.
A filter on a TQuery, causes a fetchall and then it applies the filter to all
the records.
A locate on TQuery causes a fetchall and then it positions the record pointer
to the appropriate record.
If you use the SQL monitor, you can see the fetches from the server and it
is very easy to spot a "fetchall" happening.
15) Large System, Lots of Users, Cache Your Lookup
Tables for Speed.
As the number of users on your system increase, you need to decrease the load
on your server. One way to do that is to start caching your lookup tables. For
example, there are 50 states in the United States and the odds of a new one
being added on any given day is quite remote, so it is a good candidate to be
cached.
A ClientDataSet is an excellent component for caching the 'states" table. With
the "States" table in a clientdataset, you no longer need to join the "States"
table in your queries on the InterBase server but can simulate the join using
a calculated field on the client side.
This will reduce the load on the server and allow you to service more users.
16) Need Speed, Turn Async Writes Off But Beware
of the Risks.
By default InterBase 5.x on NT, forced write or sync writes is on. This means
when InterBase tells NT to write something to disk, NT is not allowed to put
it into the OS cache, it must write it to disk immediately.
On Unix platforms, the default is to have force writes off. This means when
InterBase tells the the UNIX operating system to write something to disk, UNIX
puts it into its cache and writes it out to disk when it is ready.
What is the difference in performance? Inserting 13,000 records via a stored
procedure into a table, with forced writes on it took 15 minutes, with forced
writes off it took 45 seconds.
As with everything there is a price to be paid and in this case the speed comes
with risk. If the InterBase server or the Operating system crashes while InterBase
is writing to disk, you stand a good chance of corrupting you database, potentially
to the point of making it unrepairable.
If you choose to turn off force writes to improve performance, make sure you
computer is on a UPS and make lots of backups.
Summary
Hopefully some of the techniques present here will help your Delphi/CBuilder/InterBase
Systems perform better.
Connect with Us