InterBaseExpress: Tips and Tricks

By: Conference Speaker

Abstract: InterBase Express (IBX) is a set of data access components that provide a means of building applications with Delphi 5/6 and C++Builder that can access, administer, and monitor InterBase databases and run the InterBase Services.

This is the technical paper from a talk given at the 12th Annual Borland Developer's Conference
By Jeff Overcash

Jeff Overcash graduated from Georgia Tech in 1992. After developing an equipment control system purchased by TransAmerica Leasing, he has for the past four years been doing independent consulting using Delphi, C++Builder, InterBase, and Oracle. Jeff develops the InterBase Express components (available for Delphi, C++Builder, and Kylix) for Borland. Jeff has also been a member of TeamB for several years.

Note: No portion of this paper may be distributed or reproduced by any means, or in any form, without the author's and/or Borland's prior written permission.


Introduction

Overview of components and when to use

Tuning tips

Unidirectional

Buffer Chunks

CHAR vs. VarChar vs. Blob

Use the SQL Monitor

Summary


Introduction

InterBase Express (IBX) is a set of data access components that provide a means of building applications with Delphi 5/6 and C++Builder that can access, administer, and monitor InterBase databases and run the InterBase Services. Delphi 5 and C++Builder users can use the IBX components to build InterBase database applications without the overhead of using the Borland Database Engine (BDE). IBX accesses the InterBase API directly, allowing increased speed and control within InterBase applications. IBX started life as Free InterBase (FIB) by Gregory Deatz.

When Borland decided to base IBX on FIB, Ravi Kumar did the initial IBX release and the IB 6.0 release. Ravi has since moved on to JBuilder, and in April of 2000 I took over development of IBX with the full support of both the InterBase and Delphi R&D teams. In May of 2001 IBX for Kylix was released, making IBX the only native InterBase database access for D5, Kylix, and D6 at this time.


Overview of components and when to use

IBSQL - This is a non data aware control. It is unidirectional only (no going backwards). It is easily the fastest of all the IBX components. While it can't be used with data aware controls, it works well in bandwidth-tight network situations (like a dial-up connection) where you handle the transfer of data to and from non data aware controls yourself. Mainly, though, it will be used in applications where you are doing batch processing or need quick lookups.

Another good use is when you need a quick lookup based on values to build custom text for reports. Generally, its low memory overhead and overall speed make an IBSQL component a good choice any time you do not need to attach to a data aware control.
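The quick-lookup case can be sketched as follows. This is a minimal sketch, assuming a TIBSQL (IBSQL1) already wired at design time to a TIBDatabase and active TIBTransaction; the CUSTOMER table and its columns are hypothetical names:

```pascal
// Quick lookup with TIBSQL: fetch a customer name for a report line.
function TForm1.LookupCustomerName(CustNo: Integer): string;
begin
  with IBSQL1 do
  begin
    SQL.Text := 'select NAME from CUSTOMER where CUST_NO = :CUST_NO';
    Params.ByName('CUST_NO').AsInteger := CustNo;
    ExecQuery;                          // runs the statement
    if not Eof then
      Result := FieldByName('NAME').AsString
    else
      Result := '';
    Close;                              // release the cursor
  end;
end;
```

No TDataSet buffering is involved, which is what makes this path so light.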

IBDataset - This is the main data aware control I recommend using. It has the advantage of having all the Live mode SQL statements (InsertSQL, DeleteSQL, ModifySQL and RefreshSQL) built in. It can still use an UpdateObject for more complicated update sequences. The major difference between an IBDataset and an IBQuery is the use of the native IBXSQLVAR instead of TParam. This has the advantage of being able to handle INT64 data types under IB 6.0, but has the disadvantage that the Params are not accessible at design time. Expect future versions to be able to publish the Params property.

For MIDAS work, IBDataset can handle all MIDAS requests except for calls to SetParams and GetParams, as IBDataset does not use TParams. This is also a feature that will be added in the future, but for now, if you need to remotely set or retrieve param values, you will need to use the TIBQuery or TIBTable components.
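Wiring up the built-in Live mode statements looks roughly like this. A sketch only; the COUNTRY table and its columns are hypothetical, and the :OLD_ parameter prefix refers to the pre-edit field values as in the IBX of this era:

```pascal
// A live TIBDataSet: the four update statements are built in,
// so no UpdateObject is needed.
procedure TForm1.SetupDataset;
begin
  IBDataSet1.SelectSQL.Text :=
    'select COUNTRY, CURRENCY from COUNTRY';
  IBDataSet1.InsertSQL.Text :=
    'insert into COUNTRY (COUNTRY, CURRENCY) values (:COUNTRY, :CURRENCY)';
  IBDataSet1.ModifySQL.Text :=
    'update COUNTRY set CURRENCY = :CURRENCY where COUNTRY = :OLD_COUNTRY';
  IBDataSet1.DeleteSQL.Text :=
    'delete from COUNTRY where COUNTRY = :OLD_COUNTRY';
  IBDataSet1.RefreshSQL.Text :=
    'select COUNTRY, CURRENCY from COUNTRY where COUNTRY = :COUNTRY';
  IBDataSet1.Open;
end;
```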

IBQuery - TIBQuery is the component that most directly maps to the TQuery component from the BDE. It uses TParam and TField objects (which means that under Delphi 5 it cannot handle INT64 data types correctly). By default this is a read only component; to make it a live query you need to use a TIBUpdateSQL component.

This component works fully with MIDAS and has the advantage of being able to have its params set at design time. If you use report writers like ReportBuilder that have a design time preview mode, you will probably find using IBQuery to drive the report useful, since you can see the results at design time.

IBQuery is a read only component by default. To make the component read-write, an IBUpdateSQL object needs to be tied to the IBQuery through the UpdateObject property. One main difference between IBQuery in this configuration and IBDataset is the Edit functionality. When posting an edit, IBUpdateSQL will raise an exception if the update does not modify exactly one record. IBDataset will allow you to update 0 or more records without any exceptions.

Performance-wise, there is little difference between TIBDataset and TIBQuery.
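The UpdateObject wiring described above can be sketched like this, assuming an EMPLOYEE table (hypothetical names) and design-time-placed IBQuery1/IBUpdateSQL1 components:

```pascal
// Making a TIBQuery live with a TIBUpdateSQL.
procedure TForm1.SetupLiveQuery;
begin
  IBQuery1.SQL.Text := 'select EMP_NO, SALARY from EMPLOYEE';
  IBUpdateSQL1.ModifySQL.Text :=
    'update EMPLOYEE set SALARY = :SALARY where EMP_NO = :OLD_EMP_NO';
  IBUpdateSQL1.InsertSQL.Text :=
    'insert into EMPLOYEE (EMP_NO, SALARY) values (:EMP_NO, :SALARY)';
  IBUpdateSQL1.DeleteSQL.Text :=
    'delete from EMPLOYEE where EMP_NO = :OLD_EMP_NO';
  IBQuery1.UpdateObject := IBUpdateSQL1;  // read only without this
  IBQuery1.Open;
end;
```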

IBTable - This component is similar to the BDE's TTable. The main problem I have with this component is that the Table paradigm is not a good one for Client Server databases. The Table paradigm assumes that all the data has to come local because the engine is local. With Client Server databases it is better to let the InterBase engine do the work and display only the results to the user. You should also always return only the columns you need, not all the columns.

There are not too many places I would recommend the TIBTable component. One place is when you are putting together a quick prototype. You don't have to write SQL and you are not concerned with performance. Really small datasets can be a potential use for TIBTable.

One place I've seen people misuse TIBTable is using it with a DBLookupComboBox. Usually the table driving the lookup items has more columns than you need on the client side. When that is the case, TIBQuery is a better choice.

When your underlying database is a dialect 3 database, be aware that the case of the table name must exactly match the case in the database. At design time this is not an issue, but it is at run time if you are assigning table names in code.
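For instance, assuming a hypothetical dialect 3 table created with the quoted identifier "Customers":

```pascal
// Dialect 3: a table name assigned in code must match the stored
// name's case exactly.
procedure TForm1.OpenTable;
begin
  IBTable1.TableName := 'Customers';      // matches "Customers" exactly
  // IBTable1.TableName := 'CUSTOMERS';   // would fail to find "Customers"
  IBTable1.Open;
end;
```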

TIBTable has no known issues with TClientDataset.

IBStoredProc - This component mainly just makes using a stored procedure easier. It is designed to work with stored procedures that do not return result sets - either no return values or a singleton return value only. Even if you do use stored procedures in your database, this component is not strictly necessary. You will probably find using an IBSQL or IBDataset easier, because those are the components you use all the time.
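A typical call through TIBStoredProc looks like this. The procedure name and parameters here are hypothetical; only the component calls (Prepare, ParamByName, ExecProc) follow the IBX API:

```pascal
// TIBStoredProc for a procedure with no result set: set the input
// params, ExecProc, then read any singleton return values if present.
procedure TForm1.AddEmpProj(EmpNo: Integer; const ProjId: string);
begin
  with IBStoredProc1 do
  begin
    StoredProcName := 'ADD_EMP_PROJ';
    Prepare;                               // builds the Params list
    ParamByName('EMP_NO').AsInteger := EmpNo;
    ParamByName('PROJ_ID').AsString := ProjId;
    ExecProc;
  end;
  IBTransaction1.CommitRetaining;          // make the change visible
end;
```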

Tuning tips

One of the deficiencies of IBX is its memory model. Most performance issues can be directly traced to the memory model. In many cases though there are little tuning tricks you can do to improve performance.

First let's look at the memory model and why it is not overly efficient. Basically, IBX uses a flat contiguous memory space to hold its local copy of the data. As you read or insert more records, that memory space grows over time as needed. This seems very straightforward, and it is, but it leads to some complications. Here is a normal series of events as you work with a TIBCustomDataset descendant.

First the dataset is opened. This will cause the dataset to allocate enough memory to hold 1000 records (1000 is the default buffer size). Each record "slot" will be the size of the record in bytes.

Base memory image

Do some work reading and inserting records

Almost full for one chunk image

Once the 1000 record buffer is full, the next triggering event (either an insert of a new record or reading the 1001st record) forces IBX to allocate more space to hold the records. What ends up happening is that a call to ReallocMem is made to allocate enough space for another 1000 records. ReallocMem will first allocate enough space for 2000 records, then it will copy the first 1000 records from the old memory space to the new one, and finally free the memory the first 1000 records held. At this point IBX can read/insert the 1001st record.

Memory reallocation image

ReallocMem is actually a very expensive method both in terms of memory needed and time required. On very large result sets the ReallocMem phase will usually have to hit the paging space of Windows. This is when everything slows down to a crawl as the original memory gets paged to disk so there is enough RAM to allocate the new memory space and then paging comes into effect again when the copy of the old data to the new memory space happens.

Looking at this, it starts to become evident why the decision was made to always append rather than insert in place. There are two problems with this memory model and inserting in place. The first is that the slot where the record has to go must be "freed" up. To free it, the memory contents have to be moved down by one record slot. This also affects all the bookmarks, which would all have to be adjusted to the new offsets (in IBX, bookmarks are just integer offsets into this single large contiguous memory space). This process can be slow and would require a different bookmarking scheme altogether if insert in place were implemented. The second is that if the insert is canceled, the reverse process has to be done to close up the unused slot (though this could be skipped by treating it like a deleted record and just wasting the slot).

So the main performance drawbacks to this type of model are

  • Very memory intensive.
  • Slows down over time as the dataset gets larger. Each reallocation takes slightly longer than the one before, with a significant drop in performance once the page file comes into play.
  • Insert in place is expensive on anything but very small datasets.
  • Not mentioned above, but if you use CachedUpdates each record gets its own cache slot basically doubling the memory overhead.

So how can IBX be tuned to get the best performance with these limitations?

Unidirectional

Unidirectional datasets solve the memory-intensive problem. When a dataset is placed in unidirectional mode, enough memory is allocated to hold only 2 records. That memory is never reallocated; the slots are just reused.

When in unidirectional mode you get a one way traversal through the dataset (you do have the option of a single previous call only). Because of this, if you are attaching data aware controls you usually will not be able to set UniDirectional to True. Most data aware controls like DBLookupComboBox, DBGrids, DBNavigators, etc. assume the availability of a bi-directional dataset (bi-directional means you can go forwards and backwards through the data). For instance, if you set UniDirectional to True and attach the dataset to a DBLookupComboBox, you will see two records repeated over and over in the list.

Unidirectional datasets are valuable when you plan on doing batch operations like a batch insert. Since this usually means you never need to go backwards, it greatly improves both speed and memory overhead.

Also, when you are attaching an IBX dataset to a ClientDataset you can always work unidirectionally. As a matter of fact, you are doubling the memory consumption if you keep one copy of the data in the IBX dataset and another in the ClientDataset.

Demo - Show the difference time wise of inserting 50,000 1K records with unidirectional both true and false.
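The demo's batch insert can be sketched as below. This is an illustrative sketch, not the demo source: TEST_TABLE and its DATA column are hypothetical, and the point is only that UniDirectional is set before Open so IBX reuses a two-record buffer instead of growing the flat memory space:

```pascal
procedure TForm1.BatchInsert;
var
  i: Integer;
begin
  IBDataSet1.UniDirectional := True;     // must be set before Open
  IBDataSet1.SelectSQL.Text := 'select DATA from TEST_TABLE';
  IBDataSet1.InsertSQL.Text :=
    'insert into TEST_TABLE (DATA) values (:DATA)';
  IBDataSet1.Open;
  for i := 1 to 50000 do
  begin
    IBDataSet1.Insert;
    IBDataSet1.FieldByName('DATA').AsString := StringOfChar('x', 1024);
    IBDataSet1.Post;
  end;
  IBTransaction1.Commit;                 // one commit for the whole batch
end;
```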

Buffer Chunks

As discussed earlier, internally IBX stores the records in a flat memory space. BufferChunks indicates how many record slots to add whenever growth is required. It does not determine how many records to retrieve at a time (a common misconception).

BufferChunks is there for one primary reason: doing a lot of reallocations gets slower and slower as the memory space gets larger. So you can increase BufferChunks to reduce the number of times memory must be reallocated, at the cost of more 'wasted' memory space (I put wasted in quotes because if you have a fairly static size for the result set you can tune it closely). For instance, suppose you have 1,230,010 records to retrieve for a report. With a BufferChunks of 1000 you will have to reallocate 1,230 times and will have a wasted space of 990 * record size. If you raise BufferChunks to 100,000 you will only have to reallocate 12 times, but your wasted space will be 69,990 * record size, which can be very large. The second is much faster, but at the cost of memory; the first is much slower but requires much less memory.
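When the result-set size is roughly known, the trade-off comes down to one assignment (IBDataSet1 is a hypothetical design-time component):

```pascal
procedure TForm1.OpenBigReport;
begin
  // ~1,230,010 rows expected: 13 chunks of 100,000 = 1,300,000 slots,
  // so only 12 reallocations and 69,990 wasted slots, versus 1,230
  // reallocations at the default BufferChunks of 1000.
  IBDataSet1.BufferChunks := 100000;  // set before Open
  IBDataSet1.Open;
end;
```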

Demo - compare the difference time/memory space retrieving ~1,000,000 records with different buffer sizes.

CHAR vs. VarChar vs. Blob

This one is not really an IBX tip but an IB tip. InterBase versions 6.0 and earlier always pass the maximum size that a record can be across the network. I've seen people use columns like VarChar(32000). The problem with such columns is that retrieving a single record with a column of that size requires a minimum of 8 4K packets to bring it across the network, because IB pads the VarChar column out to 32000 bytes even if all it holds is a single letter.

Blobs work differently: only the Blob ID is passed, and in IBX the Blob retrieval is delayed until you try to access it for the first time. Of course, the problem with switching to Blobs is that Blobs can't be manipulated directly in stored procedures or triggers, but if you don't require that, Blobs are the better way to go for larger text columns from a performance standpoint.

This also points out the advantage of retrieving only the columns you need. Always be suspicious of 'Select * from <TABLE>' style SQL statements, because you rarely need all the columns. IB packs each packet with as much data as it can. So if you have a record whose maximum size is 400 bytes and you fetch a record, IB will return ~10 records in the 4K packet that is sent back. The next 9 records don't cause a network call but are fed from GDS32; on the 10th record another fetch is done. But if your SQL statement doesn't need 3 of the columns, and eliminating those three columns brings the maximum size down to 230 bytes, IB will be able to pack 17 records per network fetch. Even on faster networks, network traffic is a limiting factor in how much data can be retrieved per second.
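Concretely, the difference is just in the select list. A sketch with hypothetical table and column names:

```pascal
// Narrowing the select list shrinks the packed row size, so more rows
// fit per 4K packet (~10 at 400 bytes vs ~17 at 230 bytes).
procedure TForm1.PrepareQueries;
begin
  // Avoid: pulls every column, ~400 bytes max per row.
  IBQuery1.SQL.Text := 'select * from PHONE_LIST';
  // Better: only the columns actually shown, ~230 bytes max per row.
  IBQuery2.SQL.Text :=
    'select EMP_NO, FIRST_NAME, PHONE_EXT from PHONE_LIST';
end;
```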

How bad can the performance drop be? Here are some timings. For each load size (500, 1000, 2000, etc.) I ran two timings: how long it took to do 1000 inserts, and how long to do 1000 selects, against a VarChar(32000) column. I then did the same with a table whose only column was a VarChar of exactly the length being tested. This roughly simulates the timings if only the needed data were passed.

All the source (plus Project2, a Linux version written with Kylix) is included in the zip.

What was found was that the VarChar(32000) table started at 3295ms and stayed at roughly the same spot for inserts/selects all the way through (as expected). We see a 60% performance increase for inserts and an 80+% increase for selects until we get over 4K, then a normal decrease at each size that required another packet to get the data through.

The Windows-to-Linux timings showed an amazing jump once the 4K level was passed. Also, the Linux version of the program had problems at the 2K level, and I'm not certain why.

Test machines

Windows client - PIII 750 w/ 384MB

Windows server - PIII 600 w/ 256MB (wide SCSI)

Linux server - PIII 600 w/ 128MB running SuSE 7.1 and the Jan build.

Windows numbers

 Insert 500 on VarChar(32000) for 1000 rows = 3295ms
 Insert 500 on VarChar(500) for 1000 rows = 1052ms
 Select 500 on VarChar(32000) for 1000 rows = 3625ms
 Select 500 on VarChar(500) for 1000 rows = 90ms

 Insert 1000 on VarChar(32000) for 1000 rows = 3355ms
 Insert 1000 on VarChar(1000) for 1000 rows = 1182ms
 Select 1000 on VarChar(32000) for 1000 rows = 3635ms
 Select 1000 on VarChar(1000) for 1000 rows = 150ms

 Insert 2000 on VarChar(32000) for 1000 rows = 3595ms
 Insert 2000 on VarChar(2000) for 1000 rows = 1642ms
 Select 2000 on VarChar(32000) for 1000 rows = 3586ms
 Select 2000 on VarChar(2000) for 1000 rows = 270ms

 Insert 4000 on VarChar(32000) for 1000 rows = 3855ms
 Insert 4000 on VarChar(4000) for 1000 rows = 1973ms
 Select 4000 on VarChar(32000) for 1000 rows = 3665ms
 Select 4000 on VarChar(4000) for 1000 rows = 541ms

 Insert 8000 on VarChar(32000) for 1000 rows = 4927ms
 Insert 8000 on VarChar(8000) for 1000 rows = 3876ms
 Select 8000 on VarChar(32000) for 1000 rows = 7481ms
 Select 8000 on VarChar(8000) for 1000 rows = 2013ms

 Insert 16000 on VarChar(32000) for 1000 rows = 5337ms
 Insert 16000 on VarChar(16000) for 1000 rows = 5518ms
 Select 16000 on VarChar(32000) for 1000 rows = 3856ms
 Select 16000 on VarChar(16000) for 1000 rows = 2072ms

 Insert 24000 on VarChar(32000) for 1000 rows = 6940ms
 Insert 24000 on VarChar(24000) for 1000 rows = 7170ms
 Select 24000 on VarChar(32000) for 1000 rows = 3976ms
 Select 24000 on VarChar(24000) for 1000 rows = 3124ms

 Insert 32000 on VarChar(32000) for 1000 rows = 8612ms
 Insert 32000 on VarChar(32000) for 1000 rows = 9344ms
 Select 32000 on VarChar(32000) for 1000 rows = 4136ms
 Select 32000 on VarChar(32000) for 1000 rows = 4126ms

Linux numbers

 Insert 500 on VarChar(32000) for 1000 rows = 1462ms
 Insert 500 on VarChar(500) for 1000 rows = 942ms
 Select 500 on VarChar(32000) for 1000 rows = 681ms
 Select 500 on VarChar(500) for 1000 rows = 440ms

 Insert 1000 on VarChar(32000) for 1000 rows = 1913ms
 Insert 1000 on VarChar(1000) for 1000 rows = 1402ms
 Select 1000 on VarChar(32000) for 1000 rows = 1181ms
 Select 1000 on VarChar(1000) for 1000 rows = 812ms

 Insert 2000 on VarChar(32000) for 1000 rows = 2944ms
 Insert 2000 on VarChar(2000) for 1000 rows = 2944ms
 Select 2000 on VarChar(32000) for 1000 rows = 1743ms
 Select 2000 on VarChar(2000) for 1000 rows = 1582ms

 Insert 4000 on VarChar(32000) for 1000 rows = 4406ms
 Insert 4000 on VarChar(4000) for 1000 rows = 4006ms
 Select 4000 on VarChar(32000) for 1000 rows = 3305ms
 Select 4000 on VarChar(4000) for 1000 rows = 3154ms

 Insert 8000 on VarChar(32000) for 1000 rows = 7651ms
 Insert 8000 on VarChar(8000) for 1000 rows = 7551ms
 Select 8000 on VarChar(32000) for 1000 rows = 7801ms
 Select 8000 on VarChar(8000) for 1000 rows = 6269ms

 Insert 16000 on VarChar(32000) for 1000 rows = 93505ms
 Insert 16000 on VarChar(16000) for 1000 rows = 94165ms
 Select 16000 on VarChar(32000) for 1000 rows = 13149ms
 Select 16000 on VarChar(16000) for 1000 rows = 13149ms

 Insert 24000 on VarChar(32000) for 1000 rows = 146832ms
 Insert 24000 on VarChar(24000) for 1000 rows = 160440ms
 Select 24000 on VarChar(32000) for 1000 rows = 20629ms
 Select 24000 on VarChar(24000) for 1000 rows = 19839ms

 Insert 32000 on VarChar(32000) for 1000 rows = 203953ms
 Insert 32000 on VarChar(32000) for 1000 rows = 199667ms
 Select 32000 on VarChar(32000) for 1000 rows = 54339ms
 Select 32000 on VarChar(32000) for 1000 rows = 106493ms

So as you can see, there is a huge difference in performance when your average size is below 50% of the maximum load.

Use the SQL Monitor

It is sometimes difficult to track down performance or semantic problems in client server environments. The IBX SQL Monitor allows you to get a little deeper into what is really being sent to the server from your application. It also gives rough timings, so you can use it as a minor profiler. It is not meant to be a full-fledged profiler, as it does not take into consideration what is happening in your code, only when network calls are happening.

Sql Monitor image

What kind of information can you get out of the SQL Monitor? The SQL Monitor allows you to see things like the plan for a statement, the parameters passed to a statement, fetches made, service API calls, and transaction control. This low level information can be gleaned while debugging, but you will find it faster to use the SQL Monitor in most cases. Particularly helpful is the ability to see the parameters that are being sent for a statement.

Setting up an application to be traced is very easy and can be done either at design time or in code at run time. To start tracing you need to turn on the trace flags that you want to track (see the online help for the TIBDatabase TraceFlags property). Once the senders know what to send, you can drop an IBSQLMonitor on a form (or use a standalone application like the SQL Monitor demo I've included) and set an OnSQL event handler.

Tracing can be controlled at multiple levels. Each connection type object (TIBDatabase and TIBService) can control what messages it will send. The MonitorHook can control globally what messages an application will send (by default it passes on all messages). Finally the IBSQLMonitor component can filter what messages get passed on to the OnSQL event.
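Both sides of this setup can be sketched in a few lines. The trace flag names and the OnSQL signature shown here follow the D6-era IBX units (IBDatabase, IBSQLMonitor); check your version's source if they differ:

```pascal
// Sender and receiver for SQL monitoring in one application.
procedure TForm1.FormCreate(Sender: TObject);
begin
  // Sender side: pick which events this connection reports.
  IBDatabase1.TraceFlags := [tfQExecute, tfQFetch, tfTransact, tfError];
  // Receiver side: hook the monitor's event.
  IBSQLMonitor1.OnSQL := DoSQL;
end;

procedure TForm1.DoSQL(EventText: String; EventTime: TDateTime);
begin
  Memo1.Lines.Add(TimeToStr(EventTime) + ' ' + EventText);
end;
```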

Note that the SQL Monitoring code actually uses two helper threads - a Reader thread and a Writer thread. When no IBSQLMonitor components exist in your application, there will not be a Reader thread. The Writer thread is not created until the first tracing message goes out that passes all the filters. The Writer thread is closed down either by a call to DisableMonitoring or in the finalization section during application shutdown. If you have a DLL that is being dynamically loaded and unloaded (like an ISAPI DLL), it is important that this Writer thread is terminated (if started) before the DLL is unloaded. Failure to do this will result in the unload process freezing as the DLL waits for the Writer thread to terminate (this is a limitation of Windows).

Summary

IBX has come a long way since its initial release with Delphi 5. It is more stable, has more user features, and is becoming one of the most popular component sets for accessing InterBase. The plans for the future are aggressive and should make working with InterBase even easier than it is today. Borland has a couple of newsgroups set up to help people with IBX: borland.public.delphi.database.interbaseexpress and borland.public.cppbuilder.database.interbaseexpress. I and other TeamB members frequent these groups to try to answer any questions you have on IBX. IBX updates normally appear first on my CodeCentral author page and later on the InterBase page, once I feel no major issues were introduced in the latest update. Additional demos will also appear on my author page, so it isn't a bad idea to check it every couple of weeks.
