A ClientDataSet in Every Database Application

By: Cary Jensen

Abstract: This article is the first in an extended series designed to explore the ClientDataSet. The basic behavior of the ClientDataSet is described, and an argument is made for the extensive use of ClientDataSets in almost all database applications.

The ClientDataSet is a component that holds data in an in-memory table. Until recently, it was only available in the Enterprise editions of Delphi and C++ Builder. Now, however, it is available in the Professional editions of these products, as well as in Kylix. This article is the first in an extended series designed to explore the capabilities and features of the ClientDataSet.

I have been playing with an idea for a while, and I wanted the title of this article to reflect this (with my apologies to Herbert Hoover for this poor twist on his political promise of "a chicken in every pot and a car in every garage"). In short, I believe that a very strong argument can be made for including one ClientDataSet and a corresponding DataSetProvider for each TDataSet used in an application. Doing so provides your user interface and runtime code with a consistent set of features (filters, ranges, searches, and so forth) regardless of the data access technology being employed.

Actually, I have two goals in this first of many articles detailing the ClientDataSet. The first is to set forth the reasons why I believe that ClientDataSets should play a primary role in most database applications. The second goal, and the one that I hope you find useful whether or not you accept my arguments, is to provide a general introduction to the nature and features of the ClientDataSet.

It's this second goal that I will address first. Specifically, in order for my arguments to make sense, it is essential to first provide an overview of the ClientDataSet, and how it interacts with a DataSetProvider. This discussion will also serve as a primer for many of the technique-specific articles that will follow in this series. After this introduction I will return to my first premise, explaining in detail how you can improve your applications through the thoughtful use of ClientDataSets.

Introduction to the ClientDataSet

The ClientDataSet has been around for a while: since Delphi 3, to be precise. But up until recently it has only been available in the Client/Server or Enterprise editions of Delphi and C++ Builder. In these editions the ClientDataSet was intended to hold data in a DataSnap (formerly called MIDAS) client application. While many Enterprise edition developers did make extensive use of the ClientDataSet's features in non-DataSnap applications, the fact that this component did not exist in the Professional edition products made recommending its widespread employment unrealistic.

With Borland's introduction of dbExpress, which first appeared in Kylix 1.0, the ClientDataSet and its companion, the DataSetProvider, are now part of Borland's Professional Edition RAD (rapid application development) products, including Delphi 6, Kylix 2, and C++ Builder 6. Now all Borland RAD developers have access to this powerful and flexible component (I'm not counting the Personal or Open edition developers in this group, since those versions do not have the database-related components in the first place).

With this in mind, let's now take a closer look at how the ClientDataSet works.

The ClientDataSet is a TDataSet descendant that holds data in memory in a table-like structure consisting of rows (records) and columns (fields). Using the methods of the TDataSet class, a developer can navigate, sort, search, filter, and edit the data held in memory. Because these operations are performed on data stored in memory, they are very fast. For example, on a test machine with 512 MB of RAM running an 850 MHz Pentium 3, an index was built on an integer field containing random numbers in a 100,000-record table in just under one-half second. Once built, this index can be used to perform near-instantaneous searches and to set ranges on the indexed field.
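The kind of test described above can be approximated with a small console program. This is only a sketch under a few assumptions: the program name, the field names (ID and Score), and the index name (ByScore) are invented for illustration, and the MidasLib unit is assumed to be available so that midas.dll does not have to be deployed. It makes no timing claims; your results will depend on your hardware.

```delphi
program CdsIndexDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient,
  MidasLib; // assumption: links the MIDAS code statically; otherwise deploy midas.dll

var
  CDS: TClientDataSet;
  I, InRange: Integer;
begin
  CDS := TClientDataSet.Create(nil);
  try
    // Build a purely in-memory table: no provider and no file involved.
    CDS.FieldDefs.Add('ID', ftInteger);
    CDS.FieldDefs.Add('Score', ftInteger);
    CDS.CreateDataSet;
    CDS.LogChanges := False; // the change log is not needed for this test

    Randomize;
    for I := 1 to 100000 do
      CDS.AppendRecord([I, Random(1000000)]);

    // Build an index on the integer field at runtime and make it current.
    CDS.AddIndex('ByScore', 'Score', []);
    CDS.IndexName := 'ByScore';

    // Use the index to position on the record nearest to a key value.
    CDS.FindNearest([500000]);
    WriteLn('Score found near 500000: ', CDS.FieldByName('Score').AsInteger);

    // Restrict the view to Score values 0..999 and count what is visible.
    CDS.SetRange([0], [999]);
    InRange := 0;
    CDS.First;
    while not CDS.Eof do
    begin
      Inc(InRange);
      CDS.Next;
    end;
    WriteLn('Records with Score in 0..999: ', InRange);
    CDS.CancelRange;
  finally
    CDS.Free;
  end;
end.
```

Wrapping the AddIndex call in your own timing code (for example, with GetTickCount on Windows) is one way to repeat the measurement on your own machine.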

The ClientDataSet actually contains two data stores. The first, named Data, contains the current view of the data in memory, including all changes to that data since it was loaded. For example, if a record was deleted from the dataset, that record is absent from Data. Likewise, records added to the ClientDataSet are visible in Data.

The second store, named Delta, represents the change log, and contains a record of those changes that have been made to Data. Specifically, for each record that was inserted into or deleted from Data, there resides a corresponding record in Delta. For modified records it is slightly different. The change log contains two records for each record modified in Data. One of these is a duplicate of the record as it was originally loaded. The second contains the field-by-field changes made to the original record.

The change log serves two purposes. First, the information in the change log can be used to undo edits made to Data, so long as those changes have not yet been resolved to the underlying data source. By default, this change log is always maintained, meaning that in most applications the ClientDataSet is always caching updates.
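To make the two stores more concrete, here is a minimal sketch that edits an in-memory ClientDataSet, looks at the change log by loading Delta into a second ClientDataSet, and then discards the changes. The program name and the field names are assumptions made for the example, as is the availability of the MidasLib unit.

```delphi
program CdsChangeLogDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

var
  CDS, DeltaView: TClientDataSet;
  Kind: string;
begin
  CDS := TClientDataSet.Create(nil);
  DeltaView := TClientDataSet.Create(nil);
  try
    CDS.FieldDefs.Add('ID', ftInteger);
    CDS.FieldDefs.Add('Name', ftString, 20);
    CDS.CreateDataSet;
    CDS.AppendRecord([1, 'Alpha']);
    CDS.AppendRecord([2, 'Beta']);
    CDS.MergeChangeLog; // treat these two rows as the "loaded" baseline

    // One modification and one insertion, both recorded in the change log.
    CDS.First;
    CDS.Edit;
    CDS.FieldByName('Name').AsString := 'Alpha (edited)';
    CDS.Post;
    CDS.AppendRecord([3, 'Gamma']);
    WriteLn('Pending changes: ', CDS.ChangeCount);

    // Load the change log into a second dataset to inspect its records.
    DeltaView.Data := CDS.Delta;
    DeltaView.First;
    while not DeltaView.Eof do
    begin
      case DeltaView.UpdateStatus of
        usUnmodified: Kind := 'original';
        usModified:   Kind := 'modified';
        usInserted:   Kind := 'inserted';
        usDeleted:    Kind := 'deleted';
      end;
      // Fields that did not change are empty in a "modified" delta record.
      WriteLn(Kind, ': ID=', DeltaView.FieldByName('ID').AsString,
        ' Name=', DeltaView.FieldByName('Name').AsString);
      DeltaView.Next;
    end;

    // Use the change log to undo everything since the baseline.
    CDS.CancelUpdates;
    WriteLn('Pending changes after CancelUpdates: ', CDS.ChangeCount);
  finally
    DeltaView.Free;
    CDS.Free;
  end;
end.
```

UndoLastChange and RevertRecord offer finer-grained ways to back out edits than CancelUpdates, which discards the entire log.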

The second role that the change log plays only applies to a ClientDataSet that is used in conjunction with a DataSetProvider. In this role, the change log provides sufficient detail to permit the mechanisms supported by the DataSetProvider to apply the logged changes to the dataset from which the data was loaded. This process begins when you explicitly call the ClientDataSet's ApplyUpdates method.

When a ClientDataSet is used to read and write data directly from a file, a DataSetProvider is not used. In those cases, the change log is stored in this file each time you invoke the ClientDataSet's SaveToFile method, and restored each time you call LoadFromFile (or each time you close and open the ClientDataSet while its FileName property contains the name of the file). The change log is only cleared in this scenario when you invoke MergeChangeLog or CancelUpdates (this second method causes the changes to be lost).
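The file-based round trip might look something like the following sketch. The file name cds_demo.cds, the program name, and the field names are hypothetical, and MidasLib is again assumed to be available.

```delphi
program CdsFileDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

const
  DemoFile = 'cds_demo.cds'; // hypothetical name, written to the current directory

var
  CDS: TClientDataSet;
begin
  CDS := TClientDataSet.Create(nil);
  try
    CDS.FieldDefs.Add('ID', ftInteger);
    CDS.FieldDefs.Add('Name', ftString, 20);
    CDS.CreateDataSet;
    CDS.AppendRecord([1, 'Alpha']);
    CDS.AppendRecord([2, 'Beta']);

    // The two inserts are still in the change log; SaveToFile stores the
    // log together with the data.
    CDS.SaveToFile(DemoFile);
    CDS.Close;

    // Loading the file brings the change log back along with the data.
    CDS.LoadFromFile(DemoFile);
    WriteLn('Changes restored from file: ', CDS.ChangeCount);

    // Make the changes permanent, then save again with an empty log.
    CDS.MergeChangeLog;
    CDS.SaveToFile(DemoFile);
    WriteLn('Changes after MergeChangeLog: ', CDS.ChangeCount);
  finally
    CDS.Free;
    DeleteFile(DemoFile); // clean up the demo file
  end;
end.
```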

There are quite a few differences between how you use a ClientDataSet depending on whether or not a DataSetProvider is employed. The following discussion focuses exclusively on the situation where a ClientDataSet points to a DataSetProvider with its ProviderName property. Using a ClientDataSet directly with files will be discussed in detail in a future article.

How a ClientDataSet and a DataSetProvider Interact

In order to use a ClientDataSet effectively you must understand how a ClientDataSet interacts with a DataSetProvider. To illustrate this interaction I have created a Delphi project named CDSLoadBehaviorDemo. The main form for this project is shown in the following figure. While I will describe what this project does, it is best if you download this project from Code Central and run it. That way you can observe first-hand the interaction.

Here is the basic setup. The ClientDataSet points to a DataSetProvider through its ProviderName property, and the DataSetProvider refers to a TDataSet descendant through its DataSet property. When you set the ClientDataSet's Active property to True or invoke its Open method, the ClientDataSet makes a data packet request from the DataSetProvider. This provider then opens the dataset to which it points, goes to the first record, and then scans through the records until it reaches the end of the file. With each record it encounters, the DataSetProvider encodes the data into a variant array. This variant array is sometimes referred to as the data packet. When the DataSetProvider is done scanning the records, it closes the dataset to which it points, and then passes the data packet to the ClientDataSet.
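This is not the CDSLoadBehaviorDemo project itself, but the same wiring can be sketched at runtime in a console program. Two substitutions are assumptions made to keep the example self-contained: a second in-memory ClientDataSet (Source) stands in for the TTable, and because there is no owning form or data module through which ProviderName could be resolved, the provider is attached with the SetProvider method instead.

```delphi
program CdsProviderDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient, Provider,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

var
  Source, CDS: TClientDataSet;
  Prov: TDataSetProvider;
begin
  Source := TClientDataSet.Create(nil);
  Prov := TDataSetProvider.Create(nil);
  CDS := TClientDataSet.Create(nil);
  try
    // Stand-in for the TTable: any TDataSet descendant can feed a provider.
    Source.FieldDefs.Add('ID', ftInteger);
    Source.FieldDefs.Add('Name', ftString, 20);
    Source.CreateDataSet;
    Source.AppendRecord([1, 'Alpha']);
    Source.AppendRecord([2, 'Beta']);

    // DataSetProvider -> source dataset
    Prov.DataSet := Source;
    // ClientDataSet -> DataSetProvider (the runtime counterpart of ProviderName)
    CDS.SetProvider(Prov);

    // Opening the ClientDataSet requests a data packet from the provider,
    // which scans the source dataset and encodes its records.
    CDS.Open;
    WriteLn('Records received: ', CDS.RecordCount);
    WriteLn('Pending changes right after loading: ', CDS.ChangeCount);
  finally
    CDS.Free;
    Prov.Free;
    Source.Free;
  end;
end.
```

In a form-based project you would normally drop the three components on a form or data module and set DataSet and ProviderName in the Object Inspector. Note that the stand-in source here is already open, so the provider simply reads it; a TTable left inactive is opened and closed by the provider as described above.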

You can see this behavior in the CDSLoadBehaviorDemo project. The DBGrid on the right-hand side of the main form is connected to a data source that points to a TTable from which the DataSetProvider gets its data. When you select ClientDataSet | Load from this project's main menu, you will literally see the TTable's data being scanned in this DBGrid. Once the DataSetProvider gets to the last record of the TTable, the TTable is closed and this DBGrid appears empty again, as shown in the following figure.

Whether or not the scanning of the TTable is visible in the CDSLoadBehaviorDemo project is configurable. Visible scanning is the default in this project, but because this visible scanning requires so many screen repaints, the ClientDataSet takes quite a bit of time to load the not quite 1000 records of the Items.db table (the table pointed to by the TTable). If you select View | View Table Loading to uncheck this menu option, and select ClientDataSet | Load (if data is already loaded, you must first select ClientDataSet | Unload), you will notice that these records load almost instantly. The actual load time of a ClientDataSet depends on how much data is loaded.

Returning to a description of the ClientDataSet/DataSetProvider interaction, upon receiving the variant array, the ClientDataSet unpacks this data into memory. The structure of this dataset is based on metadata that the DataSetProvider encodes in the variant array. Even though the dataset to which the DataSetProvider pointed may contain one or more indexes, the data packet contains no index information. If you want indexes on the ClientDataSet, you must define or create them. ClientDataSet indexes can be defined at runtime using the IndexDefs property, and this topic will be discussed at length in a future article.

The ClientDataSet now behaves just like most any other opened TDataSet descendant. Its data can be navigated, filtered, edited, indexed, and so forth. As pointed out earlier, any edits made to the ClientDataSet will affect the contents of both the Data and Delta properties. In essence, these changes are cached, and are lost if the ClientDataSet is closed without specifically telling it to save the changes. Changes are saved by invoking the ClientDataSet's ApplyUpdates method.

Applying Changes to the Underlying Data Source

When you invoke ApplyUpdates, the ClientDataSet passes Delta to the DataSetProvider. How the DataSetProvider applies the changes depends on how you have configured it. By default, the DataSetProvider will create an instance of the TSQLResolver class, and this class will generate SQL statements that will be executed against the underlying data source. Specifically, the SQLResolver will generate one SQL statement for each deleted, inserted, and modified record in the change log. The UpdateMode property of the DataSetProvider and the ProviderFlags property of the TFields for the provider's dataset together dictate exactly how each SQL statement is formed. Configuring these properties will be discussed in a future article.

If the dataset to which the DataSetProvider points is an editable dataset, you can alternatively set the provider's ResolveToDataSet property to True. With this configuration, a SQLResolver is not used. Instead, the DataSetProvider will edit the dataset to which it points directly. For example, the DataSetProvider will locate and delete each record marked for deletion in the change log, and locate and change each record marked modified in the change log.
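A minimal sketch of this configuration follows. It assumes an in-memory ClientDataSet (Source) as the editable dataset behind the provider in place of a TTable, invented field names, and the MidasLib unit; treat it as an illustration of the property settings rather than as the demo project's code.

```delphi
program CdsResolveToDataSetDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient, Provider,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

var
  Source, CDS: TClientDataSet;
  Prov: TDataSetProvider;
  Errors: Integer;
begin
  Source := TClientDataSet.Create(nil);
  Prov := TDataSetProvider.Create(nil);
  CDS := TClientDataSet.Create(nil);
  try
    // Editable stand-in for the TTable.
    Source.FieldDefs.Add('ID', ftInteger);
    Source.FieldDefs.Add('Name', ftString, 20);
    Source.CreateDataSet;
    Source.AppendRecord([1, 'Alpha']);
    Source.AppendRecord([2, 'Beta']);

    Prov.DataSet := Source;
    Prov.ResolveToDataSet := True; // edit Source directly; no SQL is generated
    CDS.SetProvider(Prov);
    CDS.Open;

    // Cache one modification in the ClientDataSet.
    CDS.First;
    CDS.Edit;
    CDS.FieldByName('Name').AsString := 'Alpha (edited)';
    CDS.Post;

    // Hand the change log to the provider, tolerating no errors.
    Errors := CDS.ApplyUpdates(0);
    WriteLn('Errors reported by ApplyUpdates: ', Errors);

    // Look at the source dataset to see whether the edit arrived.
    if Source.Locate('ID', 1, []) then
      WriteLn('Source now holds: ', Source.FieldByName('Name').AsString);
  finally
    CDS.Free;
    Prov.Free;
    Source.Free;
  end;
end.
```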

If you download the CDSLoadBehaviorDemo project, you can see this for yourself. From your designer, select DataSetProvider1 and set its ResolveToDataSet property to True. Next, run the project and load the ClientDataSet. After making several changes to the data, select File | ApplyUpdates. Depending on the speed of your computer, you may or may not actually see the DBGrid become active as the TTable is edited. However, on most systems you will notice the DBNavigator buttons become active briefly as a result of the editing process. (If your computer is too fast, and you cannot see the DBGrid or the DBNavigator become active, you can assign an event handler to the AfterPost or AfterDelete event of Table1, and issue a MessageBeep or ShowMessage call. That way you will prove to yourself that Table1 is being edited directly.)

There is a third option, which involves assigning an event handler to the DataSetProvider's BeforeUpdateRecord event. This event handler will then be invoked once for each record in the change log. You use this event handler to apply the changes in the change log programmatically, providing you with complete control over the resolution process. Writing BeforeUpdateRecord event handlers can be an involved process, and will be discussed in a future article.
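Until that article, the general shape of such a handler can be sketched as follows. This example only logs each change and leaves Applied set to False, so the provider still performs the update itself; a real handler would apply the change in code and set Applied to True. The class name TResolveLog, the field names, and the in-memory Source dataset are assumptions for illustration, and the handler signature shown is the one used by Delphi 6 (where the DeltaDS parameter is a TCustomClientDataSet).

```delphi
program CdsBeforeUpdateRecordDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Variants, DB, DBClient, Provider,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

type
  // Hypothetical helper class; event handlers must be methods of an object.
  TResolveLog = class
    procedure BeforeUpdateRecord(Sender: TObject; SourceDS: TDataSet;
      DeltaDS: TCustomClientDataSet; UpdateKind: TUpdateKind;
      var Applied: Boolean);
  end;

procedure TResolveLog.BeforeUpdateRecord(Sender: TObject; SourceDS: TDataSet;
  DeltaDS: TCustomClientDataSet; UpdateKind: TUpdateKind;
  var Applied: Boolean);
const
  KindNames: array[TUpdateKind] of string = ('modify', 'insert', 'delete');
var
  Key: Variant;
begin
  // Inserted records only have new values; others carry the original key.
  if UpdateKind = ukInsert then
    Key := DeltaDS.FieldByName('ID').NewValue
  else
    Key := DeltaDS.FieldByName('ID').OldValue;
  WriteLn('Resolving ', KindNames[UpdateKind], ' for ID ', VarToStr(Key));

  // Leaving Applied False asks the provider to apply this change itself.
  // A handler that applies the change in code would set Applied := True.
  Applied := False;
end;

var
  Source, CDS: TClientDataSet;
  Prov: TDataSetProvider;
  Log: TResolveLog;
begin
  Source := TClientDataSet.Create(nil);
  Prov := TDataSetProvider.Create(nil);
  CDS := TClientDataSet.Create(nil);
  Log := TResolveLog.Create;
  try
    Source.FieldDefs.Add('ID', ftInteger);
    Source.FieldDefs.Add('Name', ftString, 20);
    Source.CreateDataSet;
    Source.AppendRecord([1, 'Alpha']);
    Source.AppendRecord([2, 'Beta']);

    Prov.DataSet := Source;
    Prov.ResolveToDataSet := True;
    Prov.BeforeUpdateRecord := Log.BeforeUpdateRecord;
    CDS.SetProvider(Prov);
    CDS.Open;

    // One insert and one delete, so the handler is invoked more than once.
    CDS.AppendRecord([3, 'Gamma']);
    CDS.First;
    CDS.Delete;
    CDS.ApplyUpdates(0);
  finally
    Log.Free;
    CDS.Free;
    Prov.Free;
    Source.Free;
  end;
end.
```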

When you invoke ApplyUpdates, you pass a single integer parameter. You use this parameter to identify your level of tolerance for resolution failures. If you cannot tolerate any failures to resolve changes to the underlying data source, pass the value 0 (zero). In this situation the DataSetProvider starts a transaction prior to applying updates. If even a single error is encountered, the transaction is rolled back, the change log remains unchanged, and the offending record is identified to the ClientDataSet (by triggering its OnReconcileError event handler, if one has been assigned).

If you pass a positive integer when calling ApplyUpdates, the transaction will be rolled back only if the specified number of errors is exceeded. If the number of errors encountered does not exceed the specified number, the transaction is committed and the failed records are returned to the ClientDataSet. Furthermore, the applied records are removed from the change log, leaving only the changes that could not be applied.

If the number of failures exceeds the specified number, the transaction is rolled back, the change log is unchanged, and the records that could not be resolved are identified to the ClientDataSet as described earlier.

You can also pass a value of -1 when invoking ApplyUpdates. In this situation no transaction is started. Any records that can be applied are removed from the change log. Those whose resolution fails will remain in the change log, and are identified to the ClientDataSet through its OnReconcileError event handler.
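The sketch below shows where the ApplyUpdates parameter and the OnReconcileError event fit together. It tries to provoke a failure by changing a record in the source dataset after the ClientDataSet has loaded it, so that the cached edit no longer matches the original record. The class name TReconcileLog, the field names, and the in-memory Source dataset are assumptions; whether and how the conflict is reported depends on the provider configuration, so no particular output is promised.

```delphi
program CdsReconcileDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient, Provider,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

type
  // Hypothetical helper class that holds the event handler.
  TReconcileLog = class
    procedure ReconcileError(DataSet: TCustomClientDataSet;
      E: EReconcileError; UpdateKind: TUpdateKind;
      var Action: TReconcileAction);
  end;

procedure TReconcileLog.ReconcileError(DataSet: TCustomClientDataSet;
  E: EReconcileError; UpdateKind: TUpdateKind; var Action: TReconcileAction);
begin
  WriteLn('Could not resolve ID ',
    DataSet.FieldByName('ID').AsString, ': ', E.Message);
  // One of several possible responses: leave the failed change in the log.
  Action := raSkip;
end;

var
  Source, CDS: TClientDataSet;
  Prov: TDataSetProvider;
  Log: TReconcileLog;
  Errors: Integer;
begin
  Source := TClientDataSet.Create(nil);
  Prov := TDataSetProvider.Create(nil);
  CDS := TClientDataSet.Create(nil);
  Log := TReconcileLog.Create;
  try
    Source.FieldDefs.Add('ID', ftInteger);
    Source.FieldDefs.Add('Name', ftString, 20);
    Source.CreateDataSet;
    Source.AppendRecord([1, 'Alpha']);

    Prov.DataSet := Source;
    Prov.ResolveToDataSet := True;
    CDS.SetProvider(Prov);
    CDS.OnReconcileError := Log.ReconcileError;
    CDS.Open;

    // Simulate another user changing the record after it was loaded.
    if Source.Locate('ID', 1, []) then
    begin
      Source.Edit;
      Source.FieldByName('Name').AsString := 'Changed elsewhere';
      Source.Post;
    end;

    // Cache an edit to the same record in the ClientDataSet.
    CDS.First;
    CDS.Edit;
    CDS.FieldByName('Name').AsString := 'Alpha (edited)';
    CDS.Post;

    // 0 = tolerate no errors; a positive value sets the error limit;
    // -1 applies whatever can be applied.
    Errors := CDS.ApplyUpdates(0);
    WriteLn('Errors reported: ', Errors);
    WriteLn('Changes still pending: ', CDS.ChangeCount);
  finally
    Log.Free;
    CDS.Free;
    Prov.Free;
    Source.Free;
  end;
end.
```

Other TReconcileAction values, such as raCancel, raCorrect, raMerge, and raRefresh, give the handler different ways to respond to a failed record.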

That's basically how it works, although there are a number of variations that I have not considered. For example, it is possible to limit how many records the ClientDataSet gets from the DataSetProvider using the ClientDataSet's PacketRecords and FetchOnDemand properties. Similarly, you can pass additional information back and forth between the ClientDataSet and the DataSetProvider using a number of provided event handlers. Future articles in this series will describe how and when to use these properties.
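As a brief preview of the first of these variations, the following sketch limits each data packet to 100 records and fetches a second packet explicitly. The packet size, the record count, the field name, and the in-memory Source dataset are all assumptions chosen for the example.

```delphi
program CdsPacketDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient, Provider,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

var
  Source, CDS: TClientDataSet;
  Prov: TDataSetProvider;
  I, Fetched: Integer;
begin
  Source := TClientDataSet.Create(nil);
  Prov := TDataSetProvider.Create(nil);
  CDS := TClientDataSet.Create(nil);
  try
    Source.FieldDefs.Add('ID', ftInteger);
    Source.CreateDataSet;
    for I := 1 to 1000 do
      Source.AppendRecord([I]);

    Prov.DataSet := Source;
    CDS.SetProvider(Prov);

    CDS.PacketRecords := 100;   // ask for at most 100 records per packet
    CDS.FetchOnDemand := False; // fetch further packets only when told to
    CDS.Open;
    WriteLn('Records after Open: ', CDS.RecordCount);

    // Explicitly request the next packet from the provider.
    Fetched := CDS.GetNextPacket;
    WriteLn('Records in next packet: ', Fetched);
    WriteLn('Records now in memory: ', CDS.RecordCount);
  finally
    CDS.Free;
    Prov.Free;
    Source.Free;
  end;
end.
```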

Using ClientDataSets Nearly Everywhere

Now that we've reviewed the basic workings of the ClientDataSet and DataSetProvider components, let's return to the premise that I laid out at the beginning of this article. As I mentioned in the introduction, a strong argument can be made for using a ClientDataSet/DataSetProvider combination anytime data needs to be modified programmatically or displayed using data-aware controls.

There are three basic benefits to using ClientDataSet and DataSetProvider components for all data access.

  1. The combination provides a consistent set of data access features, regardless of which data access mechanism you are using.

  2. Their use provides a layer of abstraction in the data access layer, making future changes to the data access mechanism easier to implement.

  3. For local file-based systems (Paradox or dBase tables, for example), the ClientDataSet can greatly reduce table and index corruption.

Let's consider each of these points separately.

A Consistent, Rich Feature Set

The ClientDataSet provides your applications with a consistent and powerful set of features independent of the data access mechanism you are using. Among these features are an editable result set, on-the-fly indexes, nested datasets, ranges, filters, cloneable cursors, aggregate fields, group state information, and much, much more. Specifically, even if the data access mechanism that you are using does not support a particular feature, such as aggregate fields or cloneable cursors, you have access to them through the ClientDataSet.

A Layer of Abstraction

In addition to the features supported by ClientDataSet, the ClientDataSet/DataSetProvider combination serves as a layer of abstraction between your application and the data access mechanism. If at a later time you find that you must change the data access mechanism you are using, such as switching from using the Borland Database Engine (BDE) to dbExpress, or from ADO to InterBase Express, your user interface features and programmatic control of data can remain largely unchanged. You simply need to hook the DataSetProvider to the new data access components, and provide any necessary adjustment to your DataSetProvider properties and event handlers.

Some people don't like the fact that a ClientDataSet holds changes in cache until you call ApplyUpdates. Fortunately, for those applications that need changes to be applied immediately you can make a call to ApplyUpdates from the AfterPost and AfterDelete event handlers of the ClientDataSet.
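One way to wire this up is sketched below: a single handler is shared by AfterPost and AfterDelete and calls ApplyUpdates(0). The class name TAutoApply, the field names, and the in-memory Source dataset standing in for the real data source are assumptions for illustration.

```delphi
program CdsAutoApplyDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, DB, DBClient, Provider,
  MidasLib; // assumption: static MIDAS linking; otherwise deploy midas.dll

type
  // Hypothetical helper class that holds the shared event handler.
  TAutoApply = class
    procedure ApplyNow(DataSet: TDataSet);
  end;

procedure TAutoApply.ApplyNow(DataSet: TDataSet);
begin
  // Resolve the change just made, tolerating no errors.
  (DataSet as TClientDataSet).ApplyUpdates(0);
end;

var
  Source, CDS: TClientDataSet;
  Prov: TDataSetProvider;
  Hook: TAutoApply;
begin
  Source := TClientDataSet.Create(nil);
  Prov := TDataSetProvider.Create(nil);
  CDS := TClientDataSet.Create(nil);
  Hook := TAutoApply.Create;
  try
    Source.FieldDefs.Add('ID', ftInteger);
    Source.FieldDefs.Add('Name', ftString, 20);
    Source.CreateDataSet;
    Source.AppendRecord([1, 'Alpha']);
    Source.AppendRecord([2, 'Beta']);

    Prov.DataSet := Source;
    Prov.ResolveToDataSet := True;
    CDS.SetProvider(Prov);

    // Apply every posted or deleted record right away.
    CDS.AfterPost := Hook.ApplyNow;
    CDS.AfterDelete := Hook.ApplyNow;
    CDS.Open;

    CDS.First;
    CDS.Edit;
    CDS.FieldByName('Name').AsString := 'Alpha (edited)';
    CDS.Post; // triggers AfterPost, which calls ApplyUpdates(0)

    WriteLn('Changes still cached after Post: ', CDS.ChangeCount);
  finally
    Hook.Free;
    CDS.Free;
    Prov.Free;
    Source.Free;
  end;
end.
```

An application using this approach would usually also assign an OnReconcileError handler, so that a change that cannot be applied is reported to the user rather than silently left in the change log.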

Reduced Corruption

For developers who are still using local file-based databases, such as Paradox or dBase, there is yet another very powerful argument. Hooking a ClientDataSet/DataSetProvider pair to a TTable can reduce the likelihood of table or index corruption to near zero.

Table and index corruption occurs when something goes wrong while accessing the underlying table. Since a TTable component has an open file handle on the underlying table so long as the TTable is active, this corruption happens all too often in many applications. When the data is extracted from a TTable to a ClientDataSet, however, the TTable is active for only very short periods of time: during loading and resolution, to be precise (assuming that you set the TTable's Active property to False, leaving the activation entirely up to the DataSetProvider). As a result, in most applications, accessing a TTable's data using a ClientDataSet/DataSetProvider combination reduces the amount of time that a file handle is open on the table to a fraction of one percent of what it is when a TTable is used alone.

But It's Not for Every Application

While these arguments are compelling, I must also admit that this approach is not appropriate for every application. That a ClientDataSet loads all of its data into memory makes its use much more difficult when you are working with large amounts of data. There are work-arounds that you can use if you point a ClientDataSet to, say, a multi-million record data source, but doing so sometimes requires a fair amount of coding, thereby complicating the application.

For most applications, however, the combination of features provided by the ClientDataSet outweighs the disadvantages. But even if you do not accept this argument, I think that you will find many situations where the use of a ClientDataSet enhances your application's features, and simplifies your efforts.

About the Author

Cary Jensen is President of Jensen Data Systems, Inc., a Texas-based training and consulting company that won the 2002 Delphi Informant Magazine Readers Choice award for Best Training. He is the author and presenter for Delphi Developer Days (www.DelphiDeveloperDays.com), an information-packed Delphi (TM) seminar series that tours North America and Europe. Cary is also an award-winning, best-selling co-author of eighteen books, including Building Kylix Applications (2001, Osborne/McGraw-Hill), Oracle JDeveloper (1999, Oracle Press), JBuilder Essentials (1998, Osborne/McGraw-Hill), and Delphi In Depth (1996, Osborne/McGraw-Hill). For information about onsite training and consulting you can contact Cary at cjensen@jensendatasystems.com, or visit his Web site at www.JensenDataSystems.com.

Copyright (c) 2002 Cary Jensen, Jensen Data Systems, Inc.