Improved searching in a client dataset (part 1) - By Tjipke A. van der Plaats

By: Tjipke van der Plaats

Abstract: This article explains a way to search faster in client datasets by creating a TClientDataSet descendant and re-implementing Locate or Lookup so that they use indexes.

Remark: An earlier version of the source for this article contained a little bug in restoring the indexes; this was updated on June 18th, 2004.
Remark 2: An earlier version of the source for this article contained a bug when searching on only the first fields of an index; this was updated on September 8th, 2004.
Introduction

First I would like to start with a short introduction to why I wrote this article (and source code). Once upon a time I was creating an application that used lookup fields. But the application was very slow, and I found that the problem was caused by a certain lookup field. The lookup field was linked to a large table, and every time it needed to look up its value, it searched that table, which took some time. To solve this I looked at the LookupCache property of the field: if LookupCache is True, the field preloads all possible values into a local list. But as you will understand, this is only efficient if you have a small table or only have to do it once. Not if you have a large table like I had and have to do it multiple times, because then it reloads the whole contents every time. Also, if a different dataset uses the same lookup table, it gets its own separate lookup list.
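For readers who have not used it, a lookup field with LookupCache enabled looks roughly like this; the Orders/Customers names and the runtime field creation below are only a hypothetical sketch of the mechanism:

procedure TForm1.CreateCustomerNameLookup;
var
  LookupField: TStringField;
begin
  {The field must be added while Orders is still closed}
  LookupField := TStringField.Create(Orders);
  LookupField.FieldName := 'CustomerName';
  LookupField.FieldKind := fkLookup;
  LookupField.KeyFields := 'CustId';
  LookupField.LookupDataSet := Customers;
  LookupField.LookupKeyFields := 'CustId';
  LookupField.LookupResultField := 'Name';
  LookupField.DataSet := Orders;
  {LookupCache = True preloads every key/value pair into the field's
   LookupList: fine for a small lookup table, wasteful for a large one}
  LookupField.LookupCache := True;
end;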

However, thinking about all this, I figured I could cache the data in a different way using a client dataset. I would load it with the data from the table at application startup and use that as the lookup dataset in my application. So I did, and indeed the application performed a lot better. Enough for the application to be acceptable, but the action still took a few seconds. "No problem", I thought, "I'll just add an index on the field that is used to find the value for the lookup field, which will of course improve the performance!" But it didn't improve a thing, not even a millisecond!
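The caching setup itself was simple; it boils down to something like the following sketch (the dataset and provider names are hypothetical):

procedure TdmLookups.LoadLookupCache;
begin
  {Pull the large lookup table into an in-memory client dataset once,
   at application startup}
  cdsCustomerCache.ProviderName := 'dspCustomers';
  cdsCustomerCache.Open;
  {From now on the lookup field uses the cached copy}
  OrdersCustomerName.LookupDataSet := cdsCustomerCache;
end;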

I knew that lookup fields use the Lookup method of TDataSet, and in most TDataSet descendants (like the BDE's TTable, or my own CB4 Tables) Lookup uses an index to locate the record if an index for the field is available. But it seemed the client dataset wasn't doing that! I looked at the source code of TClientDataSet and saw that it creates a filter to find the specified record, but apparently the internal filter evaluation doesn't use indexes.

At that time the application was fast enough, so I didn't put more effort into it, but I kept it in the back of my mind: it should be possible to improve the locate performance of a client dataset!

Getting to work

A few days ago the time came to put that thought into code!

I started by making a little 'project1' test application, in which I created a client dataset containing two fields: a Number (Integer) field and a Value (String) field. I filled it with 10000 records, containing the values 1 to 10000 in both the Number and the Value field, and added a function to find 1000 random records using Locate. (Lookup and Locate internally use the same way of finding a record.) I also made a function to find 1000 random records using FindKey. The performance difference was amazing: it took 9000 ms using Locate, and about 40-60 ms using FindKey. Enough reason to start on a better version of Locate.
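For reference, the timing loops in that first test looked roughly like this (a sketch with hypothetical names, not the exact project source):

procedure TForm1.TimeLocateAndFindKey;
var
  I: Integer;
  Start: Cardinal;
begin
  Randomize;

  Start := GetTickCount;
  for I := 1 to 1000 do
    cdsTest.Locate('Number', Random(10000) + 1, []);  {filter-based search}
  lblLocate.Caption := Format('Locate: %d ms', [GetTickCount - Start]);

  cdsTest.IndexName := 'idxNumber';  {index defined on the Number field}
  Start := GetTickCount;
  for I := 1 to 1000 do
    cdsTest.FindKey([Random(10000) + 1]);  {index-based search}
  lblFindKey.Caption := Format('FindKey: %d ms', [GetTickCount - Start]);
end;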

The Locate and Lookup functions both use the protected LocateRecord method to find a record. So if I wanted to improve something, I had to improve this LocateRecord method. Overriding it would have been the easiest way, but I couldn't, since it wasn't declared as virtual... That was too bad, but there was another solution: as the Locate and Lookup methods are declared virtual, I could override and re-implement those.
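The interface of the descendant therefore ends up looking roughly like this (reconstructed from the methods shown below; the declaration in the downloadable source may differ slightly):

type
  TtrsClientDataset = class(TClientDataSet)
  protected
    function MapsToIndex(Fields: TList; CaseInsensitive: Boolean): Boolean;
    function LocateRecordUsingIndex(const KeyFields: string;
      const KeyValues: Variant; Options: TLocateOptions;
      SyncCursor: Boolean; out CouldUseIndex: Boolean): Boolean;
  public
    function Locate(const KeyFields: string; const KeyValues: Variant;
      Options: TLocateOptions): Boolean; override;
    function Lookup(const KeyFields: string; const KeyValues: Variant;
      const ResultFields: string): Variant; override;
  end;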

First I started with the Locate method; the Lookup method is just a variation on it, so that could be done later. The basic framework was simple:


function TtrsClientDataset.Locate(const KeyFields: string; const KeyValues:
    Variant; Options: TLocateOptions): Boolean;
var
  CouldUseIndex: Boolean;
begin
  Result := LocateRecordUsingIndex(KeyFields, KeyValues, Options, True, CouldUseIndex);
  if not CouldUseIndex then
    Result := inherited Locate(KeyFields, KeyValues, Options);
end;
It simply says: try to search using indexes; if that can't be done, use the original Locate method. I copied the parameter list for this new LocateRecordUsingIndex method from the LocateRecord method and added an extra parameter to return whether an index could be used.

So now on to implementing this new method. First I had to find out whether an index could be used. There were two alternatives here: I could use a (currently active) index that was set using IndexFieldNames, or I could use one of the indexes defined on the dataset. The second one was the easiest, as there is a GetIndexForFields method on a client dataset's IndexDefs collection. Using a current index set through IndexFieldNames was also not that complicated: TTable (well, actually TBDEDataSet) has the same kind of code in a MapsToIndex method. I re-implemented that for the client dataset:


function TtrsClientDataset.MapsToIndex(Fields: TList; CaseInsensitive: Boolean):
    Boolean;
var
  I: Integer;
begin
  Result := False;
  if (IndexFieldNames = '') then
    Exit;
  if Fields.Count > IndexFieldCount then Exit;
  for I := 0 to Fields.Count - 1 do
    if Fields[I] <> IndexFields[I] then Exit;
  Result := True;
end;
I was able to remove a lot of code from the original TBDEDataSet.MapsToIndex. There was code in there handling case-insensitive searching on strings, but since you can't use IndexFieldNames for a case-insensitive search, all that code could be removed. I added only one line: the test on IndexFieldNames.

My LocateRecordUsingIndex method now looked like this:


function TtrsClientDataset.LocateRecordUsingIndex(const KeyFields: string; const
    KeyValues: Variant; Options: TLocateOptions; SyncCursor: Boolean; out
    CouldUseIndex: Boolean): Boolean;
var
  Fields: TList;
  KeyIndex: TIndexDef;
begin
  KeyIndex := nil;
  Fields := TList.Create;
  try
    GetFieldList(Fields, KeyFields);
    if MapsToIndex(Fields, loCaseInsensitive in Options) then
      CouldUseIndex := True
    else
    begin
      KeyIndex := IndexDefs.GetIndexForFields(KeyFields, loCaseInsensitive in Options);
      CouldUseIndex := KeyIndex <> nil;
    end;
    if CouldUseIndex then
      {TODO : Code to search on the index}
    else
      Result := False;
  finally
    Fields.Free;
  end;
end;
The only thing left was the part that does the actual searching.

The code for the searching was also not very complicated. First, if the current index couldn't be used, I needed to activate the correct index. Then I could do the search using GotoKey, and after that I needed to restore the original index. The only part that needed some thinking was setting the key values, but looking around in the DBTables.pas code helped me with that. Only one thing was left out: you can do a locate on a partial key using the loPartialKey option. I decided not to handle that case, as it wasn't as easy to implement and I didn't need it.

Now let's see what the final implementation of the LocateRecordUsingIndex method looks like:


function TtrsClientDataset.LocateRecordUsingIndex(const KeyFields: string; const
    KeyValues: Variant; Options: TLocateOptions; SyncCursor: Boolean; out
    CouldUseIndex: Boolean): Boolean;
var
  I: Integer;
  Fields: TList;
  SavedIndexName, SavedIndexFieldNames: string;
  KeyIndex: TIndexDef;
begin
  if (loPartialKey in Options) then
  begin
    CouldUseIndex := False;
    Result := False;
    Exit;
  end;

  KeyIndex := nil;
  Fields := TList.Create;
  try
    GetFieldList(Fields, KeyFields);
    if MapsToIndex(Fields, loCaseInsensitive in Options) then
      CouldUseIndex := True
    else
    begin
      KeyIndex := IndexDefs.GetIndexForFields(KeyFields, loCaseInsensitive in Options);
      CouldUseIndex := KeyIndex <> nil;
    end;
    if CouldUseIndex then
    begin
      if Assigned(KeyIndex) then
      begin
        SavedIndexName := IndexName;
        SavedIndexFieldNames := IndexFieldNames;
        IndexName := KeyIndex.Name;
      end;
      try
        SetKey;
        if Fields.Count = 1 then
        begin
          if VarIsArray(KeyValues) then
            TField(Fields.First).Value := KeyValues[0] else
            TField(Fields.First).Value := KeyValues;
        end
        else
          for I := 0 to Fields.Count - 1 do
            TField(Fields[I]).Value := KeyValues[I];
        {To make sure that it only searches on first Fields.Count records, set FieldCount}
        GetKeyBuffer(kiLookUp).FieldCount := Fields.Count;
        Result := GotoKey;
      finally
        if Assigned(KeyIndex) then
        begin
          {Restore indexes, always restore at least one of them}
          if SavedIndexFieldNames <> '' then
            IndexFieldNames := SavedIndexFieldNames
          else
            IndexName := SavedIndexName;
        end;
      end;
    end
    else
      Result := False;
  finally
    Fields.Free;
  end;
end;

The Locate and LocateRecordUsingIndex methods were finished, and the only thing left was the implementation of the Lookup method. But that was straightforward: Lookup just calls the LocateRecordUsingIndex method and, if it succeeds, returns the requested field value. Otherwise, depending on whether an index could be used, either Null or the result of the inherited method is returned.


function TtrsClientDataset.Lookup(const KeyFields: string; const KeyValues:
    Variant; const ResultFields: string): Variant;
var
  CouldUseIndex: Boolean;
begin
  if LocateRecordUsingIndex(KeyFields, KeyValues, [], False, CouldUseIndex) then
    Result := FieldValues[ResultFields]
  else
  begin
    if not CouldUseIndex then
      Result := inherited Lookup(KeyFields, KeyValues, ResultFields)
    else
      Result := Null;
  end;
end;

Testing

After creating the component, I also had to test it and see what improvements I got. I changed my simple 'project1' test application and turned it into something more general (see the end of this article for the location of the source code). The test showed a highly improved performance: when used with a client dataset containing 1000 records, the new client dataset was about 20 times faster using Locate and 10 times faster using a lookup field.
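Using the component is straightforward: create (or drop) a TtrsClientDataset instead of a TClientDataSet, make sure an index on the searched fields exists, and call Locate as you always did. A sketch with hypothetical names:

procedure TForm1.BuildTestDataset;
var
  I: Integer;
begin
  cdsTest := TtrsClientDataset.Create(Self);
  cdsTest.FieldDefs.Add('Number', ftInteger);
  cdsTest.FieldDefs.Add('Value', ftString, 20);
  cdsTest.IndexDefs.Add('idxNumber', 'Number', []);
  cdsTest.CreateDataSet;
  for I := 1 to 10000 do
    cdsTest.AppendRecord([I, IntToStr(I)]);
  {Existing Locate calls stay the same; they now use idxNumber internally}
  if cdsTest.Locate('Number', 1234, []) then
    ShowMessage(cdsTest.FieldByName('Value').AsString);
end;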

However, the test also revealed something unexpected: using IndexFieldNames to sort the client dataset and then searching on an index did not improve the performance that much. I looked at the implementation of IndexFieldNames and saw what the problem was. When you set IndexFieldNames, the client dataset creates a sort order on those fields on the fly, not an index. So every time I switched between an index and IndexFieldNames, this sort order was recreated! Therefore combining an index with IndexFieldNames is not a very good idea.
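In other words: if a sort order set through IndexFieldNames has to stay active, every index-based search pays for rebuilding it afterwards, while a named index is cheap to switch back to. A small sketch of the two situations (idxNumber and idxValue are assumed to be defined indexes):

{Slow: the on-the-fly 'Value' sort order is recreated every time
 Locate restores it after searching on idxNumber}
cdsTest.IndexFieldNames := 'Value';
cdsTest.Locate('Number', 42, []);

{Better: use a named index for the visible sort order}
cdsTest.IndexName := 'idxValue';
cdsTest.Locate('Number', 42, []);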

Side effects

When working with this implementation I found some side effects. I tried to create an index on the fly using the IndexDefs.Add method, but found that it was removed again by the call to IndexDefs.GetIndexForFields. Later I saw that you could use the AddIndex method of TClientDataSet instead, and that should work, but I haven't tested it. Also, the original implementation of Lookup doesn't do a Resync call, whereas the implementation using GotoKey does. And to be complete: I haven't tested the code in cases where the client dataset is used with a provider. I don't see why it shouldn't work, but you never know.
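For reference, using AddIndex instead would look roughly like this (again: only a sketch, untested):

{On-the-fly index via AddIndex instead of IndexDefs.Add}
cdsLookup.AddIndex('idxName', 'Name', []);
{From here on a Locate on the Name field should be able to use idxName}
cdsLookup.Locate('Name', 'Johnson', []);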

Conclusion

This finishes my implementation for improving the Locate/Lookup performance of client datasets. You can get the source for this solution, including the test project and a test package to put the new component on your component palette, at CodeCentral. The source was written with Delphi 7, but you should be able to get it to work in earlier Delphi versions like Delphi 5 and 6.

I think there are still some improvements possible, like supporting loPartialKey with indexes and using indexes even if they only partially match the searched fields. But I leave that for a second article.


About the Author:

Tjipke A. van der Plaats works as a software engineer for Agrovision B.V., where he works on various projects. He also has his own company, Tiriss, which sells two products for developers (CB4 Tables and ChangeRes), and he creates software or components on request. If you have questions, feel free to contact him.

Copyright © MMIV by Tjipke A. van der Plaats

