Delphi Techniques: A Simple Alternative to TStringList

By: a z

Abstract: Using advanced records to eliminate the housekeeping code associated with using TStringList classes

Delphi Techniques: A Simple Alternative to TStringList

Sample code for this article can be downloaded from Code Central.

The TStringList class is widely used in Delphi apps; they're just one of those utility classes that are simple and useful enough that you find yourself using them all the time. But I have to be honest with you, I get really tired of writing try...finally blocks every time that I want to use a TStringList. Wouldn't it be nice if there were a more convenient way?

Enter 'advanced records'. Advanced records are simply old-school records with the addition of methods, properties, and constructors. They're more powerful than traditional records, but less capable than classes.

The nice thing about records (traditional or advanced) is that unlike classes the memory for records is automatically allocated; ie: there's no need for an explicit call to Create or Free. This allows us to write this code:

var
  list: TStringListRecord;
begin
  list.LoadFromFile('test.txt');
  ...
end;

rather than this code:

var
  list: TStringList;
begin
  list := TStringList.Create;
  try
    list.LoadFromFile('test.txt');
    ...
  finally
    list.Free;
  end;
end;

The Basics

Let's implement this record type by starting with the bare essentials. The two most important properties are Count and Items, and let's add some of the more common methods at the same time. We'll use a dynamic array to hold the strings.

TStringListRecord = record
private
  FItems: array of string;
  function  GetCount: integer; inline;
  function  GetItem(index: integer): string;
  procedure SetItem(index: integer; const Value: string);
public
  function  Add(const line: string): integer;
  procedure Clear;
  function  IndexOf(const s: string): integer;
  property  Count: integer read GetCount;
  property  Items[index: integer]: string read GetItem write SetItem; default;
end;

The interface is what you would expect. Note that private and public are the only visibility categories allowed. Since advanced records don't support inheritance protected visibility doesn't have any meaning.

The implementation is equally straight forward:


function TStringListRecord.Add(const line: string): integer;
begin
  SetLength(FItems,Count+1);
  FItems[Count-1] := line;
  result := Count - 1;
end;

procedure TStringListRecord.Clear;
begin
  SetLength( FItems, 0 );
end;

function TStringListRecord.GetCount: integer;
begin
  result := length(FItems);
end;

function TStringListRecord.GetItem(index: integer): string;
begin
  result := FItems[index];
end;

procedure TStringListRecord.SetItem(index: integer; const Value: string);
begin
  FItems[index] := Value;
end;

function TStringListRecord.IndexOf(const s: string): integer;
var
  k: integer;
begin
  result := -1;
  for k := 0 to Count - 1 do begin
    if FItems[k] = s then begin
      result := k;
      break;
    end;
  end;
end;

The Delete Method

The Delete method proves to be a little more interesting. The general approach is to simply slide the elements in the array down by one to take the place of the item that was deleted. One idea would be to use a for loop to slide the elements one by one, but that sounded slow to me so I decided to use the Move procedure. My first attempt looked like this (the lines have been numbered for easy reference):

{D1} procedure TStringListRecord.Delete(Index: Integer);
{D2} begin
{D3}   if (Index < 0) or (Index >= Count) then raise EStringListError.Create('List index out of bounds');
{D4}
{D5}   if Index < Count - 1 then begin
{D6}     move( FItems[Index+1], FItems[Index], sizeof(FItems[0]) * (Count - Index - 1) );
{D7}   end;
{D8}
{D9}   SetLength( FItems, Count - 1 );
{D10}end;
I used the code below to test the Delete method:
{T1} FList.Add('one');
{T2} FList.Add('two');
{T3} FList.Add('three');
{T4} FList.Delete(1);
{T5} FList.Clear;

Alas, when I ran the code I received a EInvalidPointer error on line T5, and the FastMM memory manager reported a memory leak when I exited the application. The problem is that by using the Move procedure we are going behind Delphi's back and preventing it from correctly managing the memory associated with the strings.

Here's what happening. After line T3, memory looks like this:

On the left is the FItems array with three elements, each containing a string. The actual data for the strings is stored on the heap and is shown on the right. Each string has a four byte reference count and a four byte length followed by the actual text, and then finally a null terminating byte (not shown in the diagram).

The Delete method is then called. Line D6 slides the contents of slot 2 downward into slot 1, essentially copying the value in slot 2 into slot 1. After line D6, memory looks like this:

As you can see, the string 'two' has been leaked, and there are now two pointers to 'three' although the reference count is only 1. When line D9 is executed Delphi notices that slot 2 points to a valid string and so it decrements the reference count and then frees the memory, thus causing slot 1 to point to deallocated memory. At this point the memory layout is:

An exception won't be raised until line T5, "FList.Clear", is executed. Delphi will release the memory for the string stored in slot 0, and will attempt to release the memory for the string stored in slot 1. However, since that memory has already been freed, a EInvalidPointer exception will be raised. When the application exits, the FastMM memory manager will detect that the 'two' string has been leaked and will display a warning message.

I decided to try a simpler technique:


procedure TStringListRecord.Delete(Index: Integer);
var
  k: integer;
begin
  if (Index < 0) or (Index >= Count) then raise EStringListError.Create('List index out of bounds');

  for k := Index to Count - 2 {2nd last element} do begin
    FItems[k] := FItems[k+1];
  end;

  SetLength( FItems, Count - 1 );
end;

This works well, except for the minor detail that it is 100 times slower. Let's go back to the original technique and try to make it work correctly.

The first problem was that a string was being leaked. This can be resolved by assigning an empty string to that slot. Behind the scenes Delphi will free the previous contents of the slot and then assign the new contents. Delphi represents zero-length long strings as nil, so slot 1 will contain nil rather than a pointer to memory:

After the call to the Move procedure is completed we need to assign nil to the last slot to ensure that Delphi doesn't deallocate that memory when we resize the array. Simply assigning an empty string will of course cause Delphi to deallocate the memory for the string so we need to take a sneakier approach:


PPointer(@FItems[Count - 1])^ := nil;

This gives us the result that we want, and we can now resize the FItems array safely.

Properties

TStringList has a number of other properties in addition to Count and Items such as CaseSensitive, Duplicates, Sorted, QuoteChar, etc. These are a bit more difficult to deal with.

The problem is that in Delphi most variables (including records) which are locally declared are not initialized. Thus, in the code below the value of list.ACardinal is undefined. It might be 0, 57, or any arbitrary value.


type
  TStringListRecord = record
  private
    ACardinal: cardinal;
  end;

procedure Test;
var
  list: TStringListRecord;
begin
  ShowMessageFmt('%d',[list.ACardinal]);
end;

One way is to add a Initialize procedure to the record which would assign default values, but this would require that the programmer remember to call Initialize every time that a TStringListRecord is used. This is an error waiting to happen.

You'll notice that the Items property works fine. That's because Delphi does automatically initializes dynamic arrays when they're declared locally. This is done out of necessity. You'll recall from earlier that when a dynamic array is resized, Delphi will attempt to free any memory associated with unused elements. If the dynamic array variable contained garbage an access violation error would occur during a resize operation.

This gives rise to a second way to initializing properties - use dynamic arrays to store the property. When the record is declared the array will have zero elements, and when a value is assigned it will be resized to have precisely one element. For example:


TStringListRecord = record
private
  FCaseSensitive: array of boolean;
  function  GetCaseSensitive: boolean;
  procedure SetCaseSensitive(const Value: boolean);
public
  property  CaseSensitive: boolean read GetCaseSensitive write SetCaseSensitive;
end;

function TStringListRecord.GetCaseSensitive: boolean;
begin
  result := (length(FCaseSensitive) = 0) or FCaseSensitive[0];
end;

procedure TStringListRecord.SetCaseSensitive(const Value: boolean);
begin
  if length(FCaseSensitive) = 0 then begin
    SetLength(FCaseSensitive,1);
  end;
  FCaseSensitive[0] := Value;
end;

The third and final option is to keep the TStringListRecord class simple and ignore the other properties. Since TStringListRecord is intended to be used more for convenience than power we'll take this approach and resort to using TStringList when we need more functionality.

Performance

We've taken a fairly simplistic approach to how the strings are stored. Each time we add a string the FItems array is resized and the string is assigned. In comparison, TStringList uses a more sophisticated algorithm. TStringLists have both a Count property and a Capacity property where the Capacity is always greater than equal to the Count. Capacity refers to the number of elements in the underlying array, and Count refers to the number of elements that are actually in use.

When the array needs to increase in size TStringList will add several elements at a time rather than just one. This allows TStringList to reduce the number of resizes. The key functionality occurs in TStringList's Grow method:


procedure TStringList.Grow;
var
  Delta: Integer;
begin
  if FCapacity > 64 then Delta := FCapacity div 4 else
    if FCapacity > 8 then Delta := 16 else
      Delta := 4;
  SetCapacity(FCapacity + Delta);
end;

Thus the Capacity will grow by either 4 elements, 8 elements, or 25% of the current Capacity depending on the current number of elements.

Based on this you would think that TStringList would be considerably faster than TStringListRecord. In fact they're quite similar in performance. Presumably the additional functionality found in TStringList offsets the advantage of the more intelligent algorithm.

Conclusion

Using advanced records to implement a replacement for TStringList allows us to eliminate the housekeeping code associated with a TStringList. The result is simpler, more readable code. While in this article TStringListRecord was kept quite simple, there's no reason that the full functionality of the TStringList class could not be implemented.

f



Server Response from: ETNASC02