Programming with concepts

By: Steve Downey

Abstract: The best way to understand the Standard Template Library -- the collections, iterators, and algorithms included in the Standard C++ library -- is in terms of the concepts that it employs. By Steve Downey.

Programming with concepts

The best way to understand the Standard Template Library -- the collections, iterators, and algorithms included in the Standard C++ library -- is in terms of the concepts that it employs.

Generic programming introduces a new idea into the C++ programming language: the "concept." A concept is similar to an interface or a class, but it can encompass primitive types such as pointers. A concept can also include the idea of how fast an operation can be performed, which cannot be expressed at all within the language.

A concept is the specification of what a template expects to deal with. Although a concept cannot be declared in a program, nor can a type be declared to conform to a concept, it is still something that should be kept in mind when authoring or using templates.

The best way to understand the STL -- the collections, iterators, and algorithms that are part of the Standard C++ Library -- is in terms of the concepts that it employs. 

Iterators are intermediate between collections and algorithms. Algorithms take iterators as arguments that specify the collection they are operating on, as well as where the output will be sent.

The function unique_copy can provide a concrete example. This function is very similar to the Unix command uniq. It takes a pair of iterators that specify the input range and an iterator showing where to write the output. If successive elements are identical, only the first is copied to the output.

This is the declaration from the standard:

template <class InputIterator, 
  class OutputIterator> 

  unique_copy(InputIterator first, 
  InputIterator last, 
  OutputIterator result);

The template class types InputIterator and OutputIterator are STL concepts. They say what the unique_copy algorithm requires.

An InputIterator is one of the simplest iterators. An InputIterator must be dereferenceable, and it must be incrementable to the next iterator in the sequence. That is, if j is an InputIterator, you must be able to write:

N = (*j);

Are you paying attention?

If these look a lot like operations you would do on a pointer, youre paying attention. A pointer satisfies the requirements of an InputIterator. In fact, pointers satisfy the requirements of most kinds of iterators.

OutputIterators are almost the opposite of InputIterators. OutputIterators can be written to but not read, while InputIterators can be read but not written to. An OutputIterator can be incremented; it can also be dereferenced and assigned through. However, there are no guarantees about the result of dereferencing an OutputIterator. That means that if k is an OutputIterator, you can write:

(*k) = N;

but you cant write:

P = (*k);

For both InputIterators and OutputIterators, all operations take constant time. Knowing that allows guarantees to be made about how long unique_copy will take. Unique_copy will proceed in linear time. That is, it will take an amount of time proportional to the number of elements that must be copied.

The unique_copy function takes two InputIterators that specify the range it operates over. It starts at the first and stops before operating on the last iterator. This is similar to an EOF marker. Trying to actually read the EOF is an error. Reading past the end of an array is similarly an error. So is trying to dereference an iterator that points to the end of a container.

Random access

Another useful STL algorithm is sort.

template <class RandomAccessIterator>

void sort(RandomAccessIterator first, 
  RandomAccessIterator last);

Sort takes two RandomAccessIterators that define the range to sort over. In mathematical notation the range would be [first, last), meaning that it includes first and not last.

RandomAccessIterator is another STL concept. Its requirements include those of InputIterator and OutputIterator. It adds more operations, one of which is decrement. Another is the ability to step arbitrary distances forward and backward in constant time.

Pointers are excellent models of RandomAccessIterators. In code, if p is a RandomAccessIterator:

N = *p++;

O = *(p + 5);

Q = *p--;

p += 2;

All are legal, valid statements, assuming that there are at least six elements in the container p refers to.

The programmer picks up where the compiler leaves off

The promise that this concept makes is that all of those operations take constant time. That means that adding two to p cant be implemented by incrementing p twice. That would be linear time, not constant.

Because of that guarantee, sort can promise to take N * log(N) time on average where N is the number of elements to be sorted. Sort also promises to take N * log(N) time in the worst case. But it doesnt promise to be a stable sort. Items that compare equally may be in a different order when returned. Stable_sort does make the promise that items that compare equally will return in the same order -- but its worst case behavior is N * log(N) ^ 2. This can be important when you are sorting an already sorted list. For example, if youve just sorted a list of e-mail messages by date and then you sort by sender, stable_sort will leave the messages in date order within sender and sort may scramble them.

The STL has strong theoretical underpinnings in algorithmic theory, and in categorical type theory. The support of C++ for the idea of concepts is weak. The compiler can flag a template as uninstantiatable if the type supplied doesnt support all of the operations the template requires. It cant report if the operation doesnt meet the pre- or post-conditions of the concept. It also cant tell if the complexity requirement is met. That is, it can't tell if the operation is supposed to take linear or constant time. For now, these have to be checked by programmers.

Knowing your responsibility is more than half the battle in programming.

Server Response from: ETNASC03