Appendix A. C++ Templates by Example

This chapter is meant to be a brief introduction to the basics of C++ templates using examples. It also shows some of the advanced template features that ATL uses, but without any ATL code cluttering up things. For a more thorough introduction to templates, see Stan Lippman’s C++ Primer (Addison-Wesley, 2005).

The Need for Templates

Imagine a simple bounds-checked array class:

#define MAX_ELEMS 8

class Array {
public:
  long& operator[](size_t n) {
    if( n < 0 || n >= MAX_ELEMS ) throw "out of bounds!";
    return m_rg[n];
  }

protected:
  long m_rg[MAX_ELEMS];
};

This class makes quiet, hard-to-find errors loud and easy to find:

void main(int argc, char* argv[]) {
  long rg[8]; // Built in array type
  rg[8] = 1;  // will corrupt the stack, but quietly

  Array array;  // Bounds-checked array type
  array[8] = 1; // will complain loudly
}

The bounds-checking part of the Array class really has nothing to do with the data being managed; using a little bit of C++ trickery, this is even easier to spot:

typedef long T;
class Array {
public:
  T& operator[](size_t n) {
    if( n < 0 || n >= MAX_ELEMS ) throw "out of bounds!";
    return m_rg[n];
  }

protected:
  T m_rg[MAX_ELEMS];
};

Notice that we’ve replaced the use of long with a generic type T. Unfortunately, this trick doesn’t allow us to reuse the Array class with different types of T. When the compiler sees the Array class, it won’t let us change T and compile it again with another type T. To get any reuse out of the Array class as it is, we have to do some cut-and-paste work and create different Array classes, one for each type we’re interested in managing:

class ArrayOfChar {
public:
  char& operator[](size_t n);

protected:
  char m_rg[MAX_ELEMS];
};

class ArrayOfLong {
public:
  long& operator[](size_t n);

protected:
  long m_rg[MAX_ELEMS];
};

Besides the tedium involved with this technique, the developer of the Array family of classes would have to build an Array class for every type that the user of the class would want to manage. Because some of these types can be defined after the Array class is built, this is an especially difficult task. We’d like the compiler to step in and help us. And so it will, with templates.

Template Basics

Using template syntax, we can create an Array class that is parameterized on any number of parameters, including the type of data to manage and how large the internal buffer should be: [1]

template <typename T, size_t MAX_ELEMS>
class Array {
public:
  T& operator[](size_t n) {
    if( n < 0 || n >= MAX_ELEMS ) throw "out of bounds!";
    return m_rg[n];
  }

protected:
  T m_rg[MAX_ELEMS];
};

Notice that the only difference in this code is the use of the template statement before the class declaration. When the compiler sees a template, it knows to store the class declaration in its internal data structures, but not to generate any code. The compiler can’t generate the code until it sees how the client would like to use it:

struct Point { long x; long y; };

void main(int argc, char* argv[]) {
  Array<long, 8> a1;   // Array of 8 longs
  Array<char, 256> a2; // Array of 256 chars
  Array<Point, 16> a3; // Array of 16 Points
  ...
}

The compiler uses the client’s template parameters to generate the code for the class on demand, effectively creating a new member of the Array family of classes with each use. [2] Because the compiler is using the template parameters to generate the code, only parameters whose values are known at compile time are allowed. However, that includes built-in types, user-defined types, constants, and even function pointers. To make the template even more convenient for the client to use, you’re allowed to declare default values for template parameters, just as you would for functions:

template <typename T, size_t MAX_ELEMS = 8>
class Array {...};

void main(int argc, char* argv[]) {
  Array<long>      a1; // Array of 8 longs
  Array<char, 256> a2; // Array of 256 chars
  ...
}

Template Specialization

You might decide that for a specific combination of template parameters, the generic template expansion isn’t good enough. For example, if you decide that an Array of 256 characters should have an equality operator, you might decide to override the Array general template using the template specialization syntax:

template <> // No template arguments here
class Array<char, 256> { // Template argument values here
public:
  // You are not required to provide the same functionality
  // as the general template (although it's a good idea)
  char& operator[](size_t n) {
    if( n < 0 || n >= 256 ) throw "out of bounds!";
    return m_sz[n];
  }

  // You may add functionality not in the general template
  bool operator==(const Array<char, 256>& rhs) {
    return strcmp(m_sz, rhs.m_sz);
  }

protected:
  char m_sz[256];
};

The client doesn’t have to do anything new to use the specialized version of the template. When the compiler sees Array<char, 256>, the client automatically gets the specialized version.

Templates as Base Classes

Template specialization allows the addition of new functionality and optimized implementations based on specific template parameters. For example, the specialization I just showed specializes on all the parameters, both the type and the size of the array. It would probably be more useful to be able to specialize on the type of data held, but to expose additional functions for character strings of any size. This can be accomplished by using the Array as a base class:

template <size_t MAX_LEN>
class String : public Array<char, MAX_LEN+1> {
public:
  // Additional functionality
  bool operator==(const String<MAX_LEN>& rhs) {
    return strcmp(m_rg, rhs.m_rg);
  }
};

Notice that the String is still parameterized on length and that it passes arguments to the base Array class. The type is fixed because we’re building a string, but the number of elements to store is based on the String template argument. In effect, this achieves a partial specialization, although the client code must use the String template instead of the Array template to make use of it.

A Different Kind of Polymorphism

Templates give us a different kind of reuse mechanism than inheritance. Inheritance means that we’d like to reuse the implementation and signature of a base class and extend it in a derived class. Inheritance is type-centric. The derived class is type compatible with the base class. Type compatibility means that an instance of the derived class can be used where an instance of the base class is required.

On the other hand, templates enable you to reuse the behavior of a class, but to divorce it from the types involved. If the type can fit where it’s being used, the compiler doesn’t care about the type. For example, all that was required of the type being passed to the Array template was that it had a default constructor so that it could be created in arrays. If the type couldn’t live up to that requirement, the compiler would complain.

Because of the way the compiler generates code using template arguments, templates give us a different kind of polymorphism. Instead of polymorphism based on type compatibility, we’ve got polymorphism based on signature compatibility. [3] If the type has the appropriate function signatures available, the compiler is perfectly happy. This kind of polymorphism has some interesting properties, of which ATL makes heavy use.

Using Behavior Classes

Imagine modifying the Array class so that it does bounds checking only if you so desire:

template <typename T, size_t MAX_ELEMS = 8,
  bool bCheckBounds = true>
class Array {
public:
  T& operator[](size_t n) {
    if( bCheckBounds && (n < 0 || n >= MAX_ELEMS) )
      throw "out of bounds!";
    return m_rg[n];
  }

protected:
  T m_rg[MAX_ELEMS];
};

This allows a client to turn bounds checking on or off, based on its own requirements:

void main(int argc, char* argv[]) {
#ifdef _DEBUG
  bool bCheckBounds = true;
#else
  bool bCheckBounds = false;
#endif

  Array<long, 256, bCheckBounds> array;
  array[256] = 1;
}

The intent here is that we skip the bounds checks in release mode because we have hopefully caught the out-of-bounds errors during development. However, we’re still doing a check against bCheckBounds every time through the operator[] member function. We could hope for a good compiler that would optimize away the line because bCheckBounds is known at compile time, or we could remove all doubt and make use of a behavior class.

Behavior Classes Can Group Functions

A behavior class (also known as a trait class or a trait) is a class that contains some static member functions or some type definitions. Behavior classes are never meant to be created. Instead, by putting the functions and type definitions into a class, we’ve grouped them and can refer to them as a unit.

To solve our bounds-checking problem, imagine two behavior classes, one that checks the bounds and one that does not:

struct DebugBoundsChecker {
  static void CheckBounds(size_t n, size_t nMax) {
    if( n < 0 || n >= nMax ) throw "out of bounds!";
  }
};

struct ReleaseBoundsChecker {
  static void CheckBounds(size_t n, size_t nMax) {}
};

Notice that both the behavior classes have the same function with the same signature. The debug version does the bounds checking, and the release version doesn’t do anything. And because both implementations are inline functions, the compiler can optimize easily. Given only the knowledge of the signature of the bounds-checking function, I can rewrite the Array class to use the CheckBounds function scoped by a type name passed as a template parameter:

template <typename T, size_t MAX_ELEMS = 8,
          typename BoundsChecker = DebugBoundsChecker>
class Array {
public:
  T& operator[](size_t n) {
    BoundsChecker::CheckBounds(n, MAX_ELEMS);
    return m_rg[n];
  }

protected:
  T m_rg[MAX_ELEMS];
};

The client can now take advantage of the bounds checkingor not, as decided at compile time:

void main(int argc, char* argv[]) {
#ifdef _DEBUG
  typedef DebugBoundsChecker BoundsChecker;
#else
  typedef ReleaseBoundsChecker BoundsChecker;
#endif

  Array<long, 256, BoundsChecker> array;
  array[256] = 1;
}

In this case, I’ve used signature compatibility to make my decisions at compile time, resulting in potentially more efficient code. And if I want to add another kind of bounds-checking behavior class in the future, I can, if it has a CheckBounds function that lives up to the signature requirements of the caller.

Simulating Dynamic Binding

One of the benefits of inheritance is dynamic binding. A virtual function in the base class can be overridden in the derived class to extend or replace base class functionality. One especially powerful expression of this idea is the pure virtual member function. A pure virtual member function is a virtual member function that must be overridden in the deriving class. In fact, any class with a pure virtual member function is thought of as an abstract base class (ABC), and no instance of that class can be created. To use the functionality of an ABC, it must be used as a base class, and the deriving class must implement all the pure virtual member functions. This is useful because it allows the base class to define functionality required by the deriving class:

template <typename T>
class Array {
public:
  ...
  virtual int Compare(const Array<T>& rhs) =0;

  bool operator< (const Array<T>& rhs)
  { return this->Compare(rhs) < 0; }

  bool operator> (const Array<T>& rhs)
  { return this->Compare(rhs) > 0; }

  bool operator== (const Array<T>& rhs)
  { return this->Compare(rhs) == 0; }

  T m_rg[1024];
};

By defining the Compare function as pure virtual, the Array class designer has decreed that Array can be used only as a base class and that the deriving class must provide some way of comparing two arrays of the same type so that the comparison operators can be implemented. For example, a String class must implement the Compare function:

class String : public Array<char> {
public:
  // the compiler requires this method:
  int Compare(const Array<char>& rhs)
  { return strcmp(m_rg, rhs.m_rg); }
};

The compiler uses the implementation of the pure virtual member function provided in the derived class to fill in the virtual function table entry for that member function (as it does with all virtual member function pointers). Using function pointers to invoke member functions is how C++ provides dynamic bindingthat is, binding to the specific implementation at runtime based on the type of the variable.

However, we’re paying the price of dynamic binding for our String class. The price of dynamic binding is a virtual function table, shared among all instances, an extra 4-bye virtual function pointer per object, and at least two extra pointer indirections to invoke the appropriate implementation. This seems a high price to pay, given perfect knowledge at compile time. We know how to compare two String objects. Why not make the compiler do the work and save us the overhead of dynamic binding? We can do that by using simulated dynamic binding.

Simulated dynamic binding enables us to provide the name of the deriving class to the base class. The base class casts itself to the deriving class to call the required function. Revising the Array to use simulated dynamic binding looks like this:

template <typename T, typename Deriving>
class Array {
public:
  ...
  bool operator< (const Array<T, Deriving>& rhs)
  { return static_cast<Deriving*>(this)->Compare(rhs) < 0; }

  bool operator> (const Array<T, Deriving>& rhs)
  { return static_cast<Deriving*>(this)->Compare(rhs) > 0; }

  bool operator== (const Array<T, Deriving>& rhs)
  { return static_cast<Deriving*>(this)->Compare(rhs) == 0; }
  T m_rg[1024];
};

Notice that the Array template takes an additional parameter: the name of the deriving class. It uses that class name to perform a static cast on itself. Because the compiler expands the code of the base class while instantiating the deriving class, the static cast performs a perfectly safe downcast (normally a contradiction in terms). The compiler uses the member functions of the deriving cast to resolve the address of the required function at compile time, saving us the cost of dynamic binding. The deriving class must implement the appropriate member function and pass its own name as a parameter to the base class template:

class String : public Array<char, String> {
public:
  // the compiler requires this method:
  int Compare(const Array<char, String>& rhs)
  { return strcmp(m_rg, rhs.m_rg); }
};

This technique gives us the look and feel of dynamic binding without using virtual member functions. The base class can require functionality of the deriving class. The base class cannot be instantiated by itself. The major difference is that because no virtual functions are required, we don’t have the overhead of dynamic binding.

The Discovery of Simulated Dynamic Binding

It’s my understanding that the ATL team members discovered simulated dynamic binding accidentally. When they did, they went immediately to the compiler team members, who claimed that the C++ Standard does not mandate or prohibit such behavior, but they promised to keep it working for ATL. Does that mean you should use simulated dynamic binding when you’re writing your most portable code? Probably not. Should you still understand it because ATL is rife with it? Absolutely. Should you sneak it into your own bag of tricks? It’s in mine.

Function Templates

Classes are not the only thing that can be parameterized; functions can be, too. The canonical example is the min function:

inline long min(long a, long b) { return (a < b ? a : b); }
inline float min(float a, float b) { return (a < b ? a : b); }

Because the code is the same for both overloaded min implementations, there’s no reason not to make it into a template:

template <typename T>
inline T min(T a, T b) { return (a < b ? a : b); }

When the template is instantiated, the compiler generates the code on demand based on the use:

void main() {
  // produces: long min(long, long)
  long a = 1, b = 2, c = min(a, b);

  // produces: float min(float, float)
  float x = 1.1, y = 2.2, z = min(x, y);

  // produces: char *min(char *, char *)
  char *r = "hello", *s = "world", *t = min(r, s);
}

Notice that the compiler can figure out how to insatiate the min template implementation without any fancy angle brackets. However, also notice that sometimes, just as with class templates, function templates don’t expand the way we’d like. For example, the min that takes two char* compares two pointer values, not the contents of the strings. A function template can be specialized by merely providing an implementation that takes the types involved:

inline char* min(const char* a, const char* b) {
  return (strcmp(a, b) < 0 ? a : b);
}

In this case, because a version of the function that takes two character pointers already exists, the compiler binds to that one instead of generating another implementation based on the min function template.

Member Function Templates

The fun of templates does not stop at global functions. Oh, no. You can create member function templates as well. As of Visual C++ 6.0, the definition of IUnknown has been augmented with a member function template for QueryInterface:

struct IUnknown {
...
  template <class Q>
  HRESULT STDMETHODCALLTYPE QueryInterface(Q** pp)
  { return QueryInterface(__uuidof(Q), (void**)pp); }
}

Before the member function template, all calls to QueryInterface had to be sure to match up the interface type and the interface identifier:

void Fly(IUnknown* punk) {
  IBird* pbird = 0;
  punk->QueryInterface(IID_ICat, (void**)&pbird); // Oops!
  punk->QueryInterface(IID_IBird, (void**)pbird); // Double oops!
  punk->QueryInterface(IID_IBird, (void**)&pbird); // OK
  pbird->Fly();
}

On the other hand, with the QueryInterface member function template, the type of the interface suffices:

void Fly(IUnknown* punk) {
  IBird* pbird = 0;
  punk->QueryInterface(&pbird); // __uuidof uses to determine IID
  pbird->Fly();
}

In effect, member function templates add new member functions of a class, just as function templates add new global functions based on use.

Summary

Templates provide two services. The first is code reuse, but in a different way than inheritance. Inheritance requires a certain type relationship. Templates require only that the compiler be capable of finding the functionality required of the specified type. Inheritance makes type requirements, whereas templates take what types we give them.

The other service that templates provide is efficiency. By mixing in types at compile time, we can use static binding to make decisions at runtime instead of at compile time. This is the key to the promise of smaller, faster code that templates make.