Sunday, August 24, 2008

C++ Iterators

A friend of mine and I are collaborating on a game in the C++ programming language. The part I'm writing is a general purpose engine that will be used on both the game server and the game client, so I decided to use templates to allow basic game objects to be replaced with entirely different classes in the client and server, yet still use all the same code on top of them.

In the process of moving my code from regular classes to template classes, I uncovered a fundamental language wart or compiler flaw. It has to do with a problem I ran into using STL iterators. Before I go into what the problem is, let me explain iterators a bit.

The people who wrote C++'s standard template library want to break you of the habit of writing things like:

for (int i=0; i < vec.size(); i++)

Instead you should write this:

for (vector< int >::iterator i=vec.begin(); i != vec.end(); i++)

Do you see the improvement? No? It's supposed to simplify code for traversing containers--except that it's twice as much typing! OK, well it also gives your code portability to use different containers besides vector ...theoretically... except that we have to specify "vector" directly in the for loop to say what kind of iterator it is! Oops.

The real motivation behind iterators is that it unifies the concepts for the people who write the container classes. I won't say the users of the container classes can't use iterators in ways that also provide benefit, but the users also bear a burden of extra boilerplate that adds as much work as it saves. This is the sort of thing Bjarne Stroustroup means when he talks about the compiler vendors and library/tools group being overrepresented on the C++ standards committee compared to actual users of C++. At least we users are finally going to get the auto keyword in C++0x, which will allow us to just write the word "auto" in place of the vector::iterator gobbledygook.

But being the good little C++ citizen that I am, I've taken the trouble to use iterators in all my code, even if it is a lot more typing. That's how when I moved my engine definition into templates, I got burned badly.

Where before I had this:

for (vector< GameObj >::iterator i=vec.begin(); i != vec.end(); i++)

I now have:

template < typename T >
... ...
for (vector< T >::iterator i=vec.begin(); i != vec.end(); i++)

What is supposed to happen is that when I instantiate the code with T=GameObj or T=OtherGameObj, the code will get built at the point I instantiated it and the result of the latter will look like the former except with either GameObj or OtherGameObj substituted for T.

Instead, it turns out that you can't (at least on my compiler) use a template argument like T to specify the type argument to a container iterator. You can declare the container using that T, but you cannot access the iterator of the container you just created! On some containers like vector you can just go back to the old notation, but for containers like map<>, iterators are the only way to loop over the container. This is a Really F*ing Big Problem. This is like laying the foundation for a building and realizing you didn't leave room in the floor plan for any doors. The irony is that iterators from the standard template library don't work with template arguments! This is like forgetting to leave room for doors in the plans for a f*ing door factory!

6 comments:

Nick Johnson said...

Use the typename keyword again, eg:

#include < iostream >
#include < vector >

using namespace std;

template < class T >
T maximum( vector< T > &list )
{
T max = list[0];

for( typename vector< T >::iterator it = list.begin();
it != list.end();
it++)
if( *it > max )
max = *it;

return max;
}


int main()
{
vector< int > foo(5);

foo[0] = 100;
foo[1] = 200;
foo[2] = 300;
foo[3] = 250;
foo[4] = 150;

cout << "Max is " << maximum< int >(foo) << endl;

return 0;
}


Tested with g++ 4.1. Yeah, I know, it sucks. So does C++. So does trying to type C++ into a blogger comment ;)

Anonymous said...

Use 'typename' whenever you access a type nested in a templated type. When the nested type is a template you need to use the 'template' keyword as well.

Prefer STL algorithms before manual for-loops. Until the next standard boost::bind is your friend.

Consider using typedefs to simplify and make future maintainance (like switching collection type) easier. 'auto' will make this easier.

class Foo {
typedef std::vector< T > Things;
...
};

void Foo::bar() {
for (typename Things::const_iterator i = _things.begin()...)
...
}

You could even add this to the class:
typedef typename Things::const_iterator ThingIterator;

Which would make the loop look like this:
for (ThingIterator i = _things.begin()...)

Michel Boto said...

Instead you should write this:

for (vector< int >::iterator i=vec.begin(); i != vec.end(); i++)

Do you see the improvement? No? It's supposed to simplify code for traversing containers--except that it's twice as much typing! OK, well it also gives your code portability to use different containers besides vector ...theoretically... except that we have to specify "vector" directly in the for loop to say what kind of iterator it is! Oops.


You don't have to specify anything directly in the for loop. Writing out iterators is a pain, yeah, but that's normally why you put typedefs somewhere (preferably above the first usage of the iterator type in the translation unit, or even more preferably in a separate header containing all of your other forward declarations and setup typedefs for that module).

typedef std::vector<SomeType> SomeTypes;
typedef SomeTypes::iterator SomeTypesIt;
typedef SomeTypes::const_iterator SomeTypesConstIt;
// ... and so on

SomeTypes types;
// .. do stuff with the container
SomeTypesIt current( types.begin() );
const SomeTypesIt end( types.end() );
for (; current != end; ++current ) {
// whatever you're doing with the variable goes here
}

So far there's no benefit to using iterators over doing pointer arithmetic on a dereferenced begin(), or using the old-style array index for-loops from C. However, if you now change the container type to something that doesn't support random access, you've got a lot of code to rewrite all of a sudden. With iterators, that's not the case (the only exception being iterators which contain a pair, like from std::map, but in that case it's unclear anyway whether your for loop should be accessing the key or the value so it's necessary to change the logic). Not to mention that doing something like

for (int i(0); i<types.size(); ++i ) {
// do something with
types[i];
}

forces a lookup in types every single time for the correct variable (for vector this is probably optimized a great deal; but that doesn't mean the same can be said for any other container with random access iteration, particularly custom container classes written in-house or from 3rd party libraries).

The syntax is hideous, but technically so is the syntax of a C-style for loop if you write it correctly. That for loop you gave as a simpler way should technically be written like this:

for (vector<int>::size_type i=0; ... )

not with an automatic assumption that the range is safe to reference with an 'int.' So you've got to use the vector in the for loop anyway, regardless of whether or not you use iterators.

Anonymous said...

Template code is compiled in two phases. In the first phase the uninstantiated template code is being considered. So the compiler can't know if in Foo< T >::something, something is a type or a variable, hence the need for 'typename'

(I'm not defending C++ btw, just giving you some background)

But when writing template code like that, it's usually a better idea to maket the iterator itself the template parameter.
I.e. rather than typing:

template < class T >
T SomeFuntion(vector < T > & inVector)
{
for(typename std::vector< T >::iterator it(inVector.begin(); it != inVector.end(); ++it)
{
...
}
}

Write something like:

template < class T >
T SomeFunction(T const & inBegin, T const & inEnd)
{
for (T it(inBegin); it != inEnd; ++it)
{
...
}
}

And you're on the library writer's side and gain the portability of iterators ;)

ferruccio said...

I like to use the boost foreach macro. e.g. (assuming vec is a vector<int>)

BOOST_FOREACH (int i, vec) {
}

It's much clearer and will be easy to convert to the C++0x form:

for (int i : vec) {
}

whenever that actually becomes available.

Dennis said...

Thanks to all who commented! This solves a major problem I was having, and I learned something new. I thought I was a C++ "expert", but I had no idea you could use the typename keyword in that way.

Yes, C++ is a pain the ass sometimes, but my partner and I wouldn't be using it for our project (a game engine) if we didn't think its benefits outweighed the difficulties. :)