This is a cross-post from my blog, Game Dev Without a Cause.
Any technically inclined person on the net should be familiar with the acronym RTFM, which stands for "Read the Freaking Manual" (the non-family-friendly version is more colorful, of course). It's the standard response to a question that you think wouldn't have been asked had the asker read the (freaking) manual in the first place.
For the programming set, a natural variant of RTFM is RTFC, "Read the Freaking Code". The source code is, by definition, the most accurate description of how a given program will behave. Barring inaccurate comments and misleading variable names, source code is the only form of documentation guaranteed to be 100% accurate. What keeps source code from being great documentation is that it is so hard to read. It's so hard, in fact, that:
Most Programmers Can't Read Code
Programmers can't read code? But isn't that their job? Strange as it may seem, it's true and there are reasons for this.
Now that I have your attention, I'll take a moment to admit my cheekiness in the above statement. Almost any programmer can read code in the strictest sense, i.e. they can parse a given line of code and have a reasonable idea of what it will do. But understanding large amounts of code, grasping the designs and intentions implicit in how the code was written and by whom, noticing what code is not there, and extrapolating how the code will evolve over time is a different story. These abilities are unfortunately rare on many programming teams. In other words, most programmers have some facility in code reading, but they are deficient in code comprehension.
So, to refine my original statement:
Because most programmers have poor code comprehension,
Most Programmers Can't Read Code Effectively
Joel Spolsky of Joel on Software fame made a similar observation with this line from one of his posts:
It’s harder to read code than to write it.
So why is code so much harder to read than to write? There are a couple of reasons that I think are primarily responsible for this phenomenon.
The first reason code is harder to read than to write has to do with the sheer amount of data you need to keep in your head in order to read code. When you write code, you only need to remember the variables, algorithms, data, etc. relevant to the feature you are currently writing. On the other hand, when you read code, you need to keep track of not just the feature you are currently investigating but also other potentially relevant functionality. In order to understand code well enough to leverage it, you need to build up a working model of the design in your head, gleaned from clues in the code you read. While you write code, you can ignore exception and error cases with the expectation that you'll get to them once you get the core logic working. In most cases, when reading code, all those exception and error cases are already implemented and embedded into the code that you're reading. Not only do you have to keep several times more stuff in your head when you're reading code versus writing it, but you also have to pretend that you're Sherlock Holmes as you try to deduce the intended design and usage of the code you see and fit it into a larger mental picture of the software project. It's no wonder so many programmers chicken out at the thought of understanding a large codebase and opt to just write their own. They essentially mind-trick themselves into thinking that they are too dumb to read code.
The other major factor that confounds reading code is pride. Programmer pride. Writing code is a surprisingly personal endeavor. When you spend a lot of time thinking deeply about a problem and crafting code to solve that problem, you can easily become emotionally attached to the code that you created. This can be advantageous because it encourages programmers to constantly refine their favorite code, but it also leads to programmers having territorial feelings over a particular problem domain and biases towards their own code. This sort of pride can make it extremely difficult for a programmer to read someone else's implementation and fairly compare it with their own. Indeed, if they start reading code with the assumption that they won't like it, they will never be able to apply the brain power necessary to understand the code they are supposed to read.
My boss said something recently that spoke to this element of programmer pride, and I'll paraphrase it here:
You can't look down on someone when you read their code. If you don't respect the person writing the code, you won't be able to apply the energy needed to understand it.
For many programmers, the toughest barrier preventing them from reading code may be saying to themselves: "The person who wrote this just might be smarter than I am."
There you have it. Reading code is hard because it can be both mentally and emotionally taxing. So that's the problem. Now what can we do about it?
Honestly, I wish I knew. For recruiting, my best recommendation is to try to find ways to test for code comprehension. For folks you already work with, try to find the programmers who actively read code, are able to grok it, and are able to leverage that understanding so your team doesn't have to write code that has already been written. Once you find these guys, hold them up and encourage others to follow their example.
As for individual programmers, learn to read code. The skills needed to read code are not innate; they are learned and honed through practice. Don't be scared of trying to read code that seems inscrutable at first glance. With enough work, you will be able to understand it. Remember that you are smart enough to understand any code that comes before you, just not as smart as the guy who actually wrote it. The only way to get as smart as that guy is by reading his code.