[In this reprinted #altdevblogaday in-depth piece, Gamer Camp's technical course lead Alex Darby continues his series on a C/C++ Low Level Curriculum by looking at conditional statements.]
Hello, interwebs! As the title suggests, this is the sixth part of the C / C++ Low Level Curriculum series I've been doing. In this installment, we'll be starting to look at conditional statements, and what the code that you're asking the compiler to generate when you use them looks like (at least before the optimizer gets to it…).
Just in case anyone is unclear about what they are, conditionals are the language features that allow us control over which parts of our code get executed. At face value, the subject of conditionals might seem a simple one, but it is precisely because it seems simple – and because so much else builds on top of it – that it is the first topic that I've chosen to look at in detail after function calls.
Though we won't get around to all of them in this post, our look at conditionals will take us on a tour through a representative sample of x86 disassembly generated by if statements, the conditional operator (or "ternary operator", or "question mark"), and switch statements; and whilst we look at all of these we'll also be looking at disassembly generated by the (built in!) relational and logical operators that are used with them (i.e. ==, !=, <=, >=, >, <, !, &&, and ||).
Firstly, I'd like to apologize to anyone who reads these posts regularly for the fact that my rate of posting has slowed down – I will hopefully speed up again to the regular 2 week posting cycle in the near future.
Secondly, here are the backlinks for anyone who wants to start from the beginning of the series (warning: it might take you a while, the first few are quite long):
- A Low Level Curriculum for C and C++
- C / C++ Low Level Curriculum part 2: Data Types
- C / C++ Low Level Curriculum Part 3: The Stack
- C / C++ Low Level Curriculum: More Stack
- C / C++ Low Level Curriculum Part 5: Even More Stack
Generally I will try to avoid too much assumed knowledge; but if something comes up that I've explained previously, or that I know another ADBAD author has covered already then I will just link to it; this implies that you, dear reader, should assume that I assume you will read anything I link to if you want to make complete sense of the article :)
Compiling and running code from this article
I assume that you are using Windows, are familiar with the VS2010 IDE, and comfortable writing, running, and debugging C++ programs.
As with the previous posts in this series, I'm using a win32 console application made by the "new project" wizard in VS2010 with the default options (VS2010 express edition is fine).
The only change I make from the default project setup is to turn off "Basic Runtime Checks" to make the generated assembler more legible (and significantly faster…); see this previous post for details on how to do this.
To run code from this article in a VS2010 project created this way, open the .cpp file that isn't stdafx.cpp and replace everything in it with text copied and pasted from the code box.
The disassembly we look at is from the debug build configuration, which generates "vanilla" unoptimized win32 x86 code.
Instructions and Mnemonics: an aside
I've just realised that so far in this series I have typically been using the term instruction when referring to an assembler mnemonic. I felt that I should point out that this isn't 100% accurate, because whilst assembler mnemonics are normally thought of as having a 1:1 correspondence to binary CPU instructions, they are not the same thing.
In fact, in x86 assembler, the mnemonics often actually have a 1:x relationship with the corresponding opcodes, because multiple variants of each mnemonic exist that differ in the types and sizes of their operands.
This is not something you should worry yourself about too much, as it's a fairly harmless Kenobiism, but I still felt I should point it out if I was going to carry on doing it ;)
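To make the 1:x relationship a little more concrete, here is a small illustrative table, written as C++, of a few of the 32-bit x86 encodings that all share the single mnemonic mov. The struct, table, and function names are made up for this sketch, the table is nowhere near exhaustive, and it lists only the primary opcode byte of each form.

```cpp
#include <cstdint>
#include <cstring>

// One row per encoding: the same mnemonic maps to a different opcode byte
// depending on the kinds of operands it is given.
struct Encoding
{
    const char*  mnemonic;
    const char*  operands;
    std::uint8_t opcode;    // primary opcode byte only (no ModRM or immediate)
};

static const Encoding kEncodings[] =
{
    { "mov", "r/m8,  r8",    0x88 },
    { "mov", "r/m32, r32",   0x89 },
    { "mov", "r8,    r/m8",  0x8A },
    { "mov", "r32,   r/m32", 0x8B },
    { "mov", "r32,   imm32", 0xB8 },  // 0xB8 plus the register index
    { "mov", "r/m32, imm32", 0xC7 },  // the form of "mov dword ptr [x],1"
};

// Counts how many rows of the (hypothetical) table share a mnemonic.
int CountEncodings(const char* mnemonic)
{
    int count = 0;
    for (unsigned i = 0; i < sizeof(kEncodings) / sizeof(kEncodings[0]); ++i)
    {
        if (std::strcmp(kEncodings[i].mnemonic, mnemonic) == 0)
        {
            ++count;
        }
    }
    return count;
}
```

Even in this tiny sample one mnemonic covers six different opcodes, which is why "instruction" and "mnemonic" are not strictly interchangeable.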
The best place to start is, as someone or other famously once remarked, at the beginning; so let's start with the most basic form of the if statement.
Before anyone mentions it, I know I could have omitted the curly braces around iLocal = 1; on line 9.
If you're the kind of person who's so lazy that you like to leave out curly braces in these situations then that's up to you; but I would just like to point out that there is probably a special place in one of the deeper and less pleasant circles of the Hell I don't believe in that is reserved for your sort – just a couple of floors up from those who do the same thing with loops.
Also, I've left the #include "stdafx.h" in the code box so that your line numbers match mine if you're working through this yourself.
#include "stdafx.h"

int main(int argc, char* argv[])
{
    int iLocal = 0;

    if( argc < 0 )
    {
        iLocal = 1;
    }

    return 0;
}
Anyway, as usual if you're looking at this in VS2010 then copy and paste the above code over whichever is your project's main .cpp file, put a breakpoint on line 7, tell Visual Studio to compile and run, wait for the breakpoint to be hit, then right click in the source window and choose "Go To Disassembly". You should now be seeing something like this:
n.b. right-click and check you have the same options checked as me...
As we already know, the assembler above int iLocal = 0; is the function prologue (or preamble) and the assembler after the closing brace of main() is the function epilogue.
The specific disassembly we're interested in is between lines 7 and 13 of the source code that is shown inline with the disassembly, so here it is pasted into a code window (N.B. the addresses corresponding to the disassembly instructions will almost certainly differ on your screen if you're running this yourself…)
7: if( argc < 0 )
010D20B0 cmp dword ptr [argc],0
010D20B4 jge main+1Dh (10D20BDh)
9: iLocal = 1;
010D20B6 mov dword ptr [iLocal],1
12: return 0;
010D20BD xor eax,eax
Straight away, there are a couple of new assembler mnemonics we've not come across so far in this series of posts. We'll cover these as we come to them.
The instruction cmp doesn't have an immediate effect on code execution by itself; it compares its first and second operands (by subtracting the second from the first and throwing the result away) and stores the outcome of the comparison in an internal register of the CPU known as EFLAGS.
The instruction after it uses the mnemonic jge, which means "jump if greater or equal". It will cause a jump to the address 0x010D20BD supplied as its operand if the outcome of the previous cmp instruction has set the content of the EFLAGS register to indicate that its first operand was greater than or equal to its second operand – i.e. if argc is greater than or equal to 0 then execution will jump past the instructions generated by the block of code controlled by the if statement.
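If it helps to see the cmp / jge pairing spelled out, the following is a rough C++ model of it. It is only a sketch: the Flags struct and the function names are invented for illustration, and it models just three of the status bits in EFLAGS (the zero, sign, and overflow flags), which are the ones the signed conditional jumps look at.

```cpp
#include <cstdint>

// A handful of the status bits that live in EFLAGS.
struct Flags
{
    bool zero;      // ZF: the result was zero
    bool sign;      // SF: the top bit of the result was set
    bool overflow;  // OF: the subtraction overflowed as a signed operation
};

// Models "cmp a, b": subtract b from a and keep only the flags.
Flags Cmp(std::int32_t a, std::int32_t b)
{
    // Do the arithmetic on unsigned values so that wrap-around is well defined.
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    const std::uint32_t result = ua - ub;

    Flags flags;
    flags.zero = (result == 0);
    flags.sign = ((result >> 31) != 0);
    // Signed overflow: the operands had different signs, and the sign of the
    // result differs from the sign of the first operand.
    flags.overflow = ((((ua ^ ub) & (ua ^ result)) >> 31) != 0);
    return flags;
}

// Models the condition tested by "jge": the jump is taken when SF == OF.
bool JumpIfGreaterOrEqual(const Flags& flags)
{
    return flags.sign == flags.overflow;
}
```

In terms of the listing above, JumpIfGreaterOrEqual(Cmp(argc, 0)) returning true corresponds to the jge being taken, and so to the controlled block being skipped.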
Hold on a minute…
So, we've only covered the most basic form of an if statement and we've already encountered a major difference between what we might think we're asking the compiler to do, and the code it's generating.
The intuitive way to think about an if block in a high level language is that if the condition of the if is met, then execution will step into the curly braces delimited block of code it controls.
However, the assembler is clearly testing the logical opposite of what we've asked it to, and if that condition is met then it is skipping over the code block controlled by the if.
This is because, at the assembler level, instructions are executed in sequential order unless a jump instruction tells it to do otherwise – and so assembler has no equivalent to the high level concept of a curly brace delimited "code block". The upshot of this is that the high level notion of "stepping into" a code block is implemented at the assembler level by "not skipping over" the code the block has generated.
Clearly these two behaviors are logically isomorphic (i.e. produce the same output given the same input), but the high level version is easier for the human mind to cope with intuitively, and the version generated by the compiler better suits the sequential-execution-unless-tampered-with behavior of the underlying machine.
Just for the sake of clarity let's re-write the C++ code in a form that matches what the assembler we just looked at does, using the C++ keyword goto:
#include "stdafx.h"

int main(int argc, char* argv[])
{
    int iLocal = 0;

    // corresponding original code in comments to the right...
    if( argc >= 0 ) goto GreaterEqualZero; // if( argc < 0 )
                                           // {
    iLocal = 1;                            //     iLocal = 1;
                                           // }
GreaterEqualZero:
    return 0;                              // return 0;
}
Ironically (though unsurprisingly) this C++ code generates different assembler to the original code. Please don't worry about this.
if … else if … else
So let's take a look at a more complicated if statement:
#include "stdafx.h"

int main(int argc, char* argv[])
{
    int iLocal = 0;

    if( argc == 0 )
    {
        iLocal = 13;
    }
    else if( argc != 42 )
    {
        iLocal = (6 * 9);
    }
    else
    {
        iLocal = 1066;
    }

    return 0;
}
This code generates the following assembler, which given what we saw in the previous example is more or less exactly what you'd expect:
7: if( argc == 0 )
002020B0 cmp dword ptr [argc],0
002020B4 jne main+1Fh (2020BFh)
9: iLocal = 13;
002020B6 mov dword ptr [iLocal],0Dh
002020BD jmp main+35h (2020D5h)
11: else if( argc != 42 )
002020BF cmp dword ptr [argc],2Ah
002020C3 je main+2Eh (2020CEh)
13: iLocal = (6 * 9);
002020C5 mov dword ptr [iLocal],36h
002020CC jmp main+35h (2020D5h)
17: iLocal = 1066;
002020CE mov dword ptr [iLocal],42Ah
20: return 0;
002020D5 xor eax,eax
The main things to note about this code are:
- Each if and else if condition is implemented as a cmp followed by a jxx – there are two new ones in here: je (jump equal) and jne (jump not equal)
- As in the first example, each if and else if condition is causing the compiler to generate the logically opposite test to the high level language, and skipping the assembler generated by the controlled block of code if it succeeds
- The test for the first if jumps to the condition of the else if when its condition is not met. If there were more chained else if statements then this pattern would continue through them.
- Each controlled block of code except the final else has an unconditional jmp at the end of it that takes the execution past the rest of the chain, including the code block controlled by the else.
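Applying the same goto treatment used for the first example, the control flow of this disassembly can be sketched in C++ as below. The function name and the label names are invented for this illustration, and the function returns iLocal (rather than 0, as main() does) purely so that the result can be inspected.

```cpp
// A goto-based rendering of the if ... else if ... else example.
// Each comment names the compare-and-jump pair the line stands in for.
int ChainWithGotos(int argc)
{
    int iLocal = 0;

    if( argc != 0 ) goto ElseIf;    // cmp [argc],0   / jne
    iLocal = 13;
    goto End;                       // jmp past the rest of the chain
ElseIf:
    if( argc == 42 ) goto Else;     // cmp [argc],2Ah / je
    iLocal = (6 * 9);
    goto End;                       // jmp past the else block
Else:
    iLocal = 1066;
End:
    return iLocal;
}
```

Note that both tests are the logical opposites of the ones in the original source, and that only the final block falls through to the end without needing a jump of its own.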
That was all pretty straightforward for once. Joy.
Next, let's take a look at the effects of the && and || operators:
#include "stdafx.h"

int main(int argc, char* argv[])
{
    int iLocal = 0;

    if( ( argc >= 7 ) && ( argc <= 13 ) )
    {
        iLocal = 1024;
    }
    else if( argc || ( !argc ) || ( argc == 69 ) ) // deliberately nonsensical test
    {
        iLocal = 666;
    }

    return 0;
}
This generates the following assembler, which is much more interesting than the first if … else if example:
7: if( ( argc >= 7 ) && ( argc <= 13 ) )
00F120B0 cmp dword ptr [argc],7
00F120B4 jl main+25h (0F120C5h)
00F120B6 cmp dword ptr [argc],0Dh
00F120BA jg main+25h (0F120C5h)
9: iLocal = 1024;
00F120BC mov dword ptr [iLocal],400h
00F120C3 jmp main+3Eh (0F120DEh)
11: else if( argc || ( !argc ) || ( argc == 69 ) )
00F120C5 cmp dword ptr [argc],0
00F120C9 jne main+37h (0F120D7h)
00F120CB cmp dword ptr [argc],0
00F120CF je main+37h (0F120D7h)
00F120D1 cmp dword ptr [argc],45h
00F120D5 jne main+3Eh (0F120DEh)
13: iLocal = 666;
00F120D7 mov dword ptr [iLocal],29Ah
16: return 0;
00F120DE xor eax,eax
Now, I don't know about you but the first time I saw assembler generated by using && I was amazed by the sheer simplistic audacity of it – I think it's because I'm not an assembler programmer, but I expected it to be a little more complicated and fiddly than this.
Looking in detail at the code generated for the if statement using && (the four instructions from 00F120B0 to 00F120BA), we can see that it is using another two conditional jump instructions we've not yet seen: jl (jump if less) and jg (jump if greater), and as before it is testing the logically opposite condition to that specified by the high level code.
More interestingly, in order to implement &&, the compiler simply concatenates the separate tests – if either of the high level tests fails, the corresponding jump will cause execution to skip past the block of code controlled by the if statement. This means that the block of code controlled by the if will only be executed if both tests are passed, which clearly implements a logical AND.
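As a rough sketch, here is the && test on its own (without the else if that follows it in the example) written with gotos so that it mirrors the two compare-and-jump pairs. The function and label names are invented for this illustration, and the function returns iLocal so that the outcome is visible.

```cpp
// Goto rendering of: if( ( argc >= 7 ) && ( argc <= 13 ) ) { iLocal = 1024; }
int AndWithGotos(int argc)
{
    int iLocal = 0;

    if( argc < 7 )  goto Skip;   // cmp [argc],7   / jl
    if( argc > 13 ) goto Skip;   // cmp [argc],0Dh / jg
    iLocal = 1024;               // only reached if neither jump was taken
Skip:
    return iLocal;
}
```

Both jumps share the same target, which is what makes the chain behave as an AND: a single failed test is enough to skip the controlled block.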
If we now turn our attention to the code generated by the if statement using || (the six instructions from 00F120C5 to 00F120D5) we see a similar pattern of consecutive conditional tests, though clearly it must be different since it implements conditions joined by ||.
The first thing to notice is that the first two tests done by the assembler are logically the same as their high level equivalents. This bucks the trend we have seen so far, but why?
Well, the address passed as the operand to the first two conditional jumps (at 00F120C9 and 00F120CF) will move execution past the rest of the tests, to the start of the controlled code block. Unsurprisingly though, the last test of the || if statement (the cmp and jne at 00F120D1 and 00F120D5) follows the standard test-the-opposite-and-jump-past idiom we've come to expect from an if.
The jump-into-controlled-block behavior of all but the last || conditional means that as soon as any one of the tests is passed the controlled code will be executed, which clearly implements a logical OR.
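The same goto treatment can be applied to the || test, again taken on its own rather than as the else if of the full example. As before this is only a sketch; the function and label names are invented, and iLocal is returned so that the result can be checked.

```cpp
// Goto rendering of:
//   if( argc || ( !argc ) || ( argc == 69 ) ) { iLocal = 666; }
int OrWithGotos(int argc)
{
    int iLocal = 0;

    if( argc != 0 )  goto Body;   // cmp [argc],0   / jne  (same sense as "argc")
    if( argc == 0 )  goto Body;   // cmp [argc],0   / je   (same sense as "!argc")
    if( argc != 69 ) goto Skip;   // cmp [argc],45h / jne  (opposite of "argc == 69")
Body:
    iLocal = 666;
Skip:
    return iLocal;
}
```

Because the test is deliberately nonsensical, one of the first two jumps is always taken, so the third comparison is never actually reached and the function returns 666 for every input.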
Aside: Lazy Evaluation
I'm sure that most – if not all – of you will have heard that C++ has "lazy evaluation" of && and || (you will more often see it called "short-circuit evaluation"). If you've never been 100% sure of what this means, you've just seen it in action in this block of assembler!
The && will fail if either of its operands fails; so if the first test fails it will never do the second (or third, or fourth …).
Similarly the || will succeed if either of its operands succeeds; so if the first test passes it will never do the second (or third, or fourth …).
Since neither necessarily evaluates all of its operands this makes them technically "lazy"; which in this circumstance you can read as awesome, elegant, and efficient (for certain definitions of efficient).
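One way to watch this happen from the C++ side is to make each operand count how often it is evaluated. The sketch below does that; the CallCounts struct and the function names are invented for this illustration.

```cpp
// Records how many times each operand was evaluated, plus the overall result.
struct CallCounts
{
    int  first;
    int  second;
    bool result;
};

// Returns 'value' unchanged, but bumps 'counter' so the evaluation is visible.
static bool Record(bool value, int& counter)
{
    ++counter;
    return value;
}

CallCounts EvaluateAnd(bool a, bool b)
{
    CallCounts counts = { 0, 0, false };
    // The right-hand operand is only evaluated if the left-hand one is true.
    counts.result = Record(a, counts.first) && Record(b, counts.second);
    return counts;
}

CallCounts EvaluateOr(bool a, bool b)
{
    CallCounts counts = { 0, 0, false };
    // The right-hand operand is only evaluated if the left-hand one is false.
    counts.result = Record(a, counts.first) || Record(b, counts.second);
    return counts;
}
```

Calling EvaluateAnd(false, true) leaves second at 0, and so does EvaluateOr(true, false): in both cases the first operand has already decided the result, so the second is never touched.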
The main points to take away from the assembler we've looked at in this post are that:
- The conditional test that you see in the disassembly is likely to be the logical opposite of the test the high level code is asking for…
- …and the conditional jump will typically be jumping over the assembler that is generated by the "code block" controlled by the conditional in the high level code.
- This is because there is no concept of a "code block" at the level of assembler.
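As a quick reference for the first of these points, the sketch below maps each relational operator to the conditional jump you would expect to see skipping the controlled block, assuming a signed int comparison in an unoptimized build like the ones in this post. The function name is invented for this illustration; five of the six entries come straight from the listings above, while the entry for > (jle, "jump if less or equal") is inferred as its logical opposite rather than taken from a listing.

```cpp
#include <cstring>

// Returns the mnemonic of the "skip the block" jump expected for a signed
// comparison using the given relational operator, or "" if it isn't known.
const char* SkipJumpFor(const char* relationalOperator)
{
    static const char* const kTable[][2] =
    {
        { "==", "jne" },
        { "!=", "je"  },
        { "<",  "jge" },
        { "<=", "jg"  },
        { ">",  "jle" },  // inferred; not shown in the listings in this post
        { ">=", "jl"  },
    };

    for (unsigned i = 0; i < sizeof(kTable) / sizeof(kTable[0]); ++i)
    {
        if (std::strcmp(kTable[i][0], relationalOperator) == 0)
        {
            return kTable[i][1];
        }
    }
    return "";
}
```

Bear in mind that this only describes the simple jump-past-the-block case; as the || example showed, tests that jump into the controlled block keep the same sense as the source code.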
More or less all control code boils down to various combinations of conditionals and jumps at the assembly level; and being familiar with the assembler mnemonics that are used to implement these C / C++ features, and the various ways that they are used will almost certainly prove invaluable when you find yourself in the unenviable situation of a crash deep within some library code that you don't have symbols for (or that your debugger can't find symbols for).
Incidentally if you find yourself lost in code that you should have symbols for but your machine refuses to find them, you might try this post by Bruce Dawson to see if it helps ;)
Next time we'll continue looking at conditionals with the conditional operator (also known as the "ternary operator" or more commonly the question mark), and the switch statement.
Also, thanks to Fabian and Bruce for giving this a once-over and offering sage advice on content.
I am pretty sure that the code in this article doesn't demonstrate all the relational operators; so I'm leaving it to you, dear reader, to try out the ones I left out to see what they do :)
I also avoided writing any conditions for the if statements that contained function calls. Clearly this will make the assembler generated by the test code significantly more complex, but assuming that you have read the previous posts on the assembler generated when calling functions you should be able to make sense of this by yourself. I have to admit that I also partly avoided doing this so I could steer clear of operator overloading. That's for later. Probably.
[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]