[In this reprinted #altdevblogaday in-depth piece, Gamer Camp's Alex Darby continues his series on a C/C++ Low Level Curriculum by looking at the conditional operator and switch statements.]
Hello humans. Welcome to the seventh part of the C/C++ Low Level Curriculum series I've been writing. This post covers the conditional operator, and switch statements. As per usual I will be showing snippets of C++ code and throwing the corresponding x86 assembler at you (as produced by VS2010) to show you what your high level code is actually doing at the assembler level.
Disclaimer: in an ideal world, I'd like to try to avoid assumed knowledge, but keeping up the level of detail in each post that this entails is, frankly, too much work. Consequently, I will from now on point you at post 6 as a "how to" and then get on with it…
Here are the backlinks for preceding articles in the series (warning: it might take you a while, the first few are quite long):
- A Low Level Curriculum for C and C++
- C / C++ Low Level Curriculum part 2: Data Types
- C / C++ Low Level Curriculum Part 3: The Stack
- C / C++ Low Level Curriculum: More Stack
- C / C++ Low Level Curriculum Part 5: Even More Stack
- C / C++ Low Level Curriculum Part 6: Conditionals [see near the top of this post for details on compiling & running the code snippets]
    #include "stdafx.h"

    int main(int argc, char* argv[])
    {
        // the line after this comment is logically equivalent to the following line of code:
        // int iLocal; if( argc > 2 ){ iLocal = 3; }else{ iLocal = 7; }
        int iLocal = (argc > 2) ? 3 : 7;
        return 0;
    }

If you remember the assembler that a basic if-else generated in the last article, then the assembler generated here will probably bust your mind gaskets… Note:
- I've deliberately left the function prologue and epilogue out of the asm below, and just left the assembler involved with the conditional assignment
- if your disassembly view doesn't show the variable names, then you need to right click the window and check "Show Symbol Names"
    5: int iLocal = (argc > 2) ? 3 : 7;
    01311249 xor eax,eax
    0131124B cmp dword ptr [argc],2
    0131124F setle al
    01311252 lea eax,[eax*4+3]
    01311259 mov dword ptr [iLocal],eax

Clearly this is not very much like the code for the simple if-else that we looked at previously. This is because there is trickery afoot and the compiler has chosen to generate sneaky branchless code to implement the logic specified by the C++ code. So, let's examine it line by line (counting the five instructions, starting from the xor):
- line 1 – uses the xor instruction to set eax to 0. Anything XORed with itself is 0.
- line 2 – as in the previous if examples this uses cmp to test the condition, setting flags in a special purpose CPU register based on the result of the comparison.
- line 3 – this is a new one! The instruction setle (set if less than or equal) sets its operand to 1 if the 1st operand of the preceding cmp was less than or equal to the 2nd operand, and to 0 if it was greater. We've not seen the operand al before; it's a legacy 8-bit register name which now maps to the lowest byte of the eax register (if you're a sensible person and are stepping through this code in your debugger with the register window open, you will see that, when argc is 2 or less, this instruction causes the eax register to be set to 1 – also note that this only works because eax has already been set to 0).
- line 4 – uses the load effective address instruction to do some sneaky maths that relies on the value of eax set by setle in line 3: eax*4+3 is 7 when eax is 1, and 3 when eax is 0.
- line 5 – moves the value from eax into the memory address storing the value of iLocal
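The arithmetic behind the setle / lea pair can be mirrored in C++. This is only a sketch of the idea, not what the compiler literally emits; the function names SelectBranchless, SelectTernary and BranchlessMatchesTernary are made up for illustration.

```cpp
// Hypothetical helper mirroring the xor / cmp / setle / lea sequence.
int SelectBranchless(int argc)
{
    // xor eax,eax + cmp + setle al: eax becomes 1 if argc <= 2, else 0
    int eax = static_cast<int>(argc <= 2);
    // lea eax,[eax*4+3]: 1*4+3 = 7, 0*4+3 = 3
    return eax * 4 + 3;
}

// The original high level expression, for comparison.
int SelectTernary(int argc)
{
    return (argc > 2) ? 3 : 7;
}

// Returns true if both versions agree over a small range of inputs.
bool BranchlessMatchesTernary()
{
    for (int argc = -5; argc <= 10; ++argc)
    {
        if (SelectBranchless(argc) != SelectTernary(argc))
            return false;
    }
    return true;
}
```

The trick only works here because both results are compile-time constants (3 and 7) whose difference is 4, a scale factor that lea can apply directly.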
    #include "stdafx.h"

    int main(int argc, char* argv[])
    {
        int iOperandTwo = 3;
        int iOperandThree = 7;
        int iLocal = (argc > 2) ? iOperandTwo : iOperandThree;
        return 0;
    }

And, here's the relevant disassembly:
    5: int iOperandTwo = 3;
    00CF1619 mov dword ptr [iOperandTwo],3
    6: int iOperandThree = 7;
    00CF1620 mov dword ptr [iOperandThree],7
    7: int iLocal = (argc > 2) ? iOperandTwo : iOperandThree;
    00CF1627 cmp dword ptr [argc],2
    00CF162B jle main+25h (0CF1635h)
    00CF162D mov eax,dword ptr [iOperandTwo]
    00CF1630 mov dword ptr [ebp-50h],eax
    00CF1633 jmp main+2Bh (0CF163Bh)
    00CF1635 mov ecx,dword ptr [iOperandThree]
    00CF1638 mov dword ptr [ebp-50h],ecx
    00CF163B mov edx,dword ptr [ebp-50h]
    00CF163E mov dword ptr [iLocal],edx

Since the conditional operator is now assigning from variables we'd expect it to generate something that looks more like the sort of code we saw from the basic if-else we looked at last time, which it has. We have the expected cmp followed by a conditional jump testing against the opposite of the conditional, then two blocks of assembler, the first of which (the three instructions from 00CF162D to 00CF1633) unconditionally jumps over the second (the two instructions at 00CF1635 and 00CF1638) if it executes, so essentially it's behaving more or less as expected; however there's clearly some interesting stuff happening in there:
- the two branches use different registers to store their intermediate values; the first uses eax, the second uses ecx
- both branches store their result to the same memory address in the Stack (see this post if you don't know or can't remember about Stack Frames) – i.e. [ebp-50h]
- the code that assigns the value to iLocal (the last two instructions, at 00CF163B and 00CF163E) only exists once and is executed regardless of which branch was taken; it takes the value from [ebp-50h] and writes it into iLocal using a third register (edx)
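Those three observations can be paraphrased in C++ as follows. This is a sketch only: iTemp is a hypothetical name standing in for the unnamed Stack slot at [ebp-50h], and SelectViaTemp is a made-up wrapper so the logic can be called in isolation.

```cpp
// Hypothetical paraphrase of the unoptimised code generated for
//     int iLocal = (argc > 2) ? iOperandTwo : iOperandThree;
int SelectViaTemp(int argc, int iOperandTwo, int iOperandThree)
{
    int iTemp;                    // stands in for the slot at [ebp-50h]
    if (argc > 2)
        iTemp = iOperandTwo;      // first branch (goes via eax)
    else
        iTemp = iOperandThree;    // second branch (goes via ecx)

    int iLocal = iTemp;           // single shared assignment (goes via edx)
    return iLocal;
}
```

With the values from the example, SelectViaTemp(1, 3, 7) picks iOperandThree and SelectViaTemp(3, 3, 7) picks iOperandTwo.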
    // intuitively equivalent if-else of
    // int iLocal = (argc > 2 ) ? iOperandTwo : iOperandThree;
    int iLocal;
    if( argc > 2 )
    {
        iLocal = iOperandTwo;
    }
    else
    {
        iLocal = iOperandThree;
    }

Rather than choosing between one of two assignments like this if-else, the assembler generated for our use of the conditional operator does exactly what we told it to: choose one of two values (store it temporarily in the Stack) and assign iLocal from it.

Switch Statements

The final type of conditional statement we'll be looking at is the switch statement. Like the conditional operator, the switch statement is an often abused and maligned construct that you wouldn't want to live without. To be 100% fair to the switch statement it's never the fault of the switch statement that it's possible for maniacs to write brittle and insane code using them; I would like to say that the maniacs in question know who they are, but in fact it's pretty unlikely that they do know or they wouldn't do it…

In any case, rest assured that whilst you may never know whether you are one of these maniacs, everyone else in your team will know whether you are because they've looked at your switch statements; and don't think using if-else if-else instead of switch will help you evade detection, because that'll just make it even more obvious ;)

Anyway, sniping aside, the switch statement is particularly interesting because when used in certain ways it can produce some pretty cool assembler. So let's take a look at a switch statement…
    #include "stdafx.h"

    int main(int argc, char* argv[])
    {
        int iLocal = 0;

        // n.b. no "break" in case 1 so we can
        // see what "fall through" looks like
        switch( argc )
        {
        case 1:
            iLocal = 6;
        case 3:
            iLocal = 7;
            break;
        case 5:
            iLocal = 8;
            break;
        default:
            iLocal = 9;
            break;
        }

        return 0;
    }

And here's the disassembly…
    9: switch( argc )
    00C61620 mov eax,dword ptr [argc]
    00C61623 mov dword ptr [ebp-48h],eax
    00C61626 cmp dword ptr [ebp-48h],1
    00C6162A je main+2Ah (0C6163Ah)
    00C6162C cmp dword ptr [ebp-48h],3
    00C61630 je main+31h (0C61641h)
    00C61632 cmp dword ptr [ebp-48h],5
    00C61636 je main+3Ah (0C6164Ah)
    00C61638 jmp main+43h (0C61653h)
    10: {
    11: case 1:
    12: iLocal = 6;
    00C6163A mov dword ptr [iLocal],6
    13: case 3:
    14: iLocal = 7;
    00C61641 mov dword ptr [iLocal],7
    15: break;
    00C61648 jmp main+4Ah (0C6165Ah)
    16: case 5:
    17: iLocal = 8;
    00C6164A mov dword ptr [iLocal],8
    18: break;
    00C61651 jmp main+4Ah (0C6165Ah)
    19: default:
    20: iLocal = 9;
    00C61653 mov dword ptr [iLocal],9
    21: break;
    22: }

This is more or less exactly what you'd expect:
- the first two instructions copy argc (via eax) into the Stack at [ebp-48h]
- the block of cmp and je pairs that follows implements the logic of the switch: a series of comparisons of this value against the constants specified in the case statements, each followed by a conditional jump to the assembler generated by the code in the corresponding case statement
- if none of the conditional jumps are triggered, the final jmp in that block unconditionally jumps to the default: case.
- in particular, note that:
  - wherever the break keyword is used this causes an unconditional jump past the end of the assembler generated by the switch
  - the "fall through" from case 1: into case 3: in the high level code happens at the assembler level as a by-product of the organization of the adjacent blocks of instructions generated for the switch by the compiler, and the lack of an unconditional jump at the end of the assembler for case 1:

If you look at the assembler from the sample if-else-if-else in the last article, you should be able to see that the assembler generated for this switch is (more or less) what would happen if we had written the switch as an if-else-if-else and then re-organized the assembler so all the logic was in one place at the top, and the assembler generated for each code block was left where it was.

So other than the fact that the switch statement is a very useful C/C++ language convenience for managing what would often otherwise be messy looking and error prone chains of if-else-if-else statements, based on this example it doesn't appear to be doing anything which might offer a significant advantage at the assembler level – so why would I have claimed that the compiler might generate "pretty cool assembler" for a switch?

Before we assume we've seen it all, let's try using a contiguous range of values for the constants in the cases of the switch. You know, just for fun – and for the sake of simplicity let's start at 0.
    #include "stdafx.h"

    int main(int argc, char* argv[])
    {
        int iLocal = 0;

        switch( argc )
        {
        case 0:
            iLocal = 4;
            break;
        case 1:
            iLocal = 5;
            break;
        case 2:
            iLocal = 6;
            break;
        case 3:
            iLocal = 7;
            break;
        }

        return 0;
    }

And here's the disassembly it generates…

[Screenshot: VS2010 disassembly view of the switch, with a memory window on the left showing the 4-byte values starting at address 8D1664h]

Ok, so this time something more interesting is definitely going on – n.b. I've used a screenshot rather than just pasting the text because we need to look in a memory window to make sense of it. So what exactly is it doing?
- it moves argc into eax, then stores it into the Stack at [ebp-48h]
- it then compares the value stored in the address [ebp-48h] with 3 (i.e. our maximum case constant)
- if this value is greater than 3 then ja (jump above) on the next line will cause execution to jump to 8D1658h – the 1st instruction after the code generated by the case blocks, skipping the switch
- if the value is less than or equal to 3 then the value is moved into ecx, and we then have an unconditional jump to … somewhere :-/
    jmp dword ptr (8D1664h)[ecx*4]

This says "jump to the location stored in the memory address at an offset of 4 times the value of ecx from the memory address 8D1664h", so how is this implementing the logic of the C++ switch statement?

To answer this question we need to look in a memory window at the address 8D1664h (n.b. to open a memory window from the menu in VS2010 when debugging go Debug -> Windows -> Memory -> … and choose one of the memory windows. To set the address just copy and paste it from the disassembly into the "Address:" input box. You will also need to right click and choose "4-byte integer" and set the "Columns:" list box to 1 to have it look like the screenshot above).

So, if you cast your eyes up to the memory window on the left of the screenshot above, you will see that the top 4 rows are highlighted; these values start at address 8D1664h and are 4 byte integers (hence the ecx*4 in the operand) – which specifically in this case are pointers. The instruction jmp dword ptr (8D1664h)[ecx*4] will jump to the value stored in the address:
- 8D1664h + 0 = 8D1664h if the value in ecx is 0
- 8D1664h + 4 = 8D1668h if the value of ecx is 1
- 8D1664h + 8 = 8D166Ch if the value of ecx is 2
- 8D1664h + Ch = 8D1670h if the value of ecx is 3
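The same dispatch idea can be sketched in portable C++, with an array of function pointers standing in for the table of code addresses. This is an illustration under assumptions rather than the compiler's actual output: SwitchViaTable, kCaseTable and the Case0 to Case3 handlers are hypothetical names. One detail worth knowing is that ja is an unsigned comparison, so a negative value also skips the table; the cast to unsigned below mimics that.

```cpp
// Hypothetical handlers, one per case label in the contiguous switch.
static int Case0() { return 4; }
static int Case1() { return 5; }
static int Case2() { return 6; }
static int Case3() { return 7; }

// Stands in for the four 4-byte pointers stored from 8D1664h onwards.
typedef int (*CaseHandler)();
static const CaseHandler kCaseTable[] = { Case0, Case1, Case2, Case3 };

// Returns the value iLocal would hold after the switch.
int SwitchViaTable(int argc)
{
    int iLocal = 0;

    // cmp ...,3 followed by ja: an unsigned range check, so values
    // above 3 and negative values both skip the switch entirely.
    if (static_cast<unsigned int>(argc) > 3u)
        return iLocal;

    // jmp dword ptr (table)[ecx*4]: index the table, then jump.
    iLocal = kCaseTable[argc]();
    return iLocal;
}
```

However many cases there are, the cost of the dispatch stays at one range check plus one indexed jump, which is what makes this layout attractive compared with a chain of comparisons.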
- any time you see cmp followed by a jxx to a nearby address in the disassembly you're probably looking at code generated by a conditional statement in the C/C++ code
- if the address operand to the jump instruction is lower than the current instruction's address (i.e. it's jumping backwards) you're most likely looking at a loop
- assembler generated from conditionals generally tests the opposite of the test being done in the C / C++ code