[In this reprinted #altdevblogaday in-depth piece, Gamer Camp's Alex Darby continues his series on a C/C++ Low Level Curriculum by looking at ways that the Stack and registers are used to pass information around when functions are called.]
Welcome to the fifth installment of the series I'm doing on a C/C++ Low-Level Curriculum. This is the third post about the Stack. The fundamentals were covered a couple of posts ago; the previous post and this one are really just extra information to round out the picture of the ways the Stack is used in win32 x86 function calls. After this we can move on to other low level aspects of the C/C++ languages.
The last two (win32 x86) function calling conventions we're going to look at are thiscall, which is used for calling non-static member functions of classes, and fastcall, which emphasizes register use over stack use for parameters.
As with the previous posts about the Stack, the point of this isn't so much the specific calling conventions that we're examining, but rather to see the different ways that the Stack and registers are used to pass information around when functions are called.
Previously on #AltDevBlogADay…
If you missed the previous C/C++ Low Level Curriculum posts, here are some backlinks:
Make sure that the check boxes in your right-click context menu match those shown in this screenshot,
or your disassembly will not match mine!
The block of assembler that we're interested in for the purposes of illustrating how the thiscall convention works is shown below:
And in fact if you substitute that line it will generate exactly the same assembler – so how does that work?!?
Ok, so there's invisible pointer access in the C++ code, but that still doesn't explain what we're seeing. Exactly how is *((int*) this) equivalent to this->m_iSumOf?
The answer has to do with memory layout of C++ classes (and structs), which is a topic for another entire article (probably several).
For now we'll keep the explanation simple whilst trying not to channel our friend Professor Kenobi more than absolutely necessary…
First let's take it as read that the member data for an instance of a class must be stored somewhere in memory, and take a high level look at how the "pointing to" operator (->) works with another code snippet:
This basically tells the compiler to generate assembler that:
- A Low Level Curriculum for C and C++
- C / C++ Low Level Curriculum: Data Types
- C / C++ Low Level Curriculum: The Stack
- C / C++ Low Level Curriculum: More Stack
class CSumOf
{
public:
    int m_iSumOf;

    void SumOf( int iParamOne, int iParamTwo )
    {
        m_iSumOf = iParamOne + iParamTwo;
    }
};

int main( int argc, char** argv )
{
    int iValOne = 1;
    int iValTwo = 2;
    CSumOf cMySumOf;
    cMySumOf.SumOf( iValOne, iValTwo );
    return 0;
}

Paste this into VS2010, and put a breakpoint on the line
    cMySumOf.SumOf( iValOne, iValTwo );

Run the debug build configuration; when the breakpoint is hit, right click and choose "Go To Disassembly", and you should see something like this (n.b. the addresses in the leftmost column of the disassembly will almost certainly differ):

    14:     int iValOne = 1;
00EE1259  mov         dword ptr [iValOne],1
    15:     int iValTwo = 2;
00EE1260  mov         dword ptr [iValTwo],2
    16:     CSumOf cMySumOf;
    17:     cMySumOf.SumOf( iValOne, iValTwo );
00EE1267  mov         eax,dword ptr [iValTwo]
00EE126A  push        eax
00EE126B  mov         ecx,dword ptr [iValOne]
00EE126E  push        ecx
00EE126F  lea         ecx,[cMySumOf]
00EE1272  call        CSumOf::SumOf (0EE112Ch)

The assembler involved with calling CSumOf::SumOf() starts at line 7 of the listing above and goes to line 12.

Lines 7 to 10 push the parameters to the function onto the stack in reverse order of declaration, exactly as with the stdcall convention we looked at in the previous article.

Line 11 stores the address of cMySumOf in ecx using the instruction lea. If you right click and un-check "Show Symbol Names" you can see that lea is computing the address of cMySumOf from its offset relative to the ebp register, which is how the locals in this debug build are addressed.

Line 12 is obviously calling the function.

Stepping into the function call in the disassembly you should see the following (not forgetting that we have to step through an additional jmp instruction before we get there because of VS2010 incremental linking – see approx. half way through this post for the details):
     6:     void SumOf( int iParamOne, int iParamTwo )
     7:     {
00EE1280  push        ebp
00EE1281  mov         ebp,esp
00EE1283  sub         esp,44h
00EE1286  push        ebx
00EE1287  push        esi
00EE1288  push        edi
00EE1289  mov         dword ptr [ebp-4],ecx
     8:         m_iSumOf = iParamOne + iParamTwo;
00EE128C  mov         eax,dword ptr [iParamOne]
00EE128F  add         eax,dword ptr [iParamTwo]
00EE1292  mov         ecx,dword ptr [this]
00EE1295  mov         dword ptr [ecx],eax
     9:     }

The calling code stored the address of the local variable cMySumOf (the instance the function is being called on) in the ecx register before calling this function. If we examine line 9 in the code box above, you can see that, compared to the stdcall assembler, the function prologue has an extra step: it moves the value in ecx into a memory address within the function's stack frame (i.e. ebp-4). The upshot of this is that after line 9 [ebp-4] stores the function's this pointer.

The function then proceeds exactly as you might expect from the disassembly we've examined in previous articles up until line 13. Line 13 moves the this pointer (previously stored in the function's stack frame) into ecx, then line 14 stores the value of eax into the address specified by ecx (remember: in the VS2010 disassembly view, values in [square brackets] are memory accesses, taking the address to access from the value in the brackets).

If you right click in the disassembly window and un-check "Show Symbol Names" you will see that the symbol this corresponds to ebp-4, which is where the value of ecx was stored at the end of the function prologue.

The astute amongst you will have noticed that the assembler is storing the this pointer from ecx into the Stack only to re-load it into ecx later, without having used the register in the intervening time.
This is exactly the kind of odd thing that un-optimized compiler generated assembler will do; try not to let it bother you :)

So the sum of the two parameters is stored using the this pointer, and then we hit the function epilogue and the function returns; end of story – or is it?

Nothing to see here. Move along.

This is not what you might expect because, based on what we've seen so far, the assembler that is setting CSumOf::m_iSumOf in the member function doesn't obviously match the C++ code we wrote. What we're seeing looks like it might have been generated by the code
*((int*) this) = iParamOne + iParamTwo;
// Here's what we wrote. Since m_iSumOf is a class member the language syntax allows
// us to "access it directly" (another Professor Kenobiism) in the member function
m_iSumOf = iParamOne + iParamTwo;

// in fact, what happens is that the compiler evaluates the code
// as if it was written like this
this->m_iSumOf = iParamOne + iParamTwo;
*((int*) this)
this->m_iSumOf
this->m_iSumOf = 0;
- gets the value of this (a memory address)
- looks up the offset of m_iSumOf relative to the start of the data needed by an instance of CSumOf (which is known at compile time, so it's constant at run time)
- adds the offset to the address of this to get the memory address storing m_iSumOf and then sets the value at the resulting memory address to 0
int __fastcall SumOf( int iParamOne, int iParamTwo, int iParamThree )
{
    int iLocal = iParamOne + iParamTwo + iParamThree;
    return iLocal;
}

int main( int argc, char** argv )
{
    int iValOne = 1;
    int iValTwo = 2;
    int iValThree = 4;
    int iResult = SumOf( iValOne, iValTwo, iValThree );
    return 0;
}

This is basically the same as the code used in the previous post in the series to show how the stdcall calling convention stores multiple parameters on the stack, except the function SumOf has got an extra keyword between the return type and the name of the function.

The __fastcall keyword is a not-quite Microsoft specific C++ extension that changes the calling convention used to call the function it is applied to (http://en.wikipedia.org/wiki/X86_calling_conventions#fastcall).

If you follow the usual drill to make a runnable project from this snippet, put a breakpoint on the call to SumOf (line 12 of the snippet), then compile and run the debug configuration, wait for the breakpoint to get hit, and go to disassembly, you should see something like this:
     8: int main( int argc, char** argv )
     9: {
010F1280  push        ebp
010F1281  mov         ebp,esp
010F1283  sub         esp,50h
010F1286  push        ebx
010F1287  push        esi
010F1288  push        edi
    10:     int iValOne = 1;
010F1289  mov         dword ptr [iValOne],1
    11:     int iValTwo = 2;
010F1290  mov         dword ptr [iValTwo],2
    12:     int iValThree = 4;
010F1297  mov         dword ptr [iValThree],4
    13:     int iResult = SumOf( iValOne, iValTwo, iValThree );
010F129E  mov         eax,dword ptr [iValThree]
010F12A1  push        eax
010F12A2  mov         edx,dword ptr [iValTwo]
010F12A5  mov         ecx,dword ptr [iValOne]
010F12A8  call        SumOf (10F1136h)
010F12AD  mov         dword ptr [iResult],eax
    14:     return 0;
010F12B0  xor         eax,eax
    15: }

You should by this point be pretty familiar with function prologues, and with the assembler that precedes a function call in the other conventions we've examined, so we'll just look at the differences with __fastcall.

Looking at lines 16 to 20 of the listing above, we can see that of the three parameters passed to SumOf():
- the 3rd (iValThree) is being pushed onto the stack,
- the 2nd (iValTwo) is being moved into the edx register, and
- the 1st (iValOne) is being moved into the ecx register
     2: int __fastcall SumOf( int iParamOne, int iParamTwo, int iParamThree )
     3: {
010F1250  push        ebp
010F1251  mov         ebp,esp
010F1253  sub         esp,4Ch
010F1256  push        ebx
010F1257  push        esi
010F1258  push        edi
010F1259  mov         dword ptr [ebp-8],edx
010F125C  mov         dword ptr [ebp-4],ecx
     4:     int iLocal = iParamOne + iParamTwo + iParamThree;
010F125F  mov         eax,dword ptr [ebp-4]
010F1262  add         eax,dword ptr [ebp-8]
010F1265  add         eax,dword ptr [ebp+8]
010F1268  mov         dword ptr [ebp-0Ch],eax
     5:     return iLocal;
010F126B  mov         eax,dword ptr [ebp-0Ch]
     6: }

The assembly making up the function prologue is doing extra work compared to a stdcall function: it takes the values of ecx and edx and stores them into the function's Stack frame (lines 9 & 10 of the listing above).

Lines 12 to 14 then add the three values passed to the function using eax: iParamOne (passed via ecx, now in [ebp-4]), iParamTwo (passed via edx, now in [ebp-8]), and iParamThree (passed via the Stack, in [ebp+8]).

Line 15 sets iLocal from the sum calculated in eax, and then the return statement (lines 16 and 17) moves the return value of the function into eax, where the calling code will expect to find it (as previously established in this post).

That's all well and good, but how is fastcall faster than the alternative calling conventions? In theory, passing the arguments via registers should save two operations per parameter:
- not writing the value into the Stack (i.e. memory access) before the function is called, and
- not reading it from the Stack (i.e. memory access) when it is needed inside the function.