[In this reprinted #altdevblogaday in-depth piece, Gamer Camp's Alex Darby continues his series on a C/C++ Low Level Curriculum by looking at ways that the Stack and registers are used to pass information around when functions are called.]
Welcome to the fifth installment of the series I'm doing on a C/C++ Low-Level Curriculum. This is the third post about the Stack. The fundamentals were covered a couple of posts ago, and the previous post and this one are really just extra information to round out the picture of the ways the Stack is used in win32 x86 function calls; then we can move on to other low level aspects of the C/C++ languages.
The last two (win32 x86) function calling conventions we're going to look at are thiscall, which is used for calling non-static member functions of classes, and fastcall, which emphasizes register use over stack use for parameters.
As with the previous posts about the Stack, the point of this isn't so much the specific calling conventions that we're examining, but rather to see the different ways that the Stack and registers are used to pass information around when functions are called.
Previously on #AltDevBlogADay…
If you missed the previous C/C++ Low Level Curriculum posts, here are some backlinks:
- A Low Level Curriculum for C and C++
- C / C++ Low Level Curriculum: Data Types
- C / C++ Low Level Curriculum: The Stack
- C / C++ Low Level Curriculum: More Stack
Generally I will try to avoid too much assumed knowledge, but this post does assume that you have read the posts linked above as 3 and 4 (or have a working knowledge of how the Stack works in vanilla x86 assembler, in which case why are you reading this!?).
Compiling and running code from this article
I assume that you are familiar with the VS2010 IDE, and comfortable writing, running, and debugging C++ programs.
As with the previous posts in this series, I'm using a win32 console application made by the "new project" wizard in VS2010 with the default options (express edition is fine).
The only change I make from the default project setup is to turn off "Basic Runtime Checks" to make the generated assembler more legible (and significantly faster…); see this previous post for details on how to do this.
To run code from this article in a VS2010 project created this way, open the .cpp file that isn't stdafx.cpp and replace everything below the line #include "stdafx.h" with text copied and pasted from the code box.
The disassembly we look at is from the debug build configuration, which will generate "vanilla" unoptimized win32 x86 code.
The "thiscall" calling convention
As I'm sure you're aware, in any non-static class member function it is possible to access a pointer to the instance of the class that the function was called on via the C++ keyword this.
The presence of the this pointer is often explained away by saying that it is an invisible "0th parameter to member functions", which isn't necessarily incorrect but is the same kind of truth that Obi-Wan Kenobi might have dealt in if he had been a computer science professor rather than a retired Jedi Knight; that is to say "true, from a certain point of view".
The thiscall calling convention is more or less exactly the same as the stdcall calling convention we have already looked at in some detail in the last two posts (this->pPrevious->pPrevious).
Though it is the default calling convention used by the VS2010 compiler for non-static member functions, it's worth noting that there are situations where the compiler won't use it (e.g. if your function uses the ellipsis operator to take a variable number of arguments).
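To make the "invisible 0th parameter" view concrete, here is a minimal sketch that puts a member function next to a free function taking the instance pointer explicitly. CSumOf is the class used in the example later in this post; SumOf_ExplicitThis and BothFormsAgree are hypothetical names made up purely for illustration. The two forms behave the same at the C++ level, but they are not called the same way: the free function's pointer is an ordinary parameter, whereas thiscall hands the member function its this pointer in a register.

```cpp
#include <cassert>   // only needed for the assert in the usage note

// The class used in this article's thiscall example.
class CSumOf
{
public:
    int m_iSumOf;

    void SumOf( int iParamOne, int iParamTwo )
    {
        m_iSumOf = iParamOne + iParamTwo;
    }
};

// Hypothetical free-function equivalent (not from the article): the instance
// pointer is a visible first parameter instead of the implicit 'this'.
void SumOf_ExplicitThis( CSumOf* pThis, int iParamOne, int iParamTwo )
{
    pThis->m_iSumOf = iParamOne + iParamTwo;
}

// Calls both forms with the same inputs and reports whether both objects
// end up holding the same (expected) sum.
bool BothFormsAgree()
{
    CSumOf cViaMember;
    CSumOf cViaFree;

    cViaMember.SumOf( 1, 2 );
    SumOf_ExplicitThis( &cViaFree, 1, 2 );

    return cViaMember.m_iSumOf == 3 && cViaFree.m_iSumOf == 3;
}
```

Calling assert( BothFormsAgree() ); from main() is enough to check that the two forms leave the objects in the same state.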
As we have seen in the last two posts, the unoptimized win32 x86 stdcall calling convention passes its parameters on the Stack. The thiscall convention obviously must somehow pass the this pointer to member functions, but rather than storing an extra parameter on the Stack, it uses a register (ecx) to pass it to the called function.
The code below demonstrates this…
class CSumOf
{
public:
    int m_iSumOf;

    void SumOf( int iParamOne, int iParamTwo )
    {
        m_iSumOf = iParamOne + iParamTwo;
    }
};

int main( int argc, char** argv )
{
    int iValOne = 1;
    int iValTwo = 2;
    CSumOf cMySumOf;
    cMySumOf.SumOf( iValOne, iValTwo );
    return 0;
}
Paste this into VS2010, and put a breakpoint on the line
cMySumOf.SumOf( iValOne, iValTwo );
Run the debug build configuration; when the breakpoint is hit, right click and choose "Go To Disassembly", and you should see something like this (n.b. the addresses in the leftmost column of the disassembly will almost certainly differ):
Make sure that the check boxes in your right-click context menu match those shown in this screenshot,
or your disassembly will not match mine!
The block of assembler that we're interested in for the purposes of illustrating how the thiscall convention works is shown below:
14: int iValOne = 1;
00EE1259 mov dword ptr [iValOne],1
15: int iValTwo = 2;
00EE1260 mov dword ptr [iValTwo],2
16: CSumOf cMySumOf;
17: cMySumOf.SumOf( iValOne, iValTwo );
00EE1267 mov eax,dword ptr [iValTwo]
00EE126A push eax
00EE126B mov ecx,dword ptr [iValOne]
00EE126E push ecx
00EE126F lea ecx,[cMySumOf]
00EE1272 call CSumOf::SumOf (0EE112Ch)
The assembler involved with calling CSumOf::SumOf() starts at the instruction at address 00EE1267 and runs through to the call at 00EE1272.
The four instructions from 00EE1267 to 00EE126E are pushing the parameters to the function onto the Stack in reverse order of declaration, exactly as with the stdcall convention we looked at in the previous article.
The instruction at 00EE126F is storing the address of cMySumOf in ecx using the instruction lea (load effective address). If you right click and un-check "Show Symbol Names" you can see that lea is computing the address of cMySumOf given its offset from the ebp register.
The instruction at 00EE1272 is obviously calling the function.
Stepping into the function call in the disassembly you should see the following: (not forgetting that we have to step through an additional jmp instruction before we get there because of VS2010 incremental linking – see approx. half way through this post for the details)
6: void SumOf( int iParamOne, int iParamTwo )
00EE1280 push ebp
00EE1281 mov ebp,esp
00EE1283 sub esp,44h
00EE1286 push ebx
00EE1287 push esi
00EE1288 push edi
00EE1289 mov dword ptr [ebp-4],ecx
8: m_iSumOf = iParamOne + iParamTwo;
00EE128C mov eax,dword ptr [iParamOne]
00EE128F add eax,dword ptr [iParamTwo]
00EE1292 mov ecx,dword ptr [this]
00EE1295 mov dword ptr [ecx],eax
The calling code stored the address of the local variable cMySumOf (the instance the function is being called on) in the ecx register before calling this function. If we examine the last instruction of the prologue in the code box above (at address 00EE1289), you can see that, compared to the stdcall assembler, the function prologue has an extra step: it is moving the value in ecx into a memory address within the function's stack frame (i.e. ebp-4). The upshot of this is that once that instruction has executed, [ebp-4] stores the function's this pointer.
The function then proceeds exactly as you might expect from the disassembly we've examined in previous articles up until the instruction at 00EE1292.
That instruction moves the this pointer (previously stored in the function's stack frame) into ecx, then the instruction at 00EE1295 stores the value of eax into the address specified by ecx (remember: in the VS2010 disassembly view, values in [ ] are memory accesses, taking the address to access from the value in the brackets).
If you right click in the disassembly window and un-check "Show Symbol Names" you will see that the symbol this corresponds to ebp-4, which is where the value of ecx was stored at the end of the function prologue.
The astute amongst you will have noticed that the assembler is storing the this pointer from ecx into the Stack only to re-load it into ecx later, without having used the ecx register in the intervening time. This is exactly the kind of odd thing that un-optimized compiler generated assembler will do; try not to let it bother you :)
So the sum of the two parameters is stored using the this pointer, and then we hit the function epilogue and the function returns; end of story, or is it?
Nothing to see here. Move along.
This is not what you might expect because, based on what we've seen so far, the assembler that is setting CSumOf::m_iSumOf in the member function doesn't obviously match the C++ code we wrote.
What we're seeing looks like it might have been generated by the code
*((int*) this) = iParamOne + iParamTwo;
And in fact if you substitute that line it will generate exactly the same assembler – so how does that work?!?
// Here's what we wrote. Since m_iSumOf is a class member the language syntax allows
// us to "access it directly" (another Professor Kenobiism) in the member function
m_iSumOf = iParamOne + iParamTwo;
// in fact, what happens is that the compiler evaluates the code
// as if it was written like this
this->m_iSumOf = iParamOne + iParamTwo;
Ok, so there's invisible pointer access in the C++ code, but that still doesn't explain what we're seeing. Exactly how is this->m_iSumOf the same as *((int*) this)?
The answer has to do with memory layout of C++ classes (and structs), which is a topic for another entire article (probably several).
For now we'll keep the explanation simple whilst trying not to channel our friend Professor Kenobi more than absolutely necessary…
First let's take it as read that the member data for an instance of a class must be stored somewhere in memory, and take a high level look at how the "pointing to" operator (->) works with another code snippet:
this->m_iSumOf = 0;
This basically tells the compiler to generate assembler that:
- gets the value of this (a memory address)
- looks up the offset of m_iSumOf relative to the start of the data needed by an instance of CSumOf (which is known at compile time, so it's constant at run time)
- adds the offset to the address of this to get the memory address storing m_iSumOf and then sets the value at the resulting memory address to 0
The this pointer holds the address of the first byte of the data in an instance of CSumOf. The first (and only) member variable in CSumOf is m_iSumOf, which puts it at an offset of 0 relative to the this pointer, and clearly even a debug build knows better than to add an offset of 0, so it accesses the memory at the address this directly.
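If you'd like to check the "offset of 0" claim without reading disassembly, the standard offsetof macro reports a member's byte offset from the start of its class. The sketch below is an illustration rather than anything the compiler literally does: OffsetOfSumMember and WriteViaBasePlusOffset are hypothetical helper names, and the second one simply mimics the "base address plus member offset" steps listed above in plain C++.

```cpp
#include <cassert>   // only needed for the asserts in the usage note
#include <cstddef>   // offsetof, std::size_t

// The class used in this article's thiscall example.
class CSumOf
{
public:
    int m_iSumOf;

    void SumOf( int iParamOne, int iParamTwo )
    {
        m_iSumOf = iParamOne + iParamTwo;
    }
};

// Byte offset of m_iSumOf from the start of a CSumOf instance.
std::size_t OffsetOfSumMember()
{
    return offsetof( CSumOf, m_iSumOf );
}

// Writes iValue to "address of the instance + offset of the member" and
// returns what the member holds afterwards when read in the normal way.
int WriteViaBasePlusOffset( int iValue )
{
    CSumOf cMySumOf;
    cMySumOf.m_iSumOf = 0;

    char* pBase   = reinterpret_cast< char* >( &cMySumOf );
    int*  pMember = reinterpret_cast< int* >( pBase + OffsetOfSumMember() );
    *pMember = iValue;

    return cMySumOf.m_iSumOf;
}
```

With a single int member and no virtual functions, assert( OffsetOfSumMember() == 0 ); and assert( WriteViaBasePlusOffset( 3 ) == 3 ); should both hold, which is the same conclusion the disassembly leads to.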
So, again, we can see that even in seemingly innocuous everyday C++ code there is hidden stuff going on, which is a big part of why I'm doing this series :)
Incidentally, I have recently been made aware of an unbelievably useful (and undocumented!) feature of the VS2010 compiler which prints the memory layout of classes to the build output during compilation: here's the link I was sent, I hope you find it useful: http://thetweaker.wordpress.com/2010/11/07/d1reportallclasslayout-dumping-object-memory-layout/
fastcall (last one, I promise)
At last we come to the win32 x86 calling convention excitingly named fastcall, so named because in theory it makes function calls faster (than the more common stdcall).
So why is it faster than the other calling conventions that we've looked at? To answer this, we'll need to examine the assembler generated by a function call that uses the fastcall convention.
To demonstrate this we'll use the code below:
int __fastcall SumOf( int iParamOne, int iParamTwo, int iParamThree )
{
    int iLocal = iParamOne + iParamTwo + iParamThree;
    return iLocal;
}

int main( int argc, char** argv )
{
    int iValOne = 1;
    int iValTwo = 2;
    int iValThree = 4;
    int iResult = SumOf( iValOne, iValTwo, iValThree );
    return 0;
}
This is basically the same as the code used in the previous post in the series to show how the stdcall calling convention stores multiple parameters on the stack, except the function SumOf has got an extra keyword between the return type and the name of the function.
The __fastcall keyword is a not-quite Microsoft specific C++ extension that changes the calling convention used to call the function it is applied to (http://en.wikipedia.org/wiki/X86_calling_conventions#fastcall).
If you follow the usual drill to make a runnable project from this snippet, put a breakpoint on the line that calls SumOf(), then compile and run the debug configuration, wait for the breakpoint to get hit, and go to disassembly, you should see something like this:
8: int main( int argc, char** argv )
010F1280 push ebp
010F1281 mov ebp,esp
010F1283 sub esp,50h
010F1286 push ebx
010F1287 push esi
010F1288 push edi
10: int iValOne = 1;
010F1289 mov dword ptr [iValOne],1
11: int iValTwo = 2;
010F1290 mov dword ptr [iValTwo],2
12: int iValThree = 4;
010F1297 mov dword ptr [iValThree],4
13: int iResult = SumOf( iValOne, iValTwo, iValThree );
010F129E mov eax,dword ptr [iValThree]
010F12A1 push eax
010F12A2 mov edx,dword ptr [iValTwo]
010F12A5 mov ecx,dword ptr [iValOne]
010F12A8 call SumOf (10F1136h)
010F12AD mov dword ptr [iResult],eax
14: return 0;
010F12B0 xor eax,eax
You should by this point be pretty familiar with function prologues, and the assembler that precedes a function call in the other conventions we've examined, so we'll just look at the differences with __fastcall.
Looking at the instructions from 010F129E to 010F12A8, we can see that of the three parameters passed to SumOf():
- the 3rd (iValThree) is being pushed onto the stack,
- the 2nd (iValTwo) is being moved into the edx register, and
- the 1st (iValOne) is being moved into the ecx register
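The pattern in the list above follows the rule Microsoft documents for __fastcall: going left to right, the first two arguments that are DWORD-sized or smaller travel in ecx and edx, and everything else goes on the Stack. The sketch below is a deliberately simplified model of that rule, not compiler code. AssignFastcallSlots is a hypothetical helper invented for this illustration, it looks only at argument sizes in bytes, and it ignores type-specific exceptions (structs, for instance, are always passed on the Stack regardless of size).

```cpp
#include <cassert>   // only needed for the asserts in the usage note
#include <cstddef>
#include <string>
#include <vector>

// Simplified model (an assumption-laden sketch) of where the Microsoft x86
// __fastcall convention places each argument. argSizes holds each argument's
// size in bytes, in declaration order; the result names the slot it gets.
std::vector< std::string > AssignFastcallSlots( const std::vector< std::size_t >& argSizes )
{
    const char* const kRegisters[] = { "ecx", "edx" };
    std::size_t nextRegister = 0;
    std::vector< std::string > slots;

    for ( std::size_t i = 0; i < argSizes.size(); ++i )
    {
        // The first two arguments (left to right) that fit in 32 bits
        // take ecx and then edx; anything else is passed on the Stack.
        if ( argSizes[ i ] <= 4 && nextRegister < 2 )
        {
            slots.push_back( kRegisters[ nextRegister++ ] );
        }
        else
        {
            slots.push_back( "stack" );
        }
    }
    return slots;
}
```

For the three int parameters of SumOf() (sizes 4, 4, 4) the model yields ecx, edx, stack, matching the disassembly above. It says nothing about the order in which the calling code sets those slots up.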
Stepping into the disassembly of SumOf() you should see something like this (N.B. I unchecked "Show Symbol Names" before grabbing this text from the disassembly view so the addresses were all visible):
2: int __fastcall SumOf( int iParamOne, int iParamTwo, int iParamThree )
010F1250 push ebp
010F1251 mov ebp,esp
010F1253 sub esp,4Ch
010F1256 push ebx
010F1257 push esi
010F1258 push edi
010F1259 mov dword ptr [ebp-8],edx
010F125C mov dword ptr [ebp-4],ecx
4: int iLocal = iParamOne + iParamTwo + iParamThree;
010F125F mov eax,dword ptr [ebp-4]
010F1262 add eax,dword ptr [ebp-8]
010F1265 add eax,dword ptr [ebp+8]
010F1268 mov dword ptr [ebp-0Ch],eax
5: return iLocal;
010F126B mov eax,dword ptr [ebp-0Ch]
The assembly making up the function prologue is doing extra work compared to a stdcall function: taking the values of ecx and edx and storing them into the function's Stack frame (the two mov instructions at 010F1259 and 010F125C).
The three instructions from 010F125F to 010F1265 then add the three values passed to it using eax: iParamOne (passed via ecx, now in [ebp-4]), iParamTwo (passed via edx, now in [ebp-8]), and iParamThree (passed via the Stack in [ebp+8]).
The instruction at 010F1268 then sets iLocal (at [ebp-0Ch]) from the sum calculated in eax, and the instruction at 010F126B moves the return value of the function into eax, where the calling code will expect to find it (as previously established in this series).
That's all well and good, but how is fastcall faster than the alternative calling conventions?
In theory, passing the arguments via registers should save two operations per parameter:
- not writing the value into the Stack (i.e. memory access) before the function is called, and
- not reading it from the Stack (i.e. memory access) when it is needed inside the function.
As a rule of thumb, performing fewer operations and avoiding those that involve accessing memory should result in faster code, but this is not always the case. I don't want to get into discussing why this is, because on its own it is a subject for many posts, and one better explained by someone more qualified than myself (e.g. Bruce Dawson, Mike Acton, Tony Albrecht, Jaymin Kessler, or John McCutchan).
In all honesty I would be extremely surprised if the unoptimized code we've looked at runs any faster at all when using fastcall. As you can see by examining the disassembly above, the first of these potentially saved operations is being un-done by storing the contents of ecx and edx onto the Stack in the function prologue, and the second is being un-done by reading the parameter values back from the Stack (the instructions at 010F125F and 010F1262).
I assume that, like the other instances of unoptimized compiler generated assembler performing redundant operations we have come across, these unnecessary instructions would happily optimize away in a release build; however the sad fact is that it is pretty hard to test the disassembly of trivial programs like the one we've been looking at meaningfully in a release build configuration.
Why? Because the optimizing compiler is so good that any simple program (like this one) which uses compile time constants for input, and does no output, will pretty much compile to "return 0;".
I leave it as an exercise for you, dear reader, to work out the smallest number of changes to this code that will result in disassembly that actually calls SumOf() :)
So, we have now seen how thiscall and fastcall differ from the other x86 calling conventions we've looked at, and we have seen yet again that even in simple code there is black magic going on behind the scenes of the language syntax.
No doubt we will revisit the Stack from time to time as this (Potentially never-ending! Help!) series of articles continues, but I've now covered it in as much detail as I feel is appropriate until we've covered some other aspects of the Low Level view of C/C++ (for example; we will definitely be coming back to the Stack when we examine structs & classes and their memory layout to discuss pass by value).
Next time we'll be looking at the disassembly from common C / C++ language constructs like loops and control statements, which are very useful things to be familiar with if you find yourself staring at a bunch of disassembly as a result of a crash in code you don't have symbols for.
In case you missed it whilst reading the main body of the post, here's that link again concerning the undocumented VS2010 compiler feature that dumps memory layouts of classes to the build output: http://thetweaker.wordpress.com/2010/11/07/d1reportallclasslayout-dumping-object-memory-layout/
Also, thanks to Fabian and Bruce for their help reviewing this post.
[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]