[In the fourth Microsoft-sponsored article on Gamasutra's XNA-themed microsite, XNA Developer Connection's Walbourn discusses the rise of 64-bit computing and gaming on Windows, explaining the technical specifics and programming advantages of 64-bit.]
“640K is more memory than anyone will ever need”
This famous, oft-quoted phrase -- and variations of it -- is routinely attributed to Bill Gates, founder and chairman of Microsoft Corporation. For the record, Gates flatly denies ever having said it. In fact, he has said the opposite.1 Still, it persists in technical urban legend as a warning against underestimating the pace of PC evolution. We’ve come a long way since the days of the Intel 8088 processor, with its 8-bit external data bus and 20-bit address bus that could address one MB (the upper 384 KB of that address space was reserved for expansion cards, leaving 640 KB for physical memory).
The laptop I’m using to write this article has 6,500 times more RAM than that. In fact, my mobile phone, my digital camera, my DVR, and a dozen other electronic devices I interact with every day have far more than 640 KB of memory. Of course, at the time of this alleged quote, in the early days of the PC industry, many home computers had 64 KB or less.
Like all engineering design efforts, PCs reflect a series of trade-offs. Designers make compromises to make the device easier to build and cheaper to mass produce. These compromises often leave some room for future growth. However, to minimize the impact on existing applications, that room often falls short of the true pace of innovation.
The PC’s greatest strength is also its greatest weakness: the relentless drive for both innovation and backward compatibility. It is a testament to the hard work of generations of engineers that a long chain of backward-compatible products connects two very different machines. At one end is a 16-bit processor running at 4.77 MHz with 16 KB of RAM, 4-bit CGA color graphics, and 16 KB of Video RAM (VRAM). At the other is the modern 64-bit processor with four or more cores running at ~3 GHz, 4+ GB of RAM, one or more Direct3D 10-class GPUs, and nearly a GB of VRAM.
We are now a few years into another major transition. Actually, two major processor transitions are happening at the same time. The first is the move from single-core to multi-core processors. Modern gaming consoles like the Microsoft Xbox 360® have already pushed game developers to embrace, if reluctantly, the transition from single-threaded to multi-threaded game code. There is still a long way to go before games fully exploit the multi-core designs currently under development in the PC industry. The second transition is that the new multi-core processors appearing in our PCs are also 64-bit x64 processors.
The 4 GB Barrier: Physical Memory Limitations
The core of the memory limitation for a 32-bit processor comes from basic binary math:
2^32 = 4,294,967,296 bytes, or 4 GB
This means a 32-bit processor can address only 4 GB of memory. Even that figure is extremely optimistic, because not every address can be used for physical RAM. The original IBM PC reserved the upper 384 KB of its one MB address space for expansion cards, leaving 640 KB for RAM. Modern devices likewise need some of the addressing space. At boot time, the BIOS assigns addresses in the range from 0xC0000000 (3 GB) to 0xFFFFFFFF (just under 4 GB) to installed devices.
Many of these devices are integrated into the motherboard. The mapping system is more flexible than it once was. Even so, everything from your audio device to your video card to the network interface still takes some of those addresses for its own purposes. Only the balance is available for physical RAM. Because of this, the physical memory actually accessible is something like 3 GB, 3.33 GB, or 3.5 GB, even when 4 GB or more of physical RAM is installed.2 The available physical RAM shrinks further when you add more devices, such as multiple video cards in an SLI® or Crossfire™ setup.
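The arithmetic behind the “hidden memory” effect can be sketched in a few lines of C++, under simplifying assumptions. `address_space_bytes` computes 2^n for an n-bit address. `visible_ram` models usable RAM as the installed amount, capped at whatever part of the 4 GB physical address space devices have not claimed. Both helpers are hypothetical illustrations. Real reservations depend on the chipset, BIOS, and installed devices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1ull << 20;
constexpr std::uint64_t kGiB = 1ull << 30;

// Bytes reachable with an n-bit address, i.e. 2^bits (valid for bits < 64).
constexpr std::uint64_t address_space_bytes(unsigned bits) {
    return 1ull << bits;
}

static_assert(address_space_bytes(32) == 4294967296ull, "2^32 bytes = 4 GB");

// Hypothetical model: RAM is usable only in the part of the 32-bit physical
// address space that is not reserved for device mappings.
std::uint64_t visible_ram(std::uint64_t installed, std::uint64_t device_reserved) {
    const std::uint64_t space = address_space_bytes(32);
    assert(device_reserved <= space);
    return std::min(installed, space - device_reserved);
}
```

As an example, assume a hypothetical 768 MB device reservation. Then `visible_ram(4 * kGiB, 768 * kMiB)` yields 3.25 GB, and installing 8 GB instead returns exactly the same value.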
Clever solutions have been proposed for this problem. One is Intel’s Physical Address Extension (PAE), originally introduced in the Intel Pentium® Pro and also supported by AMD’s Athlon™ processors.3 PAE enables 36-bit physical memory addressing while virtual addresses remain 32 bits wide. In theory, this allows the operating system to reach the “hidden” memory while everything else stays 32-bit. In practice, most Windows device drivers fail when the system is put into 36-bit PAE mode. This aspect of PAE is therefore useful only in controlled environments like servers and supercomputing.4
1 Katz J. (1997), “Did Gates Really Say 640k is Enough For Anyone?,” Wired.
2 Microsoft Knowledge Base article #929605 and Microsoft Knowledge Base article #946003.
3 “Intel® 64 and IA-32 Architectures Software Developer’s Manual,” Volume 3A, section 3.8, Intel.
4 “Physical Address Extension – PAE Memory and Windows,” Microsoft Corporation, MSDN Library.
The 2 GB Limit: Virtual Memory Limitations
So far we’ve been talking about the limits of memory with respect to the system as a whole. However, a 32-bit processor also means the applications running on it are themselves 32-bit programs. Modern operating systems like Windows use virtual memory to enforce security and stability between running programs. As part of this implementation, the top half of the virtual address space is reserved for use by the operating system itself. This means standard 32-bit programs are limited to using half of the potential virtual address space:
2^31 = 2,147,483,648 bytes, or 2 GB
The other half of the address space is used by the operating system kernel as privileged shared memory. This allows fast switching to kernel mode without reloading memory mappings. The split has some important and useful properties for programs. For example, the difference between any two memory locations allocated by the program is guaranteed to fit into a 32-bit signed number without underflow or overflow. If 32-bit programs had a full 4 GB address space, a 32-bit Standard C++ ptrdiff_t could not represent every such difference.
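The guarantee about pointer differences can be checked numerically. The sketch below treats 32-bit addresses as plain integers. It tests whether their difference fits in a 32-bit signed `ptrdiff_t`, computing in 64-bit arithmetic so the check itself cannot overflow. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Would (b - a) fit in a 32-bit signed ptrdiff_t?
bool diff_fits_int32(std::uint32_t a, std::uint32_t b) {
    const std::int64_t diff =
        static_cast<std::int64_t>(b) - static_cast<std::int64_t>(a);
    return diff >= std::numeric_limits<std::int32_t>::min() &&
           diff <= std::numeric_limits<std::int32_t>::max();
}
```

Two addresses in the lower 2 GB always pass this check, in either order. With a full 4 GB space, `diff_fits_int32(0x00010000u, 0xF0000000u)` returns false, because the distance is about 3.75 GB.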
Virtual address space is also used for more than just accessing physical memory. It is used for communication with hardware devices such as the graphics AGP/PCIe aperture, and for memory-mapped file I/O. When the typical PC had 512 MB or one GB of physical RAM, there was still plenty of room in the 2 GB of virtual address space for all these uses in most cases.
As memory sizes grew, 2 GB of virtual address space became a limiting factor. This was the case for data-intensive database and server applications whose files grew from a few MB to many GB or even terabytes. Microsoft extended the Win32 API with Address Windowing Extensions (AWE)5 to enable applications to take advantage of the extra physical RAM made possible by 36-bit PAE mode.
Unfortunately, AWE could be enabled only under special circumstances. It lets a 32-bit application use more than 2 GB of RAM through the classic technique of “windowing.” However, windowing doesn’t get around the fundamental limitations of the system. It is complex to manage, and the extra memory areas cannot be paged to disk.
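Windowing can be illustrated with a toy model. It has a large pool of “physical” pages, but only a small window of them is addressable at any time. The program must remap window slots before touching other pages. This is not the AWE API. On Windows, the real calls are `AllocateUserPhysicalPages`, `MapUserPhysicalPages`, and `VirtualAlloc` with `MEM_PHYSICAL`. The `WindowedMemory` class below is a hypothetical, portable stand-in that mimics only the bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kUnmappedSlot = static_cast<std::size_t>(-1);

// Toy model of windowing (not the real AWE API): many "physical" pages,
// but only window_pages of them are reachable at any one time.
class WindowedMemory {
public:
    WindowedMemory(std::size_t page_size, std::size_t physical_pages, std::size_t window_pages)
        : page_size_(page_size),
          physical_(physical_pages, std::vector<std::uint8_t>(page_size, 0)),
          window_(window_pages, kUnmappedSlot) {}

    // Point a window slot at a physical page (MapUserPhysicalPages plays this role in AWE).
    void map(std::size_t slot, std::size_t physical_page) {
        assert(slot < window_.size() && physical_page < physical_.size());
        window_[slot] = physical_page;
    }

    // Access a byte by its offset within the window; its slot must be mapped.
    std::uint8_t& at(std::size_t window_offset) {
        const std::size_t slot = window_offset / page_size_;
        assert(slot < window_.size() && window_[slot] != kUnmappedSlot);
        return physical_[window_[slot]][window_offset % page_size_];
    }

    std::size_t window_bytes() const { return window_.size() * page_size_; }
    std::size_t physical_bytes() const { return physical_.size() * page_size_; }

private:
    std::size_t page_size_;
    std::vector<std::vector<std::uint8_t>> physical_;
    std::vector<std::size_t> window_;
};
```

The extra bookkeeping is the point of the model. Every access beyond the window requires an explicit remap, which is what makes windowing awkward compared with simply having a larger address space.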
A more direct proposal was to give back the 32nd bit to applications. In principle, the concept is fairly simple: a 32-bit application is linked with the Large Address Aware flag (/LARGEADDRESSAWARE) to indicate it will work correctly if it is given memory addresses larger than 2 GB. Since all pointers in the application are already 32 bits, this is a simple condition to meet, assuming the program does not:
Use pointer arithmetic whose result may not fit into a 32-bit signed integer, or make other assumptions that pointer values are always below 2 GB.
Override the meaning of the 32nd bit for other purposes, such as indicating a “handle” vs. a “pointer” (this technique is used by the DirectX SDK’s D3DX9 Effects library by default with the D3DXHANDLE type).
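Both conditions can be illustrated with 32-bit addresses held in integers. The tag bit and helper names below are hypothetical, and this is not how D3DX actually encodes `D3DXHANDLE`. The point is that an LAA process can receive addresses at or above 0x80000000. Once it does, a bit-31 tag and a signed comparison both give wrong answers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagging scheme: bit 31 set means "handle", clear means "pointer".
constexpr std::uint32_t kHandleTagBit = 0x80000000u;

bool looks_like_handle(std::uint32_t value) {
    return (value & kHandleTagBit) != 0;   // misfires on LAA pointers >= 2 GB
}

// Comparing addresses as signed 32-bit values: correct only below 2 GB.
bool address_less_signed(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b);
}

// LAA-safe version: compare addresses as unsigned values.
bool address_less(std::uint32_t a, std::uint32_t b) {
    return a < b;
}
```

Take the addresses 0x7FFF0000 and 0x80010000, for example. `looks_like_handle` flags the second one as a handle, and `address_less_signed` reports the wrong order. `address_less` stays correct.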
The real limitation of Large Address Aware (LAA) is that the operating system configuration has to support it. Standard Windows configurations assume the kernel has 2 GB of address space to work with. Special boot modes are required to constrain the kernel to less than that, so that LAA applications can have more than 2 GB of user address space. This is done with the /3GB or /userva boot switch in Windows XP, or with the IncreaseUserVa Boot Configuration Data (BCD) element in Windows Vista.6
This will reserve up to 3 GB of address space for LAA applications, leaving the kernel with only 1 GB. Many combinations of drivers and software loads can result in out-of-memory conditions for the kernel in this constrained memory boot mode. It is a difficult solution to implement for your average gamer, and is better left for technically savvy IT managers to deploy in special circumstances. Again, this helps with the virtual address space limitations of 32-bit programs, but doesn’t get around the physical RAM limitations.
5 “Address Windowing Extensions,” Microsoft Corp, MSDN Library.
6 “4-Gigabyte Tuning,” Microsoft Corporation, MSDN Library.
The Growth of VRAM
Another factor in the PC memory equation has been growing as well: video memory size. In the early days of Direct3D, the typical video card had 16 or 32 MB of Video RAM (VRAM). High-end video cards now have 512 MB, 640 MB, 768 MB, or more VRAM. When video cards had 16 or 32 MB of Video RAM, this memory was mapped directly into every process that used Direct3D for efficient access by the application and video driver.
As VRAM sizes grew, this became unsustainable. A 768 MB hole in the 2 GB virtual address space of each process would leave very little space for applications. Similarly, taking 768 MB out of the 4 GB physical address space would be too constraining. This problem is exacerbated in dual-GPU configurations (SLI®/Crossfire™).
Therefore, video card manufacturers typically implement a 256 MB physical memory window for the video graphics memory, and modern drivers do not create direct process mappings for the entire VRAM size. Process address space is still consumed for working with the AGP aperture (64 MB, 128 MB, or more typically on modern game systems 256 MB in size). While PCIe uses a dynamic aperture, it too is mapped into each process that uses Direct3D.
Growing VRAM sizes also have an indirect cost. More process memory is needed for the backing store that keeps copies of textures, geometry, and other static data, so they can be restored after a “lost device.” As a result, filling up such large video cards while still fitting under the 2 GB limit is extremely challenging.
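A back-of-the-envelope budget shows how quickly these costs add up. The model charges three items against 2 GB of user address space: a 256 MB aperture mapping, a system-memory copy of every managed resource, and the rest of the game’s code and heaps. The categories and sizes are hypothetical, chosen only to resemble the configurations described here.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kMB = 1ll << 20;

// Hypothetical breakdown of a 32-bit game's user-mode address space.
struct AddressSpaceBudget {
    std::int64_t user_va;            // 2048 MB for a standard 32-bit process
    std::int64_t aperture_mapping;   // AGP/PCIe aperture mapped into the process
    std::int64_t managed_copies;     // backing store kept for lost-device recovery
    std::int64_t code_and_heaps;     // executable images, heaps, stacks, etc.
};

// Remaining headroom; a negative result means the process runs out of address space.
std::int64_t headroom(const AddressSpaceBudget& b) {
    return b.user_va - (b.aperture_mapping + b.managed_copies + b.code_and_heaps);
}
```

Consider a 256 MB aperture, 768 MB of managed copies, and 900 MB of code and heaps. With those illustrative numbers, `headroom` returns 124 MB. If a larger card grows the managed copies to 1,024 MB, the same process ends up 132 MB short.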
The Windows Vista Display Driver Model (WDDM) was designed to address the lost-device limitations inherent in the Windows XP Display Model (XPDM). It also allows more efficient sharing of the GPU by multiple applications. WDDM does not require the entire 256 MB aperture to be mapped into the process address space.
Instead, it dynamically grows the amount as the VRAM allocated by the application increases. For Direct3D 10 applications, this eliminates the need to maintain two copies of resources like textures in memory, one for the VRAM and one in the backing-store for lost-device cases. The system deals with migrating the one copy between memory and the video card, as needed.
Unfortunately, to maintain application compatibility with Direct3D 9 on Windows XP, Windows Vista still kept two copies of managed resources for Direct3D 9 applications. These applications also needed process space for resources like render frame buffers. Under XPDM, render frame buffers were simply lost and recreated, so they did not require any process space.
WDDM no longer maps the 256 MB aperture into each process. This returned 256 MB of virtual address space that applications had already budgeted for under XPDM, and that space absorbed the extra copies. As a result, the change did not cause problems until video cards with more than 256 MB of VRAM became available. Small games still had plenty of room in the 2 GB address space, but many modern AAA PC titles were running out of space on WDDM.7 The WDDM VA hotfix (KB940105 for Windows Vista, included in Service Pack 1)8 gives a bit of breathing room. It maps into the process only those video resources that need direct CPU access. Games that use Direct3D 10 also face less memory pressure, because they do not need the extra copies required for “device lost” handling.
However, high-end games in development are routinely hitting the 2 GB wall even on Windows XP. In fact, this incident proves that many modern AAA PC titles are already within 256 MB of the 2 GB barrier. Otherwise, they would not have hit this problem until video cards were over 512 MB.
7 “Why Your Windows Game Won't Run In 2,147,352,576 Bytes,” Gamefest 2007 presentation.
8 Microsoft Knowledge Base article #940105.
64-bit processors have been around for many years. They first appeared in supercomputer designs, then in workstations and servers, and are now widely available. The Xbox 360’s PowerPC is a 64-bit processor, but it uses a 32-bit process model. Many other well-known CPUs are 64-bit processors, but none of them was successful enough to displace the broad adoption of the x86 PC.
The x64 64-bit processor design directly addresses backward compatibility concerns by extending the existing x86 architecture, allowing it to run x86 32-bit binaries without emulation and providing a new 64-bit memory model.
This design was first introduced by AMD, under the names “x86-64” and “AMD64.” Intel has since adopted it as “EM64T” or “Intel 64,” and it is now known more generically as “x64.” The vast majority of desktop CPUs and most laptop CPUs sold in recent years are x64-capable processors. They fully support both pure 32-bit and mixed 32-bit/64-bit applications.
With the introduction of a new architecture, some improvements were made to the 64-bit execution mode beyond expanding the registers and addressing from 32 bit to 64 bit. These include eight more general purpose registers, eight more SSE registers, an NX no-execute bit (which otherwise requires the use of PAE to access), new instructions, and guaranteed SSE2 or later support.
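C++ code usually detects the 64-bit execution mode at compile time through predefined macros. MSVC defines `_M_X64` (and `_M_AMD64`), while GCC and Clang define `__x86_64__` and `__amd64__`. The sketch below folds these macros into a constant and records the register counts listed above. The constant and function names are hypothetical, and the register counts assume an x86-family target.

```cpp
#include <cassert>

// True when compiling 64-bit native x64 code (MSVC, GCC, and Clang macros).
// __ILP32__ excludes the Linux x32 ABI, which is x64 code with 32-bit pointers.
#if (defined(_M_X64) || defined(_M_AMD64) || defined(__x86_64__) || defined(__amd64__)) && !defined(__ILP32__)
constexpr bool kX64Native = true;
#else
constexpr bool kX64Native = false;
#endif

// General-purpose and SSE register counts (assumes an x86-family target).
constexpr int general_purpose_registers() { return kX64Native ? 16 : 8; }
constexpr int sse_registers() { return kX64Native ? 16 : 8; }

static_assert(!kX64Native || sizeof(void*) == 8, "x64 native code uses 64-bit pointers");
```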
A few old system programming features of x86 were declared “legacy.” While they are still available in the current hardware to support x86 32-bit applications, they are not available in 64-bit execution mode.9 Current implementations support 1 TB of physical memory, a limit that can easily be increased in the future. The ample physical addressing capability completely eliminates the concerns about the “hidden memory” problem.
Supporting the 64-bit execution mode requires a new version of the operating system. Windows now comes in two flavors:
“x86,” which runs on legacy CPUs as well as on x64 processors in 32-bit mode
“x64,” which runs on x64 processors and supports both 32-bit and 64-bit programs
Current Windows x64 versions (Windows XP Pro x64 Edition, Windows Server 2003, Windows Server 2008, and Windows Vista) support up to 44 bits of virtual memory addressing. Each application is given a 43-bit address space. That is 8 TB, or 4,096 times the 2 GB process address space under 32-bit Windows. Future versions of Windows can easily extend this further.
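The 8 TB and 4,096-times figures follow directly from the bit counts: 2^43 / 2^31 = 2^12 = 4,096. The sketch below encodes that arithmetic, plus a range check for the user half of the 44-bit space. The helper is hypothetical. The 43-bit constant applies only to the Windows versions listed above, since later versions may extend it.

```cpp
#include <cassert>
#include <cstdint>

// Per-process user address space on these Windows x64 versions: 43 bits.
constexpr std::uint64_t kUserVaBits = 43;
constexpr std::uint64_t kUserVaBytes = 1ull << kUserVaBits;   // 8 TB
constexpr std::uint64_t kWin32UserVaBytes = 1ull << 31;       // 2 GB

static_assert(kUserVaBytes / kWin32UserVaBytes == 4096, "8 TB is 4,096 times 2 GB");

// Hypothetical helper: does a 64-bit address fall inside the 8 TB user range?
constexpr bool in_user_range(std::uint64_t address) {
    return address < kUserVaBytes;
}
```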
One obvious benefit of running 64-bit native versions of applications is a huge increase in memory capacity, but there are other benefits as well. Because the kernel itself runs in 64-bit mode, it can easily accommodate the needs of a 32-bit Large Address Aware (LAA) application.
As a result, 32-bit programs linked with the Large Address Aware flag can allocate a full 4 GB of user-mode address space when running on Windows x64. No special boot mode is required, and there is no system stability impact that would demand careful configuration. This makes LAA a highly attractive, consumer-friendly way of giving 32-bit processes more virtual address space on Windows x64.
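The configurations discussed so far can be summarized in a small lookup function. The sizes are nominal; real figures are slightly smaller because small regions of the address space are reserved. The enum and function are hypothetical. The /3GB or IncreaseUserVa case is shown at its 3 GB maximum.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGB = 1ull << 30;

enum class Os { Windows32, Windows32With3GB, WindowsX64 };  // hypothetical labels

// Nominal maximum user-mode address space for a process in each configuration.
std::uint64_t max_user_va(Os os, bool native64, bool large_address_aware) {
    if (!large_address_aware) return 2 * kGB;      // classic 2 GB limit
    if (native64) {
        assert(os == Os::WindowsX64);              // 64-bit code needs Windows x64
        return 8ull * 1024 * kGB;                  // 8 TB (43 bits)
    }
    switch (os) {
        case Os::Windows32:        return 2 * kGB; // LAA flag alone changes nothing
        case Os::Windows32With3GB: return 3 * kGB; // /3GB or IncreaseUserVa at maximum
        case Os::WindowsX64:       return 4 * kGB; // LAA under WOW64
    }
    return 2 * kGB;
}
```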
Windows XP Pro x64 Edition, released in April 2005, was an early adopter consumer OS. The Windows-on-Windows-64 (WOW64) system works well for running 32-bit applications on the x64 version of Windows. The OS still required a new generation of 64-bit native kernel-mode drivers, as kernel-mode code must use the native memory model of the system. Existing programs ran into problems on the new OS in three ways:
Older installer packages still included 16-bit code from the Windows 3.1 era, and this code is not supported when running in 64-bit execution mode.
Applications that used kernel-mode drivers, such as copy protection schemes common in games, needed to provide 64-bit native Authenticode signed versions of these components.
The new OS sometimes exposed bugs in poorly written installers. Some made assumptions about directory paths or invalid path characters, and others ran afoul of WOW64’s legacy registry handling.
The release of Windows Vista brings x64 into the mainstream. All editions of Windows Vista are available in both x86 and x64 versions, and all the Windows Vista logo programs push support for Windows x64. These programs help drive the whole Windows ecosystem to support x64 versions of Windows. In particular, they push vendors to provide third-party 64-bit device drivers. The lack of broad driver support was the main reason for the limited distribution of Windows XP Pro x64 Edition.
9 “AMD64 Architecture Programmer’s Manual,” AMD.
In the near term, vendors and software developers must support both x86 and x64 versions of Windows. For application developers, this means making sure that their 32-bit applications work correctly on Windows x64 under WOW64. Game developers can take advantage of the huge increase in supported memory for content creation, servers, and development workstations.
As with all technology transitions, customer demand tends to drive many companies to support Windows x64. That said, it is no longer “early adopter hell.” I personally run Windows XP Pro x64 Edition and Windows Vista x64 on my development systems, Windows Vista x64 on my laptop, and Windows Vista x64 on my gaming machines at home.
Based on the download statistics for KB940105, and driver packages from major video hardware vendors, we estimate that a quarter of the hardcore gamer market that adopted Windows Vista is running 64-bit versions.
Market realities, however, dictate that most games need to be playable on 32-bit systems for the coming years. Therefore, making sure your 32-bit game runs well on both x86 and x64 versions of Windows is a basic shipping criterion. Just as with previous OS transitions, Windows x64 is a better development environment. It is a safer bet that a 32-bit program that runs well on Windows x64 will run perfectly fine on x86 versions than vice versa.
Transitioning to Windows x64 also greatly expands your options. You can introduce 64-bit native versions of tools into your pipeline, and LAA 32-bit tools get the full 4 GB of user address space. The Visual Studio linker, for example, is an LAA application. For large monolithic executables, this can mean the difference between being able to enable Whole Program Optimization and failing to link.
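Each executable records in its PE image whether it is Large Address Aware. The COFF file header’s `Characteristics` field carries the `IMAGE_FILE_LARGE_ADDRESS_AWARE` bit (0x0020). The `Machine` field is 0x014C for x86 and 0x8664 for x64. The sketch below reads both fields from an in-memory image. The function and struct names are hypothetical, and the code performs only the minimal bounds checks this example needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t kImageFileLargeAddressAware = 0x0020;  // IMAGE_FILE_LARGE_ADDRESS_AWARE
constexpr std::uint16_t kMachineI386 = 0x014C;                  // IMAGE_FILE_MACHINE_I386
constexpr std::uint16_t kMachineAmd64 = 0x8664;                 // IMAGE_FILE_MACHINE_AMD64

struct PeHeaderInfo {
    bool valid = false;
    std::uint16_t machine = 0;
    bool large_address_aware = false;
};

// Little-endian read of n bytes (n <= 4) starting at off.
static std::uint32_t read_le(const std::vector<std::uint8_t>& bytes, std::size_t off, std::size_t n) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value |= static_cast<std::uint32_t>(bytes[off + i]) << (8 * i);
    }
    return value;
}

PeHeaderInfo inspect_pe(const std::vector<std::uint8_t>& image) {
    PeHeaderInfo info;
    // DOS header: "MZ" signature; e_lfanew (offset of the PE header) lives at 0x3C.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return info;
    const std::size_t pe = read_le(image, 0x3C, 4);
    // Need the 4-byte "PE\0\0" signature plus the 20-byte COFF file header.
    if (pe > image.size() || image.size() - pe < 24) return info;
    if (image[pe] != 'P' || image[pe + 1] != 'E' || image[pe + 2] != 0 || image[pe + 3] != 0) return info;
    info.machine = static_cast<std::uint16_t>(read_le(image, pe + 4, 2));   // COFF Machine
    const std::uint32_t characteristics = read_le(image, pe + 22, 2);       // COFF Characteristics
    info.large_address_aware = (characteristics & kImageFileLargeAddressAware) != 0;
    info.valid = true;
    return info;
}

bool is_x64_image(const PeHeaderInfo& info) {
    return info.valid && info.machine == kMachineAmd64;
}
```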
A 64-bit version of a level editor or 3D modeler can enable you to work on whole levels without having to deal with time-consuming “chunking” workarounds. Running a 64-bit operating system means that you can install 8 GB or 16 GB of physical memory, instead of being limited to less than 4 GB. This means you can load huge data sets without paging. Often, the disk cache will keep all of the files you are working on in RAM.
Basic compatibility with Windows x64 for your 32-bit applications and moving your internal systems over is a good first step. The real value for customers will come from taking advantage of Windows x64. Some showcase games have already provided 64-bit native versions of their game. However, not every studio has the option of creating, testing, and optimizing two distinct versions of their title.
One solution is to take advantage of Large Address Aware (LAA) in your game to scale content beyond the 2 GB limit while keeping a single 32-bit executable. Based on a call to GlobalMemoryStatusEx, the game could offer the highest-detail settings only when running on Windows x64. Alternatively, it could use the additional memory for larger resource caches to improve the overall player experience.10
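A minimal sketch of that approach, assuming a 32-bit LAA build, follows. `GlobalMemoryStatusEx` fills a `MEMORYSTATUSEX` structure. Its `ullTotalVirtual` member reports the size of the process’s user-mode address space: a little under 2 GB by default, under 3 GB with /3GB, and under 4 GB for an LAA process on Windows x64. The detail tiers, thresholds, and cache sizes are hypothetical policy choices. They are kept separate from the Windows call so they can be tested anywhere.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

enum class DetailLevel { Standard, High, Ultra };   // hypothetical tiers

constexpr std::uint64_t kOneMB = 1ull << 20;
constexpr std::uint64_t kOneGB = 1ull << 30;

struct MemorySettings {
    DetailLevel detail;
    std::uint64_t cache_bytes;   // resource cache budget (hypothetical sizes)
};

// Pick settings from the process's total user-mode address space.
MemorySettings choose_settings(std::uint64_t total_virtual) {
    if (total_virtual > 3 * kOneGB) return {DetailLevel::Ultra, 1536 * kOneMB};  // LAA on x64
    if (total_virtual > 2 * kOneGB) return {DetailLevel::High, 768 * kOneMB};    // LAA with /3GB
    return {DetailLevel::Standard, 256 * kOneMB};                                 // default 2 GB
}

#ifdef _WIN32
MemorySettings query_settings() {
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof(status);
    if (!GlobalMemoryStatusEx(&status)) {
        return choose_settings(0);   // fall back to the most conservative tier
    }
    return choose_settings(status.ullTotalVirtual);
}
#endif
```

With this policy, a default 32-bit process lands in the standard tier. The same LAA executable running on Windows x64 gets the highest tier and the largest cache.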
For networked games with a server infrastructure, 64-bit native server programs on Windows x64 can use both the extra physical RAM and the greatly expanded virtual address space. This allows more aggressive use of memory-mapped I/O. Making your servers 64-bit native builds the development team’s expertise in 64-bit programming with little impact on the content team. Running 64-bit native executables can also greatly improve the performance and stability of your servers in the datacenter.
Once games demonstrate better content running on 64-bit versions of Windows, the business case for making a 64-bit native version of the game becomes more realistic. Smaller games that fit comfortably into the 2 GB limits of 32-bit processes have no particular need for 64-bit native versions in the short term. On the other hand, AAA titles are already transitioning to 64-bit native executables. In some cases, these are purely internal builds to better handle debugging, unoptimized content, and rapid iteration. In the long run, it will make sense to ship these to customers.
Shipping 64-bit native applications will become the norm as 32-bit systems are retired or upgraded. Even many laptops are x64 capable. In fact, today’s gaming laptops often come with 4 GB or more RAM. While the use of 64-bit pointers can increase an application’s memory footprint somewhat, structure packing and optimization can bring a 64-bit application in line with an equivalent 32-bit application.
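Structure layout is where much of that growth comes from. Pointers double to 8 bytes and pick up 8-byte alignment, so padding appears between mixed-size members. Ordering members from largest to smallest alignment is one common way to recover the space. The structs below are hypothetical. The 40-byte and 24-byte sizes assume a typical x64 ABI (Windows x64 or x86-64 System V), where pointers are 8 bytes and `int` is 4.

```cpp
#include <cassert>
#include <cstddef>

// Interleaved members: each pointer forces padding after the preceding char.
struct SpriteUnordered {
    char  layer;
    void* texture;
    char  flags;
    void* mesh;
    int   id;
};

// Same members, sorted by decreasing alignment.
struct SpriteOrdered {
    void* texture;
    void* mesh;
    int   id;
    char  layer;
    char  flags;
};

static_assert(sizeof(SpriteOrdered) <= sizeof(SpriteUnordered),
              "reordering does not make this struct larger");
```

On such an x64 ABI, the unordered layout takes 40 bytes and the ordered one 24. On 32-bit x86, the sizes are 20 and 16 bytes.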
Such x64 native applications gain the performance benefits of the extra registers. They can assume SSE2 and move away from the less efficient x87 floating-point stack model. They also avoid the fairly small cost of WOW64 marshaling their Win32 system calls on Windows x64. Future development of processors, compilers, and other tools will shift from a 32-bit focus to a 64-bit one. At that point, memory addressability will no longer be the primary reason to move to x64 native applications.
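Every x64 processor supports SSE2, so a 64-bit native build can use SSE2 intrinsics unconditionally. This sketch adds arrays of doubles two at a time, using `_mm_loadu_pd`, `_mm_add_pd`, and `_mm_storeu_pd` from `<emmintrin.h>`. It falls back to scalar code when the compiler does not advertise SSE2, for example on non-x86 targets. The function name and the `SKETCH_HAVE_SSE2` macro are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_M_X64) || defined(__SSE2__)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// out[i] = a[i] + b[i] for i in [0, n)
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#if SKETCH_HAVE_SSE2
    // Two doubles per 128-bit SSE2 register; unaligned loads/stores for simplicity.
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(a + i);
        const __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    // Scalar tail (or the whole array when SSE2 is unavailable).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

On an x64 target, the `#if` always selects the SSE2 path, so no runtime CPU check is needed.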
There are two main challenges to the future of x64 native games beyond the traditional issue of market penetration and adoption: third-party “middleware” solutions and tools and the use of deprecated technologies.
10 “64-bit programming for Game Developers,” Specifying Large-Address-Aware When Building, DirectX SDK Documentation.
Third-Party Middleware Solutions and Tools
Most commercial games include some form of third-party, open source, or shared source code. It is therefore important that providers of these solutions support Windows x64 early. Otherwise, they risk blocking developers from starting the transition at the point where it would make sense for their titles and technology projects.
Having 32-bit solutions that are compatible with Windows x64 is a market necessity today to avoid alienating users of 64-bit technology, which should eventually include anyone with 4 GB or more of RAM in their PCs. These solutions also need to support 64-bit native development. This includes copy-protection schemes, rendering libraries, game engines, audio libraries, animation systems, networking libraries, compilers, profilers, debuggers, 3D modeling tools, photo editing tools, and so forth. As customers of these products, game developers have a strong influence on how quickly these libraries and tools support 64-bit native development.
Use of Deprecated Technologies
Game developers are extremely busy professionals. Typically, a successful solution won’t be revisited until it breaks. Engines and libraries tend to get reused, often beyond the life of a single project. This means many modern games still use older versions of DirectX components, Windows APIs, 16-bit installer bootstrappers, or other technologies that may have been deprecated for many years.
In the transition from 32-bit to 64-bit execution mode, many of these older technologies were not carried forward. Such code works on Windows x64 under WOW64 as a 32-bit application. However, it fails to compile as a 64-bit native application, because the APIs it uses are not available to 64-bit native applications. Porting such code bases to 64-bit native requires replacing the deprecated APIs with their newer versions. In some cases, developers need to rewrite the functionality using modern facilities.
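A related porting issue often surfaces alongside deprecated APIs: code that stores pointers in 32-bit integers. Windows x64 uses the LLP64 model, in which `int` and `long` stay 32 bits while pointers grow to 64 bits. Such casts therefore silently truncate. The sketch below shows the portable alternative, using `std::uintptr_t`. The `UserDataSlot` type is a hypothetical stand-in for any place a pointer gets stashed in an integer. The Win32 counterparts are the pointer-sized `...Ptr` functions, such as `SetWindowLongPtr`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opaque slot that stores an integer on behalf of the caller.
struct UserDataSlot {
    std::uintptr_t value = 0;   // pointer-sized on both 32-bit and 64-bit builds
};

template <typename T>
void store_pointer(UserDataSlot& slot, T* ptr) {
    slot.value = reinterpret_cast<std::uintptr_t>(ptr);   // no truncation
}

template <typename T>
T* load_pointer(const UserDataSlot& slot) {
    return reinterpret_cast<T*>(slot.value);
}

// The 32-bit-era pattern: harmless on x86, but it drops the upper half on x64.
std::uint32_t truncate_pointer(const void* ptr) {
    return static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(ptr));
}
```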
Embrace the Future
The transition to 64-bit computing is like many technology transitions before: it requires some additional work and new learning. Nevertheless, it offers exciting new opportunities and potential.
Game developers can prepare for this not-too-distant future in several ways. They can learn about x64 technology. They can move their code bases away from deprecated technologies that are not available to 64-bit native programs. They can use the effort of porting their tools to prepare their code bases for 64-bit native development. Finally, they can learn about x64 optimization.11
Internal efforts now to gain x64 experience and to push on providers of applications, tools, and third-party libraries to proactively support x64 compatibility and x64 native development will help pave a smoother road into the future that is already upon us.
DirectX SDK Technical Article - “64-bit Programming for Game Developers”
Microsoft Visual C++ Developer Center – 64-bit Programming
11 “64-bit programming for Game Developers,” Porting Applications to 64-Bit Platforms, DirectX SDK Documentation.