Of Internet Servers and SQL Databases: Designing the Backend for Power and Performance

Pete Hallenberg, Blogger

September 16, 2002

As online games proliferate, a growing number of Internet servers are being created and deployed by game developers. These servers must deal with massive quantities of data generated by the games that connect to them. Off-the-shelf SQL database software seems like the ideal way to handle all this data, but which RDBMS is right for your project, how will it impact server performance, and how should the data be structured and used? This paper will answer these questions and many others. It is intended to give programmers a clear and practical understanding of the things databases do well and the things they do horribly. Using a database without this knowledge is a virtual guarantee of lousy server performance. Topics include server architecture, ODBC vs. proprietary connection APIs, keeping the query optimizer from killing you, and everything you never wanted to know about designing tables, indexes, and stored procedures to make your servers fly.

Background: the ITNet Network

The paper that follows grew out of my experiences of the last three years, during which I served as architect, lead programmer, and project manager for Incredible Technologies' Internet gaming network ITNet. ITNet is a gaming network unlike any other: it provides Internet gaming services to the company's flagship arcade product – the blockbuster coin-op golf game Golden Tee Fore! Instead of interconnecting individual players, ITNet links public gaming devices that are themselves accessed by hundreds of individual players. This, along with the phenomenal success of Golden Tee Fore!, means that the ITNet system handles a truly staggering amount of data: over 20,000 online machines worldwide dump 150,000 individual transactions per day into the system. ITNet provides service around the clock to games in seven foreign countries and the United States, as well as tracking individual game play statistics for over 700,000 registered players worldwide. In addition to player statistics, ITNet manages the exchange of complete financial and game play statistics, multiple national and international tournament leader boards, user-defined option settings, collection and diagnostic data, recorded “shadow games” for multi-machine play, operator registration information, card purchase data, detailed course statistics, code patches, advertising graphics and promotional contests, as well as allowing for the sending, receiving, and deleting of any file on the remote machine's hard drive. Plans are also already underway to add two more arcade products to the ITNet network. Since I will be constantly referring to ITNet throughout the remainder of this paper, this description is intended to give you some idea of the capabilities of the system I will be discussing.

Do You Need a SQL Database?

If you are in the process of designing or building an Internet game server of any size, you have probably wondered whether you need a database. Considering the amount of data that a modern Internet server must deal with, integrating an enterprise-grade SQL database into your server backend becomes a more and more attractive option all the time. Before you go rushing off to buy that shiny copy of Oracle 9i, though, let's pause a moment and clarify when a SQL database is really necessary.

There are basically three options you can take with respect to databases. You can skip the database entirely and create a “stateless” server, you can roll your own database solution, or you can buy database software off the shelf and integrate it with your server. For the most part, I recommend you go with the first or third option. If your project is particularly small and simple, you may be able to get away without any database at all. This type of architecture can work well for a simple matchmaking and chat server that won't be tracking information about individual players. If you can get away with building a stateless server, I highly recommend you give it a try. The Internet server I created for Incredible Technologies' Golden Tee PC Golf game used this architecture, and it's one of the most trouble-free server systems I've ever seen. It quite literally requires attention only once or twice a year.

If, on the other hand, your game is going to require some amount of state data, then I recommend you go with an off-the-shelf SQL database rather than a home-brewed solution. Although it might seem like the home-brewed solution could be made cheaper and faster, you will rapidly discover that this approach is more trouble than it's worth. The minute you need to do anything halfway interesting with your data – even something as simple as accessing it from an external viewing or reporting tool – you'll wish you'd gone with SQL. In addition to remarkably fast performance, an enterprise-grade SQL database will store your data in a format that can be easily accessed and used by a wealth of external tools and systems – from accounting packages to web servers. Even if you never do anything other than issue queries interactively, you'll rapidly grow to love the flexibility and power of SQL. In addition, you'll be able to hire experienced help from a huge pool of potential applicants thanks to the ubiquitous use of the SQL standard in business.

Physical Infrastructure

Once you've decided you need a SQL database, setting the system up is pretty straightforward. Your SQL database will run on its own box and will be accessed via a fast LAN link by your Internet servers, web server, development tools, etc. Since you are going to want extremely fast network speeds between your database and critical Internet server components, you should carefully choose the physical networking infrastructure that will connect these machines. A dedicated, high-speed Ethernet backplane is a good choice here (we use 100 Mbps Ethernet for ITNet).

When purchasing computer hardware, I recommend that you buy the biggest and buffest box you can afford for your database server. We discovered after running ITNet for two years that the database is almost always the speed bottleneck when your Internet servers slow down. Our main database server is currently running on a dual Pentium 4 machine with 1GB of RAM and five of the largest and fastest SCSI hard drives (connected and mirrored via a RAID array) that we could find.

Internet Server Design

Your servers will be the conduit between your game clients connecting over the Internet and your database, so your first design questions are: how should your clients connect to the servers and how should the servers connect to the database?

Connecting to The Database
There are two basic ways of connecting to a database. You can either use the Open Database Connectivity (ODBC) API, or connect using your database vendor's proprietary access API. ODBC is generally considered to be slightly slower, but has the advantage of being a universal standard supported by all major databases. The major advantages of a native API over ODBC are its purported speed improvement and a richer set of database access calls.

We used ODBC for ITNet and our query performance has exceeded all our expectations. I strongly recommend you use it. In practice, you'll find that you have other much more terrifying performance problems to worry about than latency introduced by ODBC.
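
To make the mechanics concrete, here is a minimal sketch of what a single ODBC round trip looks like from C++ server code. The DSN name, credentials, and query are placeholders, and error handling is reduced to a single check so that the shape of the API stands out.

    // Minimal sketch of an ODBC round trip. The DSN, user, and password are
    // hypothetical; link against odbc32.lib.
    #include <windows.h>
    #include <sql.h>
    #include <sqlext.h>
    #include <cstdio>

    int main()
    {
        SQLHENV env = SQL_NULL_HENV;
        SQLHDBC dbc = SQL_NULL_HDBC;
        SQLHSTMT stmt = SQL_NULL_HSTMT;

        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
        SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
        SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

        SQLRETURN rc = SQLConnect(dbc, (SQLCHAR*)"ITNET_DSN", SQL_NTS,
                                  (SQLCHAR*)"server_user", SQL_NTS,
                                  (SQLCHAR*)"secret", SQL_NTS);
        if (!SQL_SUCCEEDED(rc)) { printf("connect failed\n"); return 1; }

        SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
        SQLExecDirect(stmt, (SQLCHAR*)"SELECT count(*) FROM GameUnitPlayEvent", SQL_NTS);

        SQLINTEGER rows = 0;
        if (SQL_SUCCEEDED(SQLFetch(stmt)))
        {
            SQLGetData(stmt, 1, SQL_C_SLONG, &rows, 0, NULL);
            printf("rows: %ld\n", (long)rows);
        }

        SQLFreeHandle(SQL_HANDLE_STMT, stmt);
        SQLDisconnect(dbc);
        SQLFreeHandle(SQL_HANDLE_DBC, dbc);
        SQLFreeHandle(SQL_HANDLE_ENV, env);
        return 0;
    }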

Connecting to The Games
The first and most important choice you will face when architecting your servers is whether you should use TCP/IP or UDP as the underlying communications protocol.

If you've attended any server design lectures in the past, you may have heard it said that you should always use UDP for games. This is true as long as fastest possible delivery of data is the overriding design consideration for your project. In games where players are competing head-to-head over the Internet in real time, this is clearly the case. In turn-based games or other types of server-to-game interactions (like sending patch files), however, things are not so clear cut. I'm not suggesting you make your next multiplayer Quake-killer using TCP/IP, but we need to be careful not to throw the baby out with the bath water here. For applications where it's appropriate, TCP/IP can cut months off a development schedule by providing a foundation protocol with world-class reliable delivery, sliding window packet delivery, flow control, retransmission characteristics, and countless esoteric refinements and optimizations. After all, the last thing you want to be doing while staring down the muzzle of a milestone is trying to work out your own solution for send-side silly window avoidance.

For all the reasons noted above, I decided to use TCP/IP as the underlying protocol when architecting my Internet server for Incredible Technologies' Golden Tee PC Golf game. PC Golf was a turn-based game, so it was a perfect fit for TCP/IP. This worked great except for one problem: with TCP you're much farther from the metal than you are with UDP, so you don't have as much information about what's going on when the network starts acting up. With PC Golf, my strategy had been to open TCP connections between the host and server and keep these connections open throughout a multiplayer match. What I discovered, though, was that it's really difficult to recover a lost TCP connection and figure out how to pick up where the last connection left off.

When it came time to design ITNet, we had another great fit for TCP/IP on our hands, but I was wary of the problems I'd seen with TCP while creating the PC Golf multiplayer code. Again I faced a difficult choice: should we use UDP and reinvent all those expensive wheels, or should we somehow try to work around the problems of TCP? The solution we came up with turned out to be one of the best architectural decisions we made while creating ITNet. After several grueling design meetings, we decided to use TCP/IP with a twist: instead of holding a TCP connection open during an entire communication session, we would adopt the same brief connection model used by web servers. Our design specified that ITNet clients would be required to open a new TCP/IP connection for every message they needed to send, and that each connection would stay open only until the server had sent one or more messages in reply.

The design worked great. The brief connection strategy neatly solved the connection recovery problem with TCP/IP: if a connection went south, it was very easy for the client to recover by simply throwing away the connection in progress, and then opening a new connection and resending the last message. All those thorny issues of determining how much data had already been sent and building servers that could resume processing at any point in the message handling pipeline simply went away.
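
Here is a bare-bones Winsock sketch of the pattern, assuming a simple one-message, one-reply exchange. The server address, port, message contents, and retry count are placeholders rather than anything ITNet actually uses.

    // Brief-connection sketch: open a fresh TCP connection for each message,
    // read the reply, close, and simply retry the whole exchange on any
    // failure. Link against ws2_32.lib.
    #include <winsock2.h>
    #include <cstring>
    #include <cstdio>

    bool SendBriefMessage(const char* serverIp, unsigned short port,
                          const char* msg, int msgLen,
                          char* reply, int replyCap, int maxRetries)
    {
        for (int attempt = 0; attempt < maxRetries; ++attempt)
        {
            SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
            if (s == INVALID_SOCKET)
                continue;

            sockaddr_in addr;
            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_port = htons(port);
            addr.sin_addr.s_addr = inet_addr(serverIp);

            // If anything fails, throw the connection away and start over
            // with a fresh one; that is the whole recovery strategy.
            if (connect(s, (sockaddr*)&addr, sizeof(addr)) == 0 &&
                send(s, msg, msgLen, 0) == msgLen &&
                recv(s, reply, replyCap, 0) > 0)
            {
                closesocket(s);
                return true;
            }
            closesocket(s);
        }
        return false;
    }

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);

        char reply[512];
        const char request[] = "GAME_DATA ...";   // stand-in for a real message
        if (SendBriefMessage("192.0.2.10", 5000, request, sizeof(request),
                             reply, sizeof(reply), 3))
            printf("reply received\n");

        WSACleanup();
        return 0;
    }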

The only thing that bothered us about brief connection was the fear that all those connects and disconnects would kill our data transfer efficiency. Happily, this was a non-issue. As it turns out, TCP/IP connection setup and teardown are actually pretty speedy affairs: opening a connection requires only three packets, and closing requires only four. The fast connection setup and teardown allowed us to make 50 to 60 brief connections per call and still blow away all our targets for call speed.

Brief connection was one of the very best choices we made when designing the ITNet servers. All told, it probably cut three or four months off our development schedule, and has continued to work exceptionally well even after two years of nonstop growth. Based on my experience, TCP/IP with brief connection has earned a permanent place in my network programmer's bag of tricks. If it fits your project's parameters, I strongly urge you to give it a try.

The Standard Server Architecture
Now that you know how to establish basic connections to the database and your games, the next step is to settle on an overall architecture for the server itself.

Important Safety Tip: At Incredible Technologies, we write all our servers in C and C++ to run under Windows NT. Since servers don't need much of a user interface, I recommend avoiding MFC so that you don't add unnecessary overhead. Also, any serious server application will need to be written as a Windows NT service. A service is a successor to the venerable DOS TSR. One of the most useful properties of a service is that it can be set to run every time your NT server boots up - even when no user is logged in. This is very important for robustness: if anything should cause your server PC to reboot, you can sleep well knowing your server software will be restarted right along with it.
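
For reference, the skeleton of an NT service is quite small. The sketch below shows the handful of calls involved; the service name and the stop-event plumbing are generic placeholders, not ITNet code.

    // Minimal Windows NT service skeleton. The SCM starts ServiceMain at boot
    // (or on demand), and the service runs until it is told to stop.
    #include <windows.h>

    static SERVICE_STATUS        g_status = {};
    static SERVICE_STATUS_HANDLE g_statusHandle = NULL;
    static HANDLE                g_stopEvent = NULL;

    void WINAPI ServiceCtrlHandler(DWORD ctrl)
    {
        if (ctrl == SERVICE_CONTROL_STOP || ctrl == SERVICE_CONTROL_SHUTDOWN)
        {
            g_status.dwCurrentState = SERVICE_STOP_PENDING;
            SetServiceStatus(g_statusHandle, &g_status);
            SetEvent(g_stopEvent);
        }
    }

    void WINAPI ServiceMain(DWORD, LPTSTR*)
    {
        g_stopEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        g_statusHandle = RegisterServiceCtrlHandler(TEXT("MyGameServer"),
                                                    ServiceCtrlHandler);
        if (!g_statusHandle)
            return;

        g_status.dwServiceType = SERVICE_WIN32_OWN_PROCESS;
        g_status.dwControlsAccepted = SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_SHUTDOWN;
        g_status.dwCurrentState = SERVICE_RUNNING;
        SetServiceStatus(g_statusHandle, &g_status);

        // ... start listener and worker threads here ...
        WaitForSingleObject(g_stopEvent, INFINITE);   // run until told to stop

        g_status.dwCurrentState = SERVICE_STOPPED;
        SetServiceStatus(g_statusHandle, &g_status);
    }

    int main()
    {
        SERVICE_TABLE_ENTRY table[] =
        {
            { (LPTSTR)TEXT("MyGameServer"), ServiceMain },
            { NULL, NULL }
        };
        // Hands the process over to the service control manager.
        StartServiceCtrlDispatcher(table);
        return 0;
    }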

When I originally sat down to design the ITNet servers, I had planned to use a variant of the classical multithreaded server architecture. In this architecture, a dedicated thread waits for incoming connections and spawns a new thread for each client that connects to the server. Each spawned thread is responsible for handling all communication with a single client and is shut down when that client disconnects.

The variant on this design uses a small pool of pre-allocated “worker” threads that can handle any request/reply message transaction with any client. When a new message arrives from a game, it's handed off to an idle worker thread for processing. This architecture improves on the classical design by reducing the amount of context switching on a busy server, as well as eliminating the need to constantly create and destroy threads. Another nice feature of this architecture is that Windows NT has native support for it through an object called an I/O Completion Port. A full discussion of I/O Completion Ports is beyond the scope of this paper, but you can find a good overview of the topic in the MSDN document “Writing Windows NT Server Applications in MFC Using I/O Completion Ports” by Ruediger Asche (it should be included in your Visual Studio 6.0 help).
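
The stripped-down sketch below shows the dispatch loop at the heart of the worker-pool design. A real server would associate accepted sockets with the completion port and post overlapped reads; here, work items are posted by hand so the mechanism itself is easy to see, and the thread count and message types are arbitrary.

    // Worker-pool sketch built on an I/O completion port.
    #include <windows.h>
    #include <cstdio>

    static HANDLE g_port = NULL;

    DWORD WINAPI WorkerThread(LPVOID)
    {
        for (;;)
        {
            DWORD bytes = 0;
            ULONG_PTR key = 0;
            OVERLAPPED* ov = NULL;

            // Blocks until a completion packet (a unit of work) arrives.
            if (!GetQueuedCompletionStatus(g_port, &bytes, &key, &ov, INFINITE))
                continue;
            if (key == 0)          // conventional "shut down" signal
                return 0;

            printf("worker %lu handling message type %lu\n",
                   GetCurrentThreadId(), (unsigned long)key);
            // ... parse the client message, talk to the database, reply ...
        }
    }

    int main()
    {
        const int kWorkers = 4;
        g_port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, kWorkers);

        HANDLE threads[kWorkers];
        for (int i = 0; i < kWorkers; ++i)
            threads[i] = CreateThread(NULL, 0, WorkerThread, NULL, 0, NULL);

        // Simulate incoming client messages by posting completion packets.
        for (ULONG_PTR msgType = 1; msgType <= 10; ++msgType)
            PostQueuedCompletionStatus(g_port, 0, msgType, NULL);

        // Tell every worker to shut down, then wait for them all.
        for (int i = 0; i < kWorkers; ++i)
            PostQueuedCompletionStatus(g_port, 0, 0, NULL);
        WaitForMultipleObjects(kWorkers, threads, TRUE, INFINITE);

        CloseHandle(g_port);
        return 0;
    }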

This multithreaded pooling architecture was the design I had used for the PC Golf server, and although it was very fast, it did have some problems. In particular, it took a ton of work to resolve all the thread synchronization bugs that cropped up. Also, having all the server's code in a single process made the server extremely fragile: any bug in any part of the code could bring the entire server crashing down. This made evolving the existing code or adding new features exceptionally hazardous… sort of like performing open heart surgery on a patient who's awake and walking around.

A New Hope
It was while mulling over these problems that we hit upon an alternative scheme. The idea was to unbundle all the pieces of a standard multithreaded server into a family of single-threaded servers that could each handle a single, specialized task. Figuring out how to chop up one big server into a lot of little servers was actually pretty easy. Most of the servers would be dedicated to processing specific messages or groups of messages. For example, we could have one server that handled incoming game data, and another that maintained and distributed tournament leader boards. The glue binding all these servers together would be a “meta” server, which would be responsible for telling clients where to send individual messages to get them processed. Clients would know only the IP address of the meta server, but would begin each communication session by requesting a “server resolution map” to get the addresses of the other servers. A server resolution map is just a look-up table that matches every client message type with an IP address and port number of the server that processes that message.
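
In code, a server resolution map is nothing more exotic than the lookup table just described. A hypothetical client-side version might look like this (the message IDs and addresses are invented):

    // Client-side view of a server resolution map: message type -> endpoint.
    #include <map>
    #include <string>
    #include <cstdio>

    struct ServerEndpoint
    {
        std::string ip;
        unsigned short port;
        ServerEndpoint() : port(0) {}
        ServerEndpoint(const std::string& i, unsigned short p) : ip(i), port(p) {}
    };

    typedef std::map<int, ServerEndpoint> ServerResolutionMap;

    // In the real system the map would be parsed out of the meta server's
    // reply; here it is filled in by hand.
    ServerResolutionMap BuildExampleMap()
    {
        ServerResolutionMap m;
        m[100] = ServerEndpoint("192.0.2.20", 5100);   // game-play data
        m[200] = ServerEndpoint("192.0.2.21", 5200);   // leader board requests
        m[300] = ServerEndpoint("192.0.2.22", 5300);   // file/patch transfers
        return m;
    }

    int main()
    {
        ServerResolutionMap map = BuildExampleMap();

        int msgType = 200;
        ServerResolutionMap::const_iterator it = map.find(msgType);
        if (it != map.end())
            printf("send message %d to %s:%u\n", msgType,
                   it->second.ip.c_str(), (unsigned)it->second.port);
        return 0;
    }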

In the end, we decided to use this second “server family” architecture for ITNet. In addition to solving the multithreading and fault-tolerance problems, the server family approach had a number of very cool properties:

  • A single server could only ever be as powerful as the PC it was running on, but a server family could be spread across multiple machines for greater performance.

  • The meta server could dynamically create server resolution maps from data in the database. This meant that new servers could be added dynamically to a family, and also that we could override server assignments on a client-by-client basis (useful for sending specific messages from just one client to a debug server, for example).

  • By designing the meta server to randomly route messages between two or more identical servers in the same family, we could build in automatic load balancing for free.

This was our reasoning, at any rate, when we embarked on the creation of ITNet. After running the ITNet server family under heavy load for two years, though, my opinion of the server family architecture is mixed. It did many of the things that we had hoped – and even surprised us with some unexpected benefits – but it also had its share of problems. The following sections explain what worked and what didn't work.

The Good
One of the biggest wins with the server family approach was being able to selectively move parts of the “virtual server” around to different computers on the network. This made debugging in the production environment incredibly easy: we could run a problematic server right from our development machines under the debugger when hunting bugs. The beauty of this approach was that the bulk of the “virtual server” continued to function exactly as it always did, which was a huge help in tracking down code that misbehaved only when the servers were running under load.

Being able to add new servers at will was also hugely beneficial. This capability came in especially handy when adding new products to ITNet. The architecture allowed us to easily add a new product-specific server to the server family which contained all the code under development for the new product. Even when this code was incredibly fragile and buggy, it never affected the stable parts of ITNet: since it was running as a separate process under a protected memory OS, it could crash all day long without impacting the other servers.

Another big boon was the meta server. Having a dedicated “traffic cop” managing the other servers let us add a lot of advanced behavior very easily. It also gave us a natural place to put code that handled global connection properties. The meta server eventually evolved into a session and call management server as well as a place to get server resolution maps.

There was one last benefit of the server family approach that really took me by surprise. Having many small servers broken down by project and function gave us an unexpectedly powerful tool for managing the production process. It turned out that making individual programmers responsible for specific servers neatly solved many of the problems that come with shared programming projects. In particular, when an ITNet server crashed, it was immediately obvious whose code had failed and who should be looking into it. Up until this time, I hadn't realized how much energy could be spent wrangling over who should do the work of finding certain tricky bugs. This ambiguity is unavoidable in a typical programming project, but it completely disappeared with ITNet. The best part about this was the way it protected careful programmers from mistakes made by other members of the team. Programmers who caught most of their bugs during coding just naturally ended up doing less nasty debugging work. The efficiencies this added to the server creation process are hard to overstate. Indeed, this is such a shockingly valuable feature that it almost justifies using the server family architecture all by itself!

The Bad
Although the idea of having a server that could “gracefully degrade” sounded good on paper, in practice it didn't buy us much. It didn't take long for us to realize that the problems caused by allowing partial calls to complete more than outweighed the benefit of having a system that could function with one or more failed servers. In the end, it was much less trouble to just detect and deal with downed servers than to try to build a system that could run robustly with “partially successful” communication sessions.

Managing the dozen or so servers that eventually formed part of the ITNet server family required more work than expected. Even checking the status of the system meant looking at and digesting 12 separate status values instead of just one. While this problem wasn't a deal-breaker, it was nonetheless a pain in the neck.

Data sharing between the servers turned out to be a bigger problem than originally anticipated. In particular, we found that the database was a very unsuitable channel for transferring shared state between different servers.

By far the biggest problem, though, was the single-threaded design of the individual servers. Without multiple threads to spread out the load, individual servers could easily build up a backlog of unprocessed messages when sudden spikes in server activity occurred. The only solution to this was essentially to hand-tune the servers: we ended up shifting message processing responsibilities around among servers until each one was carrying a roughly equal share of the processing load. This enabled the server family to pipeline data efficiently, but it involved a lot of tedious hand tweaking. Automatic load balancing might have helped alleviate the problem, which brings us to our next section…

The Ugly
Somehow or other, the automatic load balancing feature never quite got finished. Although it worked early on, something broke about six months after rollout and I never got around to fixing it. By that time, we were in full-blown crisis mode, and only the most urgent projects were being worked on. Since we never actually started using the load balancing feature, it went straight to the bottom of the project list and has stayed there ever since.

Database Design

The first question that must be answered when setting up an enterprise-grade database is what software you will use to run it. This is a complicated decision influenced by many factors, but I'll try to distill it down to the basic questions.

The major products to choose from are Microsoft's SQL Server, Sybase's Adaptive Server Enterprise, Oracle's 9i Server, and IBM's DB2 software. Any of these enterprise-grade Relational Database Management Systems (RDBMSs) will do the job; however, some will do it better than others. I am by no means an expert on all these different packages, so I would urge you to do your own research on any package you're considering. I can say from experience, though, that migrating databases is a nightmare, so you'll probably end up living with whatever software you choose for a long time to come.

If your company is a Windows-only shop, your best bet is probably SQL Server. Sybase's ASE product also runs well under Windows NT and is significantly cheaper; however, I advise you to steer clear of it. We went with Adaptive Server Enterprise over SQL Server and have regretted it ever since. The problem with ASE isn't performance – we've tweaked and tuned the heck out of our database and it fairly screams – the problem is the lack of tools and utilities bundled with it. SQL Server includes a ton of tools to help you monitor the health of your database, do regular maintenance, and tune its performance. With Sybase, you end up building or buying nearly everything you need just to perform routine tasks. This is more than just a painful chore: as newcomers to the world of enterprise-grade databases, we had no idea what tools were even required to run our shop. It was a long and painful process to first figure out what we needed, then look for features in ASE that supported it, and finally determine how to buy or build it ourselves. With SQL Server, the most important tools are ready and waiting for you right in the box.

If your company is a Unix shop, then you probably should take a long, hard look at Oracle. Although this is the most difficult database to set up and keep running (you probably don't want to attempt it without hiring a trained Oracle SA and/or DBA), it's also generally considered to be the fastest and most scalable RDBMS out there. Other databases will run under Linux or some other flavor of Unix, but Oracle is the best of the lot.

If you're not strongly biased towards a particular OS, then make your decision based on cost and how data-intensive your game will be. All of these RDBMS products will require ongoing upkeep, but some are significantly cheaper than others. SQL Server is at the low end of the continuing cost scale: it requires less work than most to keep running, and there are plenty of qualified (and relatively inexpensive) professionals around who can help. Oracle is at the high end of the scale, requiring more effort from generally more expensive employees. On the plus side, you'll have no trouble finding experienced candidates. Sybase and DB2 fall somewhere in the middle in terms of cost. Be forewarned that since Sybase's market share has slipped into the low single digits, it can be very difficult to find potential employees with solid Sybase experience. We ended up hiring professionals with related experience (SQL Server) and training them ourselves.

In terms of determining how much database horsepower your game will need, try to get a sense for how much database activity will go on during an average client/server session and how many games will be connected to your servers simultaneously. If you're building a Massively Multiplayer Online game, you're probably going to need a high-end solution like Oracle. If you're building a matchmaking and chat server for a simple PC game, SQL Server or some other low-end RDBMS will probably give you plenty of horsepower – and be cheaper and easier to use too. These factors are intensely game-specific, though, so you'll need to make this call based on your own intimate knowledge of your game.

A Server Programmer in Dataland
As a longtime C/C++ programmer, I started work on ITNet with some very naïve attitudes about databases. I thought I could just treat the database as a “black box” that was capable of storing and retrieving data, and that I wouldn't have to concern myself with the details of how that data was represented internally. We had one database programmer at the time, and I imagined that he and I could just split the ITNet server project neatly at the database/server interface: I would figure out what the servers needed, and he would work his magic behind the scenes to make the database comply.

It didn't take long for reality to disabuse me of this happy fantasy. I quickly realized that databases could, indeed, serve up data however I wanted it provided I was willing to pay a steep performance price. The ugly truth is that if you want your servers to run quickly and reliably with a database (and you want to avoid driving your DBA to drink) you need to learn something about how databases work. More to the point, you need to write your server code to take advantage of a database's strengths and accommodate its weaknesses.

The rest of this paper is an attempt to pass on the most important bits of information that server programmers need to know when working with databases. My hope is that you will be able to use this information to cut short the multi-year learning curve I had to climb when building ITNet. I will not attempt to cover all of the most basic database concepts here since these can be filled in by any reasonably decent database book. Instead, I will focus on the things that server programmers really need to pay attention to, and that can only be learned through experience.

Tools of The Trade
The first thing I want to impress upon you when working with an enterprise-grade database is that you need to get yourself a good database book. Experienced programmers won't need to be told this, but even they may be surprised at just how hamstrung they will be if they don't pick up a good book right at the start of a project. We use the book “Sybase SQL Server 11 Unleashed” by Rankins, Garbus, Solomon, and McEwan, and I can vouch for the fact that it is excellent. Make sure you buy a book that is specific to the RDBMS software you will be using.

A standard tool suite for working with databases consists of the following: a database management app (usually provided with the database software) for creating users, tables, indexes, and other database components; a text editor for writing stored procedures; and an interactive SQL interpreter for issuing SQL queries to the database and viewing the results. You might add some special-purpose apps to this list later on, but this is the standard set of development tools. Some or all of these components may come with your database software; however, if you don't get a good Windows-based SQL interpreter (as we didn't with Sybase's ASE), check out the WinSQL 3.5 shareware package on the Internet.

The Two Basic Table Types
One of the first things I learned about working with databases is that there are really only two effective strategies for storing data in a table. In C++ you can create all sorts of crazy data structures and manipulate them any way you like, but databases are very different beasts. The two storage strategies that work best are what I call historical and summary tables.

A historical table is one that is designed to add a new record every time data arrives that needs to be saved. A good example of this is the GameUnitPlayEvent (GUPE) table at the heart of the ITNet system: GUPE stores a single record for every individual game that is played, and includes detailed financial and game play data about that game. Whenever data about a new game is sent to ITNet, another record is added to GUPE to hold it. I call this type of table “historical” because it contains a complete history of all data that has ever been added to the table. This type of table can be used to store actual histories of various kinds. For example, we use a historical table called CallSession that gets a new record added every time a game initiates a connection to the meta server. CallSession stores the ID number of the game, and the time and date when the connection occurred, so it is literally possible to get a “history” of all calls a game has ever made to ITNet by requesting all records from this table with the game's ID number sorted by the date when the call occurred.

A summary table holds a limited number of rows of data that are designed to be updated with different values when new data is saved to the table. A good example of this sort of table is ITNet's CourseStats table. CourseStats holds statistics for every golf course available in the game, such as the length of the longest drive ever hit on the course, the average number of strokes to hole out, and so on. Whenever data about a new game arrives, the record for the course the game was played on is updated. For example, if the player beat the longest drive on that course, the CourseStat record is updated to show the new longest drive distance and the name of the new record holder.

Notice that both the historical GUPE table and the summary CourseStats table are updated with data from the same server event: a message describing a new game play. The moral of the story is that the same data can usually be stored in multiple ways in the database, so the table design choices you make must reflect how you plan to access and use the data.
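
As a sketch of what this looks like from the server's side, the following routine feeds one game-play message into both kinds of table: a fresh row goes into the historical GUPE table, and the matching CourseStats row is updated only if the new drive beats the record. The column names are simplified, partly invented stand-ins for the real layouts, and production code would bind parameters rather than sprintf values into the SQL.

    // One game-play message, two storage strategies.
    #include <windows.h>
    #include <sql.h>
    #include <sqlext.h>
    #include <cstdio>

    void RecordGamePlay(SQLHSTMT stmt, long gameId, int courseId,
                        int strokes, int driveYards)
    {
        char sql[512];

        // Historical table: always insert a brand-new row (with an insertion
        // timestamp, per the rules of thumb below).
        sprintf(sql,
            "INSERT INTO GameUnitPlayEvent "
            "(GameID, CourseID, Strokes, LongestDrive, InsertedAt) "
            "VALUES (%ld, %d, %d, %d, getdate())",
            gameId, courseId, strokes, driveYards);
        SQLExecDirect(stmt, (SQLCHAR*)sql, SQL_NTS);

        // Summary table: update the single existing row for this course, and
        // only when the new drive actually beats the current record.
        sprintf(sql,
            "UPDATE CourseStats SET LongestDrive = %d "
            "WHERE CourseID = %d AND LongestDrive < %d",
            driveYards, courseId, driveYards);
        SQLExecDirect(stmt, (SQLCHAR*)sql, SQL_NTS);
    }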

Speaking of table design choices, just what are the advantages and disadvantages of the two table types? Broadly speaking, historical tables are good at providing very fast data inserts and maximum querying flexibility. The disadvantages of historical tables are that they can grow at a terrifying rate, and getting data out of them can take a long time (because they usually have so many records to hunt through). Summary tables, on the other hand, have a very small memory footprint, and are usually fast and easy to get data out of. The downside is that they offer much slower data inserts and are very inflexible when it comes to querying data.

Here are some rules of thumb to keep in mind when working with historical tables:

  • Keep the amount of data stored in each record as small as possible. This will help your query times immensely by limiting the amount of I/O the database needs to perform when sifting through table rows.

  • Even a massive historical table can still provide fast query times when retrieving a single row of data. You will need to have a proper index in place to make this work, but it can come in very handy.

  • With fast growing tables, you will need a strategy for preventing them from growing arbitrarily large. A good way to do this is to run a batch job to periodically back up and delete old data from the table.

  • Consider adding an insertion date and time with every record you put in a historical table. This can be an invaluable debugging tool, and also lets you batch process new data added to the table quickly and cleanly. I now do this with every historical table I create as a matter of course.

Here are a few rules of thumb to keep in mind when working with summary tables:

  • Think long and hard about what values you need to track in each column of the table. Once you start collecting data, you can't go back and change your mind.

  • You can save off periodic “snapshots” of the records in a summary table if you need some limited historical behavior. This works especially well for periodic events (for example, you could record all the course records at the start of a tournament if you wanted “before and after” snapshots).

  • Save as much data per record as you want since you're usually not space-limited, but be aware that “insert” times will go up with each additional value that needs to be updated.

  • Watch for accumulators that may exceed the maximum value that can be saved in the column's data type. This is especially important when you have an integer column that is accumulating values that can change in large increments.

Knowing the type of table your servers are dealing with will help you work with it effectively. For ITNet, we use a mix of the two table types. I'd set the proportion at about 70 percent historical tables and 30 percent summary tables. The preponderance of historical tables reflects the fact that a slow-growing historical table can often give you a “best of both worlds” solution: all the detail of a historical table without the data bloat and excessive query times that usually come with one.

How Big is Big?
A question that plagued me for a long time when I first started working with databases is just what constitutes a “big” table? You'll often hear people say that this or that operation doesn't work well on “big” tables, but exactly how many rows are we talking about here? On ITNet, we have just under 200 tables, ranging in size from 1 to 60 million rows. I've summarized my experiences working with these tables to give you a better sense of how size impacts the utility of a table:

Indexes are Your Friend
Although indexes are well covered by most database books, there are a couple of important concepts that bear clarification. First and foremost, be aware that you will not be able to do anything quickly on a Medium size or larger table unless you have at least one index on it. The only exception to this is inserting new data rows. Indexes are what makes it possible to do anything useful with a really big table.

All tables can be given a special index called a primary key. The primary key has many uses, but its main purpose is to define the column (or columns) that will principally be used as the “key” when retrieving records from the table. As such, the primary key must be unique for each record in the table. A good example of a primary key would be a unique player ID number that you might assign to all players of your game. The reason I bring up primary keys is that there is a very common trick used by database programmers to assign a unique ID number to all records in a table. This trick involves generating a random key value for each record in the table at insertion time. This random – but unique – value is then used to identify the record in all other tables that reference it. While this technique is great for tables that don't have a naturally occurring unique value as part of their data, I don't recommend using it as a one-size-fits-all solution. You will find that some database programmers have strongly held “religious” beliefs that only synthetic data should be used in a primary key. The argument is that real data is subject to change at some later date, so only made-up data (which is under the complete control of the database programmer) should ever be part of the key.

This argument is, to put it bluntly, baloney. Using a unique value from a table's data as the primary key has some important advantages – notably that the information actually means something when you see it stored as a reference in another table. It can also prevent unnecessary database work in some situations in which the unique data is readily available, but must first be converted to the synthetic primary key value via a table lookup before it can be used. As long as the unique data you have earmarked for the primary key is fully under your control (as in the case of player ID numbers) or unlikely to ever change (as in the case of, say, social security numbers) there is much to be gained from using it in this way.

Indexes are often used as a way of guaranteeing uniqueness in a table. In ITNet's GUPE table, for example, we have a multi-column unique index that covers several columns which, taken as a group, are always different from one game to the next. We use this index not to look up records, but as a way of preventing duplicate game records from being inserted into the table. This is an excellent (albeit somewhat expensive) trick.

Although indexes are great, you should keep in mind that every index you add to a table increases the amount of time it takes to insert new data into the table. This is because when you insert a row, every index in a table must be updated to include the data in the new row sorted in its proper place. Also, indexes can take up a lot of space. Indexes on really big tables, in particular, are really big. Remember that adding indexes incurs a resource cost, so you need to carefully weigh the value of every index before you add it to a table.
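
For illustration, here are the two ideas above, a primary key built on a natural unique value and a multi-column unique index used purely as a duplicate guard, expressed as the statements a setup script might send. The constraint, index, and column names are invented stand-ins.

    // Hypothetical DDL for the index examples discussed in the text.
    #include <cstdio>

    // A natural, already-unique value (the player ID) used as the primary key.
    const char* kPlayerPrimaryKey =
        "ALTER TABLE Player ADD CONSTRAINT pk_player PRIMARY KEY (PlayerID)";

    // A multi-column unique index whose only job is to reject duplicate
    // game records at insert time.
    const char* kDuplicateGuard =
        "CREATE UNIQUE INDEX ux_gupe_nodups "
        "ON GameUnitPlayEvent (GameID, CourseID, PlayStart, PlayerOfRecord)";

    int main()
    {
        printf("%s\n%s\n", kPlayerPrimaryKey, kDuplicateGuard);
        return 0;
    }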

The Pros and Cons of Normalized Data
Another topic that can stir up strong opinions among database professionals is data normalization. Put simply, normalizing data is about representing data in its most compact form, and ensuring that every value in the database exists in one and only one place. The best way to explain normalization is through an example. Imagine that you're saving player data in a database, and you want to allow each player to associate up to two email addresses with their data. The simplest way to do this would be to add two text columns to your standard player data table that already contains their name, address, phone number, etc. If a player entered only one email address, you could just put a NULL value in the second email address column. While this sounds like a perfectly acceptable solution, there are many database programmers who will strongly object to the scheme. The problem is that the data is not normalized properly. A normalized solution would create a separate two-column table to hold the email data: the first column would be the primary key value of the player record that the data belongs to (to allow you to associate the record with the proper player), and the second column would hold a single email address. Now if a player enters a single email address they will have only a single record in the email table. A player that enters two email addresses will have two records in the table. Database programmers often prefer the normalized approach because it more closely matches the table structure to the data and prevents wasted space (in the above example, the often empty second email address column in the player data table).

The problem with normalization is that it isn't always the best solution. Although it clearly models the data much more accurately, it also incurs a processing penalty: in the above example, whenever you want to retrieve a player's information you will be forced to join the player table with the email table to gain access to all the data. In the non-normalized approach, all that's needed is a single lookup in the player table and you're done. A normalized scheme can also cause table contention bottlenecks when multiple processes need access to the same data at the same time. Although redundancy is potentially bad in terms of data integrity, it can be a critical technique for eliminating resource contention bottlenecks on busy servers.
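
To see the processing penalty in concrete terms, compare the queries the server would send under each scheme. The table and column names below are hypothetical.

    // Lookup cost of the denormalized vs. normalized email schemes.
    #include <cstdio>

    // Denormalized: both addresses live in the player row, so one lookup does it.
    const char* kDenormalizedLookup =
        "SELECT Name, Email1, Email2 FROM Player WHERE PlayerID = 1234";

    // Normalized: addresses live in their own table, so every lookup pays for
    // a join (and two tables may need to be touched when the data changes).
    const char* kNormalizedLookup =
        "SELECT p.Name, e.Email "
        "FROM Player p LEFT JOIN PlayerEmail e ON e.PlayerID = p.PlayerID "
        "WHERE p.PlayerID = 1234";

    int main()
    {
        printf("%s\n\n%s\n", kDenormalizedLookup, kNormalizedLookup);
        return 0;
    }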

Although there are many such trade-offs when working with databases, this particular issue can be a real bone of contention with database programmers. From what I've seen, there appear to be a large number of database professionals who look at normalization as a philosophical rather than a practical issue. To many, normalization is essential to the “purity” and “elegance” of the overall database design. What's important to take away from this is that the cultures of database and C/C++ programmers are different and can easily clash. It took me a long time to feel confident enough to stand up to a database programmer when he or she insisted that a solution was the “only right way” to solve a problem, but it turned out to be an essential part of making ITNet work well.

Writing Stored Procedures for Fun and Profit
Stored procedures are truly wonderful things, and are essential for making your servers run quickly. As any good database book will tell you, a stored procedure (or SP) is basically just a script that runs on the database. You can issue any of the SQL queries that you might normally run interactively from within a stored procedure. You can also use variables, temporary tables, looping, branching, and other sophisticated constructs to coax complex behavior from a stored procedure.

Stored procedures provide a speed boost in two ways. First, they allow you to embed logic on the database that can keep you from having to send data back and forth over the network every time you need to make a decision. If you want to update a value in a table based on some value in a second table, for example, you can examine the second value from within the stored procedure and issue the correct update query on the spot. The only way to perform this kind of thing without stored procedures is to first request the value from the second table, wait for it to arrive back at the server, examine the value, and finally issue a second query to update the first table. Stored procedures save you all that back and forth.

The second way stored procedures help you is by pre-processing all the SQL queries you plan to use before the SP is invoked. In the normal case, when a SQL query string is sent to a database for execution, the database must first parse the string, check its validity, and hand it off to a component called the query optimizer to figure out the best way to execute it. The optimizer generates something called a query plan which describes the steps the database engine must take to satisfy the query. Obviously, performing all these extra steps can add significantly to a query's execution time. The amount of time it takes to run the optimizer is unnoticeable to a human operator; however, it can add up for a query that gets executed over and over.

I advise you to make liberal use of stored procedures when designing your server. In addition to the performance benefits, stored procedures give you a terrific way to change your server's behavior without having to recompile and deploy new server code. ITNet makes heavy use of stored procedures for both of these reasons.
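
Calling a stored procedure from server code is straightforward over ODBC using the standard {call ...} escape. In the sketch below the procedure name and its two parameters are invented, but the calling pattern is the general one.

    // Invoking a stored procedure with bound parameters over ODBC.
    #include <windows.h>
    #include <sql.h>
    #include <sqlext.h>

    SQLRETURN UpdateCourseRecord(SQLHDBC dbc, SQLINTEGER courseId,
                                 SQLINTEGER driveYards)
    {
        SQLHSTMT stmt = SQL_NULL_HSTMT;
        SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);

        SQLBindParameter(stmt, 1, SQL_PARAM_INPUT, SQL_C_SLONG, SQL_INTEGER,
                         0, 0, &courseId, 0, NULL);
        SQLBindParameter(stmt, 2, SQL_PARAM_INPUT, SQL_C_SLONG, SQL_INTEGER,
                         0, 0, &driveYards, 0, NULL);

        // The procedure's SQL was parsed and optimized ahead of time, so this
        // call skips the per-call parsing and optimizing described above.
        SQLRETURN rc = SQLExecDirect(
            stmt, (SQLCHAR*)"{call usp_UpdateCourseRecord(?, ?)}", SQL_NTS);

        SQLFreeHandle(SQL_HANDLE_STMT, stmt);
        return rc;
    }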

As good as stored procedures are, there are a few things you need to take into account when using them. Perhaps the most important is that there's no easy way to debug a stored procedure. We C/C++ programmers have gotten spoiled on sophisticated debugging tools that let us single step, watch variables, etc. Stored procedures will welcome you back to the bad old days of debugging by “printf” statement. The combination of a powerful scripting language and non-existent debugging tools means you can get yourself into a lot of trouble if you try to make your stored procedures too long or tricky. Your best bet is to stick with stored procedures that are reasonably short and simple. Most of ITNet's stored procedures are less than 500 lines in length. The few times we've risked longer and more complicated SPs we've been disappointed on several fronts. Not only were the SPs hard to write and debug, they seemed to run less satisfactorily against the database than their shorter brethren. If I had to guess, I'd say modern database engines are optimized for running short to medium length stored procedures. Since that's what the tools make easiest to write, my advice is to just go with the flow.

While on the topic of stored procedures, it's worth pausing for a moment to talk about the query optimizer. While this piece of code does an absolutely remarkable job most of the time, be aware that it will sometimes make disastrously bad choices about the best way to execute a query. Under Sybase ASE, there is a command you can issue in an interactive SQL window to output a text version of the query plan. By examining this output, you can get an idea of what decisions the optimizer has made. Not surprisingly, the more complicated the query (particularly when multiple table joins are involved) the less reliable the optimizer. Over time, as you get better and better at designing queries, the process of query optimizing will begin to reduce to figuring out how to trick the query optimizer into not making boneheaded decisions.

There are two techniques that I've come to rely on heavily to work around the optimizer. The first involves modifying the query to contain a hint to the optimizer about what index should be used to resolve a specific query. I'm not sure if the syntax for this is specific to Sybase databases, so I'll refer you to your database manuals for the instructions on how to do this. You should also read the cautionary information about overriding the query optimizer's index selections that you will find there. Unless you know what you're doing, you're as likely to bring a query to its knees as speed it up when using index hints.

The other technique is much simpler and safer: break a query down into several smaller queries that are issued one after another instead of all at once. In theory, issuing a bunch of small queries can involve more work than would be needed to resolve a single big query. In practice, however, spoon feeding the optimizer is often the only thing that lets a query run in a reasonable amount of time at all.
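
As a hypothetical example of the technique, the following replaces a large join against the GUPE table with two simple queries staged through a temp table. The column names and the reporting query are invented, and the SQL is the T-SQL dialect shared by Sybase ASE and SQL Server.

    // Spoon-feeding the optimizer: two small queries instead of one big join.
    #include <windows.h>
    #include <sql.h>
    #include <sqlext.h>

    void RunDailyCourseSummary(SQLHSTMT stmt)
    {
        // Step one: pull the last day's rows out of the big historical table
        // into a temp table, using the index on the insertion timestamp.
        SQLExecDirect(stmt, (SQLCHAR*)
            "SELECT GameID, CourseID, Strokes "
            "INTO   #recent "
            "FROM   GameUnitPlayEvent "
            "WHERE  InsertedAt >= dateadd(day, -1, getdate())", SQL_NTS);

        // Step two: join the small temp table against CourseStats. Each query
        // is simple enough that the optimizer has little room to go wrong.
        SQLExecDirect(stmt, (SQLCHAR*)
            "SELECT c.CourseName, count(*) AS GamesPlayed, "
            "       avg(r.Strokes) AS AvgStrokes "
            "FROM   #recent r, CourseStats c "
            "WHERE  r.CourseID = c.CourseID "
            "GROUP BY c.CourseName", SQL_NTS);

        SQLExecDirect(stmt, (SQLCHAR*)"DROP TABLE #recent", SQL_NTS);
    }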

Batch Processing, Shadow Tables, Caching, and Other Tricks of the Trade
When we initially created the specs for ITNet, we had great plans for how we would improve over the old ITS system. One of the things we were especially keen to add was tournament leader boards that could update the instant a new game arrived that affected the standings. Because we wanted leader board data to be available on the web site, our plan was to keep this data in the database where it could be accessed by any external program that cared about it.

Over the course of the next year as ITNet got busier and busier, we went through three major overhauls of the leader board code. Our first attempt used a heavily optimized stored procedure to sort and serve the standings straight out of our massive game data table, GUPE. When this failed due to load, we created a set of tables that attempted to speed up the process by working on a smaller subset of data. Six months later, we were right back where we started from due to additional load from all the new games that had come online. Our final attempt was a heavily optimized set of tables that kept most of the data needed to calculate the leader boards on the database, but relied on the ITNet servers to cache certain pieces of information on the server for faster processing. Every hour, the server would write the updated data to the leader board tables to keep the database in sync. In the end, the system provided instantaneous updating of leader boards for games connecting to ITNet, but allowed the leader board data in the database to lag as much as an hour behind the most up-to-date version. The system worked, and is still in use to this day.

The lesson we learned while creating our leader board system is an important truth about databases: instant gratification is expensive. Designing a database involves a seemingly endless series of tradeoffs and compromises. What the leader board experience taught us was that you have to trade away a lot in terms of performance and development time if you want your database to give you instantly updating views of complex data.

The good thing that came out of the experience, though, was that it forced us to learn three of the most powerful techniques for improving a database's performance: batch processing, shadow tables, and server-side caching.

Batch processing is a tried and true method for reducing the processing load on a database. The basic idea is that instead of trying to keep some view of the data instantly updated, you run a task at regular intervals that updates the view to the current state. Note that by “view” here I'm not talking about the database object of the same name. Instead, I'm talking about a snapshot of the data, which is usually kept in a special table for quick access. This external table is what, in ITNet parlance, we call a “shadow” table: a non-authoritative copy of the data that is maintained only for performance reasons. As you may recall from the discussion of normalization, duplicate copies of key data are often kept in separate tables to speed up access to the data. Shadow tables are where this duplicate data lives. Batch processing and shadow tables are extremely useful concepts that are used throughout the ITNet system.
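
A batch refresh of a shadow table can be as simple as wiping the snapshot and rebuilding it inside a transaction. The sketch below uses an invented CourseLeaderShadow table and would be run from a timer in the server or a scheduled job.

    // Periodic batch job that rebuilds a shadow (snapshot) table.
    #include <windows.h>
    #include <sql.h>
    #include <sqlext.h>

    void RefreshCourseLeaderShadow(SQLHSTMT stmt)
    {
        // Rebuild the snapshot inside one transaction so readers never see a
        // half-empty table.
        SQLExecDirect(stmt, (SQLCHAR*)"BEGIN TRANSACTION", SQL_NTS);

        SQLExecDirect(stmt, (SQLCHAR*)"DELETE FROM CourseLeaderShadow", SQL_NTS);
        SQLExecDirect(stmt, (SQLCHAR*)
            "INSERT INTO CourseLeaderShadow (CourseID, BestStrokes, LastUpdated) "
            "SELECT CourseID, min(Strokes), getdate() "
            "FROM   GameUnitPlayEvent "
            "GROUP BY CourseID", SQL_NTS);

        SQLExecDirect(stmt, (SQLCHAR*)"COMMIT TRANSACTION", SQL_NTS);
    }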

Server-side data caching is another important concept that can really improve your server's performance. The benefits of such a scheme are immediately obvious: instead of having to wait for a database roundtrip to get some important bit of data, the server can retrieve it instantly from its own RAM. The one thing you need to be careful of, though, is that you don't start using the server-side cache to make changes to the data that aren't represented somewhere in the database. The risk here is that the server could crash and the changed data would be lost forever. With a little care, though, this problem can be easily avoided. If you think about it, a server-side cache is just a server-local version of a shadow table, so as long as you treat this data as non-authoritative and disposable you won't run into any trouble.
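
Here is a minimal sketch of such a cache, assuming an hourly flush like the one described above for the leader boards. The class, table, and column names are invented, and the flush itself would go through the usual ODBC path.

    // Non-authoritative server-side cache with a periodic flush to a shadow table.
    #include <map>
    #include <string>
    #include <ctime>

    struct LeaderEntry
    {
        std::string playerName;
        int score;
        LeaderEntry() : score(0) {}
    };

    class LeaderBoardCache
    {
    public:
        LeaderBoardCache() : lastFlush_(time(NULL)) {}

        // Called on the hot path when a new game-play message arrives.
        void Update(int tournamentId, const LeaderEntry& entry)
        {
            LeaderEntry& best = best_[tournamentId];
            if (entry.score > best.score)
                best = entry;
            if (time(NULL) - lastFlush_ >= 3600)   // flush roughly once an hour
                Flush();
        }

    private:
        void Flush()
        {
            // Write best_ back to the shadow leader board table over ODBC
            // (omitted). The database copy may lag by up to the flush interval,
            // and the cache can always be rebuilt from the database after a crash.
            lastFlush_ = time(NULL);
        }

        std::map<int, LeaderEntry> best_;
        time_t lastFlush_;
    };

    int main()
    {
        LeaderBoardCache cache;
        LeaderEntry e;
        e.playerName = "GOLFER";
        e.score = 42;
        cache.Update(7, e);
        return 0;
    }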

 

 
