[In this Intel-sponsored feature, part of the Visual Computing section, the technical experts behind Mythic and EA's Warhammer Online discuss the mechanics of keeping the MMO running across multiple servers and data centers.]
Late at night in unattended server rooms around the world, noiseless except for the soft whir of cooling fans, the peculiar entities that inhabit Warhammer Online: Age of Reckoning (WAR) are in ceaseless motion. Guided by artificial intelligence, these computer-generated beings continue to roam across the elaborately detailed landscapes and through underground passageways, even at times when few players are active in the game.
Battles erupt when characters randomly encounter each other. Footprints are etched into damp grass, branches snapped, rocks dislodged from walls, and the landscape is altered in innumerable ways, all as part of the never-ending background activity.
Persistence is a key element of the imaginative Warhammer Online world. Its creators at Mythic Entertainment see that persistence as a differentiator amid a slate of competitors in the increasingly popular genre of MMORPGs (massively multiplayer online role-playing games).
The Warhammer Online universe is populated by an extensive selection of characters, each belonging to an individual class that occupies a unique role in the game (such as Engineer or Shaman). Competition can take the form of Player vs. Player (PvP) or Player vs. Environment (PvE) activities.
Support for public quests is a new addition to this venerable franchise, which traces its roots to the classic Games Workshop Warhammer fantasy game series created over 25 years ago. Free to engage in impromptu Warhammer Online public quests, players gain influence based on their degree of participation. Some of the most spirited play takes place in Realm vs. Realm competition, where organized teams of characters carry out quests as a group against opponents.
Who Are You?
Gamers choose from a broad range of alter egos, selecting 3D bodies to inhabit while traveling across the WAR territories. To get started, each new player selects a server in a particular region of the world. Servers exist in five regions: North America, Oceania, Europe, Russia, and (as of May 28, 2009) Taiwan.
The creatures, daemons, humanoids, and monsters likely to be encountered during a WAR quest are extensive and varied, not unlike the range of beings introduced in any good science fiction or fantasy tale.
The Undead stalk the landscape, with bone giants, living armor, and the winged nightmares among their numbers. Monsters such as the flayerkin, dragon ogre, and troll make life difficult for players. Humanoids fit into distinct categories, including beastmen, dwarves, elves, greenskins, humans, ogres, and skaven. Many of these WAR world inhabitants are animated by artificial intelligence and the underlying game logic. Others represent the avatars of players engaged in the competition.
Role-playing adventure games on the computer started out as text-based interactions, such as Zork, a popular game from Infocom in the early 1980s. Now, the scope, intricacy, and immersion are akin to being embedded within an ever-changing motion picture, following a storyline that is shaped by thousands of other participants and requiring massive processing power to control and coordinate astronomical numbers of interactions.
In a recent interview with Online Technical Director Andrew Mann at Mythic Entertainment and Chief Technical Officer Matt Shaw, we talked about the challenge of coordinating hundreds of servers spread across the world, the importance of normalizing the hardware platform on a common set of specifications, and the power and performance demands of processing millions of events across a 45-square-mile world populated with millions of objects and thousands of players.
Pay No Attention to the Man Behind the Curtain
In the midst of the worldwide gaming enterprise, Andrew Mann bears the responsibility of overseeing the server infrastructure that keeps the world alive. He describes his role modestly: "I guess you could say the areas that I oversee are fairly broad. But, the way I like to look at it is from the bottom up. The software engineers work from the code side, and they essentially try to look at things at a high level and work downward."
"I work up from the hardware level. So I work with our system administration team, I work with our IT team, our networking team here, and then I work up into the code level at that point, as well. Along with all of the operating system details, all of our database software, and everything like that. That's basically my purview."
After pausing thoughtfully for a beat, Mann continued, "My day-to-day work is actually extremely varied. Some days it'll be just a lot of discussion with team members. I usually touch all of those divisions on a day-to-day basis. Some days, however, are heavier in the system administration side. Sometimes we've got to do a lot of work on hardware. Sometimes we've got to swap equipment around. Sometimes we have a lot more work on code bugs that are coming up. Maybe we've got a big crash issue or we've got a big feature push. Sometimes it's networking: we've got new equipment coming in and we've got to expand the network, those kinds of things."
At any given time, approximately 2,000 servers are in operation, supporting the gameplay in WAR. Matt Shaw commented, "What we call a server to the user, that main server, is actually a cluster of a number of machines."
"Our server farm in Virginia, for example," Mann said, "has about 60 Dell blade chassis running Warhammer Online, each hosting up to 16 servers. All in all, we have about 700 servers in operation at this location."
"The servers that are in the UK are not managed by us," Mann continued. "They're IBM blades managed by our partner, GOA, and they're organized a little bit differently."
Servers on the Back End of Warhammer Online
Running the vast server farms underlying the Warhammer Online world cost-effectively requires using each server's resources efficiently. The Mythic technical team relies on blade server architecture and application design to maximize server resource usage.
"We use blade architecture heavily for Warhammer Online," Mann noted. "Almost every server that we deploy is a blade system. We don't use virtualization; our software is somewhat virtualized itself. We've always had the technology to run our game world across several pieces of hardware. It's application-layer clustering at a process level. Virtualization wouldn't gain us much because we already run very close to peak CPU usage on these systems."
"We're watching developments in the virtualization area with interest," Shaw said, "as we plan for achieving maximum server efficiency with minimal power use in the future." Currently, power budgeting is handled automatically by the Dell powering options within the blade server cabinets. Mythic doesn't attempt to control any of the power usage at the application level.
The normalized server configuration, in use across all of the Mythic-managed facilities, features dual Quad-Core Intel Xeon processors running at 3 GHz with 8 GB of RAM.
"The Intel CPUs we're using," Mann said, "are all the same model CPUs that we normalized on about a year and a half, two years ago when we first looked at our production hardware specs. We're in pretty tight control of what our hardware specs are when we roll out the game. As you can imagine, with anything this complex, if we roll out to a partner that's managing their own hardware and it ends up being slightly less powerful, or it ends up not having quite as much RAM, or it has some difference on paper that doesn't look like too big of a deal, it can cause pretty catastrophic results to the entire game system. And when you've got 700 systems deployed it can be a big deal to go back and update them all."
Energy-efficient performance is a key consideration when running hundreds of servers, an area where these second-generation quad-core technology machines excel, providing an effective solution for coping with cooling and density challenges. The Intel Xeon processor-based servers also deliver an exceptional degree of stability and proven reliability, important when you have thousands of dedicated gamers who expect the uninterrupted action to continue unabated around the clock.
A Slayer faces off against a Liche Priest and two giant scarabs
A giant bone construct crackles with energy as a Liche Priest channels life back into its body
As might be expected in a game scenario in which the levels of participation vary dramatically at different times of the day, in different regions, and with different types of activity, performance scaling is an essential component of successful server operation.
"One of our ongoing challenges," Mann commented, "is where to distribute people in the world. Our processes, which we distribute across the physical hardware, correspond to locations in the virtual world. One of the focuses of our game, the big focus, is to get a lot of people in one place and have them all fighting with each other. And that, unfortunately, works against us in the process distribution model."
"When you put a lot of people in one place, you're putting their entire server load onto one piece of hardware. We do have some technology to mitigate that. Our scenario system (which spawns up smaller arenas for smaller teams dynamically) allows us to split people off to different pieces of hardware if we need to, dynamically, in smaller chunks."
Using this approach, the application, instead of coping with 800 people in an area on one system, can take 400 of those people out of an area and engage them in smaller fights. Most of the parallelism for these kinds of operations, Shaw noted, is done by process, not by thread.
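The splitting idea can be sketched in a few lines of Python. This is an illustration only, not Mythic's scenario system: the function names, the 400-player zone cap, the 24-player scenario size, and the round-robin host assignment are all assumptions made for the example.

```python
def split_into_scenarios(players, zone_cap, scenario_size):
    """Keep at most zone_cap players in the crowded zone and group the
    overflow into scenario instances of at most scenario_size players.
    Both limits are hypothetical tuning values."""
    stay = players[:zone_cap]
    overflow = players[zone_cap:]
    scenarios = [
        overflow[i:i + scenario_size]
        for i in range(0, len(overflow), scenario_size)
    ]
    return stay, scenarios


def assign_hosts(scenarios, hosts):
    """Spread scenario instances over the available hosts round-robin,
    so no single machine carries the whole overflow."""
    return {index: hosts[index % len(hosts)] for index in range(len(scenarios))}


# 800 players crowd into one area; half of them are moved to smaller fights.
players = [f"player{n}" for n in range(800)]
stay, scenarios = split_into_scenarios(players, zone_cap=400, scenario_size=24)
placement = assign_hosts(scenarios, hosts=["blade-a", "blade-b", "blade-c"])
```

Each scenario instance would run as its own process, which fits Shaw's remark that the parallelism here is by process rather than by thread.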
Taking It to the Extreme
On the client side, the processor-intensive activities likely to be generated by excursions into the Warhammer Online environment can be smoothed out substantially with the help of the platform capabilities of the highest performing desktop processor on the planet.1 Currently creating a buzz across the gaming world, the Intel Core i7 processor Extreme Edition features intelligent multi-core technology that accelerates performance in response to increasing workloads.
New features enhance the overall gaming experience, such as Intel Turbo Boost Technology (to maximize speed for demanding applications), Intel Hyper-Threading Technology (for advanced multi-tasking and support for up to eight threads), and Intel Smart Cache (to provide a higher performing, more efficient cache subsystem). Experience Warhammer Online in its best light with the processor that has become the gold standard in the gaming world, the Intel Core i7 processor Extreme Edition.
1 Performance based on select industry benchmarks, game titles, and multimedia creation applications.
The Challenge of Maintenance
A seamless game experience requires servers that don't crash, servers that aren't offline for long periods for patches and updates, and servers that can be managed easily and even remotely, if need be. Mythic employs a number of techniques to deal with the thorny challenge of maintenance.
"Among the challenges," Shaw said, "we have to distribute on a regular or emergency fix basis new data and executables to our users who run the Warhammer client. We distribute components all over the world, including to our partners, who then stage it out to their users."
"Then," Shaw continued, "we have to take care of the servers; whenever we do server updates, they do have to be managed. There are several different ways to manage this many servers. You can bring them all down for a maintenance operation once a week; we only bring down our servers as needed for major updates. Often we can make dynamic changes while the servers are running."
"We built a system to automatically deploy both major and dynamic updates reliably to servers as needed. We had to create this system for Warhammer. In our earlier projects, like Dark Age of Camelot, there weren't that many servers. Now there are so many, there's no way you could apply an update to every server by hand."
"We could design the systems to be 24/7 and attempt to patch them," Mann said, "but it would introduce a lot more complexity. For instance, a design that allowed us to patch the system in pieces would require us to coordinate shifting load away from each piece before we updated that piece. A single mistake during that process, and the entire system could come crashing down, and it would require a lot of unexpected, time-consuming work to put things back together."
"Instead," Mann said, "our approach is to schedule downtime for updates in advance, which is a little trickier than it seems. Our worldwide deployment means we have to schedule downtime in phases. In North America, we'll bring it down during our low time, which is usually early morning mid-week. And, then at that same time, our Australian players are 12 or 14 hours ahead of us, depending on what time of year it is and where they are on the continent and such."
"If you think about that, if we bring it down about 6:00 or 7:00 a.m. in the U.S. then that's 7:00 or 8:00 p.m. in Australia, prime time for players enjoying an evening of gaming. So, we wait several hours after we update the U.S. servers to pass the Australian prime time, and then we take the Australian servers down and roll the update out to those."
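The scheduling logic Mann describes can be approximated with a small time-zone check. The sketch below is hypothetical: the 6 p.m. to 11 p.m. prime-time window, the fixed UTC offsets, and the function name are assumptions for illustration, and a real scheduler would use proper time-zone rules rather than fixed offsets.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical evening prime-time window in local hours (6 p.m. to 11 p.m.).
PRIME_START, PRIME_END = 18, 23


def next_safe_start(proposed_utc, utc_offset_hours):
    """Return the proposed maintenance start, pushed back to the end of the
    region's prime time if it would otherwise land inside that window."""
    region = timezone(timedelta(hours=utc_offset_hours))
    local = proposed_utc.astimezone(region)
    if PRIME_START <= local.hour < PRIME_END:
        end_of_prime = local.replace(hour=PRIME_END, minute=0, second=0, microsecond=0)
        return end_of_prime.astimezone(timezone.utc)
    return proposed_utc


# 10:00 UTC is 6:00 a.m. at UTC-4 (assumed here for the U.S. East Coast in summer).
proposed = datetime(2009, 6, 3, 10, 0, tzinfo=timezone.utc)
us_start = next_safe_start(proposed, utc_offset_hours=-4)   # unchanged
au_start = next_safe_start(proposed, utc_offset_hours=8)    # delayed past prime time
```

With these assumed offsets, the U.S. servers go down at the proposed time, while the region at UTC+8 is held back five hours until its evening window has passed.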
As the sun sets on the Necropolis of Zhandri, a Bone Giant continues to patrol the ruined kingdom
"One of the great things that's come along in the last five or six years is the remote access capabilities; it's one of the big things we like about the blade architecture," Mann said. "It gives us the ability to connect into the blades and manage the full set of hardware capabilities from anywhere in the world. Being able to power down blades, being able to get a virtual console, mount CD-ROMs and DVD-ROMs from remote, any type of media there: all of this has definitely assisted us with executing our worldwide roll-outs smoothly."
"With the hardware that we rolled out in 1999 with Dark Age of Camelot," Mann reflected, "we did not have the capability to effectively manage systems that were on another continent without having someone local being there. And, to touch on that, our data centers in Australia and Germany don't have a 24-hour staff. We don't even have any Mythic staff on the same continent."
"If we have a situation that requires a physical hardware swap, we'll arrange for the hosting facility to do it for us (such as replacing a failed hard drive), but with the increasing ability to remotely and pre-emptively detect hardware failures, physical maintenance is no longer a part of the time-critical emergency response. We've developed a lot of techniques to take care of most emergency situations from remote locations, 24 hours a day, anywhere in the world."
For server-side operations, networking challenges represent a large part of the performance considerations, and it is also in this area that Mythic finds abundant opportunities for multi-threaded operations. On the server, the implementations tend to be processes with extensive network communication between them. WAR uses TCP connections effectively as queues between the processes, providing a degree of asynchronous separation between the individual processes.
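To make the idea of a TCP connection acting as a queue concrete, here is a minimal Python sketch. It is not WAR's protocol: the length-prefixed framing, the helper names, and the event strings are assumptions chosen for illustration. For brevity the consumer runs in a thread, standing in for what would be a separate server process.

```python
import socket
import struct
import threading


def send_msg(sock, payload):
    """Enqueue one message: a 4-byte big-endian length, then the bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes, since TCP delivers a stream, not messages."""
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data.extend(chunk)
    return bytes(data)


def recv_msg(sock):
    """Dequeue one message written by send_msg."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def run_demo():
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]
    received = []

    def consumer():
        conn, _ = listener.accept()
        with conn:
            for _ in range(3):
                received.append(recv_msg(conn))

    worker = threading.Thread(target=consumer)
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as producer:
        for event in (b"spawn", b"move", b"attack"):
            send_msg(producer, event)  # returns once the data is buffered
    worker.join()
    listener.close()
    return received
```

The producer does not wait for the consumer to handle each event; the kernel's socket buffers hold messages in order until they are read, which is the asynchronous separation the paragraph above describes.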
"As we're moving more heavily towards the use of threads," Mann said, "we're following the same basic model. Instead of making very small processes that use TCP communication to talk to each other, we turn the processes into threads that use queues to communicate. This eliminates the overhead of network communication while keeping our game systems discrete."
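The thread-based variant Mann outlines can be sketched with Python's standard `queue` and `threading` modules. The `combat_system` name, the event strings, and the shutdown sentinel are hypothetical; the point is only that each game system keeps its own inbox and never shares state directly.

```python
import queue
import threading

SHUTDOWN = object()  # sentinel telling a worker thread to stop


def combat_system(inbox, outbox):
    """A discrete game system: consume events from its own queue and
    publish results, without touching any other system's state."""
    while True:
        event = inbox.get()
        if event is SHUTDOWN:
            outbox.put(SHUTDOWN)
            return
        outbox.put(f"resolved:{event}")


def run_pipeline(events):
    to_combat = queue.Queue()
    results = queue.Queue()
    worker = threading.Thread(target=combat_system, args=(to_combat, results))
    worker.start()
    for event in events:
        to_combat.put(event)  # in-memory hand-off, no network round trip
    to_combat.put(SHUTDOWN)
    resolved = []
    while (item := results.get()) is not SHUTDOWN:
        resolved.append(item)
    worker.join()
    return resolved
```

Because the worker still communicates only through queues, it could be moved back behind a TCP connection with little change to the surrounding code.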
The Clarity of Darkness
No matter where you are in the world reading this, somewhere in another quarter of the planet the inhabitants of Warhammer Online are tromping through vegetation, casting spells, charging a hill with a group of team members, poking through underground dungeons, or rallying together to confront an opposing Realm.
When the players rest and night unfurls across the globe, the undead continue to prowl the expanses of the WAR fantasy world, monsters continue their quest for humanoid flesh, and the intricately realized world vibrates from the dance of billions of electrons, quivering in anticipation of the next humanoid to enter the fold. Step cautiously, gamers. Danger lurks in every direction.
When the action gets heated, the servers keep their cool, thanks to the energy-efficient performance of an infrastructure based on Intel Xeon processors, providing the kind of rock-solid stability that effortlessly supports 45 square miles of virtual real estate teeming with artificial life and epic gameplay on a grand scale. WAR exists in a realm that is part imagination, part silicon: a world where fantastical beings roam freely, fantasy plotlines twist and take unexpected turns daily, and the entities never sleep.
The International Reach of Warhammer Online
The Warhammer Online franchise encompasses countries around the world with participants engaging in battles in five distinct regions. There are North American servers, Oceania servers, European servers, Taiwan servers, and Russian servers, each supporting different variations of gameplay, as well as a number of test servers. Localization is an important element when dealing with a variety of regions and cultures. Warhammer Online has been localized to support 12 languages.
On local differences, Andrew Mann said, "There is a distinct character from different regions of the world. It's a little hard to say, from a technical perspective, what regions have each character. And, of course, there are different people within each region. It tends to be that each region has kind of a majority persona. There are all these people that are around throughout the entire spectrum, but, for example, the Russian player base plays during different peak hours. They play longer during the day and there's a lot higher percentage of them online than we saw during our North American launch."
"Now," Mann continued, "whether that's because there are different levels of competition over there, whether it's because we did an especially good job of localizing the game, whether it's because they just like the kind of combat system in our game, or the kind of play style in our game versus other games, it's kind of premature to actually tell. It's difficult to tell that just from the raw data that we see on the technical side."
While it's difficult to cull data on gamer activities without becoming intrusive, Mythic uses anecdotal information and large-scale data correlation to scale server operations to the gameplay patterns for each region and to ensure that servers have sufficient headroom to operate effectively during peak periods of activity. There's both an art and a science to keeping actions in all parts of the world load balanced, and the technical staff at Mythic confronts this challenge with all the tools at their disposal.