This article is being highlighted as one of Gamasutra's top stories of 2013.
The PlayStation 4 is due out this fall, and its technical specifications have been largely under wraps -- till now. While the company gave a presentation at GDC, the system's lead architect, Mark Cerny, hasn't talked publicly in any great depth about the platform since its unveiling this February.
Cerny approached Gamasutra in the hope of delivering a "no holds barred PlayStation 4 hardware expose," he said, during the interview that resulted in this story. "That certainly is what we're here to do," said Cerny, before speaking to Gamasutra for well over an hour.
What follows is a total breakdown of the hardware from a developer's perspective: the chips on the board, and what they're capable of.
Questions on the UI and OS were off the table. What was up for discussion was what the system is capable of, and the thinking that led Cerny and his team to make the decisions they made about the components they chose and how they function together.
To get to the heart of this deeply technical discussion, Gamasutra was assisted by someone with an intimate knowledge of how console hardware really works: Mark DeLoura, THQ's former VP of tech and now senior adviser for digital media at the White House Office of Science and Technology Policy.
"For me, this all started in late 2007," said Cerny, remembering how he embarked on the road to becoming lead architect of the PlayStation 4. "Because we'd been doing postmortems on the PlayStation 3 -- a very broad group of people across the Sony Computer Entertainment team were evaluating how well that had gone."
That led, naturally, to thoughts about what to do next. Musing on the architecture of Sony's next system, Cerny spent his Thanksgiving holiday reading up on the history of the x86 architecture -- realizing that not only had it evolved dramatically over the years, but that by the time the PlayStation 4 shipped, it would be powerful enough for Sony's needs.
It had evolved into something "that looked broadly usable by even the sort of extreme programmers we find in the games business," he said.
Realizing how passionate he was about the PlayStation 4 project, after Thanksgiving, Cerny went to Sony's then-execs Phil Harrison and Masa Chatani, "and asked if I could lead the next generation effort. And to my great surprise, they said yes."
"The Biggest Thing" About the PlayStation 4
Cerny approached the design of the PlayStation 4 with one important mandate above all else: "The biggest thing is we didn't want the hardware to be a puzzle that programmers would be needing to solve in order to make quality titles."
The PlayStation 3 was very powerful, but its unfamiliar CELL processor stymied developers. "There was huge performance there, but in order to unlock that performance, you really needed to study it and learn unique ways of using the hardware," said Cerny.
That situation led directly to the PS4's design philosophy: "The hope with PlayStation 4 was to have a powerful architecture, but also an architecture that would be a very familiar architecture in many ways."
In fact, this is something Cerny returned to again and again during the conversation. "We want to make sure that the hardware is easy to use. And so having the familiar CPU and the familiar GPU definitely makes it easier to use," he said.
Later, when asked about whether Sony considers the fact that many third party developers will also have to create versions of their games for the next Xbox, his response was, "when I say that our goal is not to create puzzles that the developers have to solve, that is how we do well in a multi-platform world."
But ease-of-use is far from Cerny's only goal. As a 31-year veteran of the industry, he well knows that the PC will march onward even as the PlayStation 4 stays frozen in time.
"Ultimately, we are trying to strike a balance between features which you can use day one, and features which will allow the system to evolve over the years, as gaming itself evolves," said Cerny. The "supercharged PC architecture" that the team has come up with -- to use Cerny's term -- is designed to offer significant gains the PC can't, while still offering a familiar technological environment for engineers.
To design the PlayStation 4, Cerny didn't just rely on research, or postmortems of the PlayStation 3. He also toured development teams and spoke to middleware partners to find out precisely what they wanted to see in a next generation console. The result? You'll read about it below.
What Does 'Supercharged' Mean, Anyway?
The PlayStation 4's architecture looks very familiar, at first blush -- and it is. But Cerny maintains that his team's work on it extends it far beyond its basic capabilities.
For example, this is his take on its GPU: "It's ATI Radeon. Getting into specific numbers probably doesn't help clarify the situation much, except we took their most current technology, and performed a large number of modifications to it."
To understand the PS4, you have to take what you know about Cerny's vision for it (easy to use, but powerful in the long term) and marry that to what the company has chosen for its architecture (familiar, but cleverly modified). That's what he means by "supercharged."
"The 'supercharged' part, a lot of that comes from the use of the single unified pool of high-speed memory," said Cerny. The PS4 packs 8GB of GDDR5 RAM that's easily and fully addressable by both the CPU and GPU.
If you look at a PC, said Cerny, "if it had 8 gigabytes of memory on it, the CPU or GPU could only share about 1 percent of that memory on any given frame. That's simply a limit imposed by the speed of the PCIe. So, yes, there is substantial benefit to having a unified architecture on PS4, and it’s a very straightforward benefit that you get even on your first day of coding with the system. The growth in the system in later years will come more from having the enhanced PC GPU. And I guess that conversation gets into everything we did to enhance it."
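Cerny's "about 1 percent" remark can be turned into a rough transfer rate. The following is a back-of-the-envelope sketch, not a measurement: it assumes 1 GB = 10**9 bytes, and the 30 and 60 frames-per-second targets are illustrative choices that do not come from the interview.

```python
# Rough arithmetic behind the "about 1 percent per frame" remark.
# Assumptions (not from the interview): 1 GB = 10**9 bytes, and the
# frame rates used below are illustrative targets.

TOTAL_MEMORY_GB = 8.0    # memory size in Cerny's PC example
SHARED_FRACTION = 0.01   # "about 1 percent" per frame


def shared_per_frame_mb(total_gb: float = TOTAL_MEMORY_GB,
                        fraction: float = SHARED_FRACTION) -> float:
    """Data the CPU and GPU could exchange in one frame, in megabytes."""
    return total_gb * 1000.0 * fraction


def implied_transfer_rate_gb_s(frames_per_second: float) -> float:
    """Sustained transfer rate implied by that per-frame budget."""
    return shared_per_frame_mb() / 1000.0 * frames_per_second


if __name__ == "__main__":
    print(f"per frame: {shared_per_frame_mb():.0f} MB")
    for fps in (30, 60):
        print(f"{fps} fps -> {implied_transfer_rate_gb_s(fps):.1f} GB/s")
```

Under these assumptions the per-frame budget is roughly 80 MB, which is the scale of traffic Cerny attributes to the PCIe link.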
The CPU and GPU are on a "very large single custom chip" created by AMD for Sony. "The eight Jaguar cores, the GPU and a large number of other units are all on the same die," said Cerny. The memory is not on the chip, however. The chip communicates with the shared pool of RAM over a 256-bit bus at 176 GB per second.
"One thing we could have done is drop it down to 128-bit bus, which would drop the bandwidth to 88 gigabytes per second, and then have eDRAM on chip to bring the performance back up again," said Cerny. While that solution initially looked appealing to the team due to its ease of manufacturability, it was abandoned thanks to the complexity it would add for developers. "We did not want to create some kind of puzzle that the development community would have to solve in order to create their games. And so we stayed true to the philosophy of unified memory."
In fact, said Cerny, when he toured development studios asking what they wanted from the PlayStation 4, the "largest piece of feedback that we got is they wanted unified memory."
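The two bus figures quoted above (176 GB per second at 256 bits, 88 GB per second at 128 bits) are consistent with bandwidth scaling linearly with bus width. A minimal sketch of that arithmetic follows. The per-pin data rate is inferred here from the quoted numbers and was not stated by Cerny, and the function names are purely illustrative.

```python
# Bandwidth arithmetic for the quoted bus figures.
# The per-pin rate is derived from the 176 GB/s and 256-bit figures in
# the article; it is an inference, not a number Cerny gave.


def per_pin_rate_gbit_s(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Data rate per bus pin (Gbit/s) implied by a total bandwidth."""
    return bandwidth_gb_s * 8.0 / bus_width_bits


def bandwidth_gb_s(bus_width_bits: int, pin_rate_gbit_s: float) -> float:
    """Total bandwidth (GB/s) for a bus of the given width."""
    return bus_width_bits * pin_rate_gbit_s / 8.0


if __name__ == "__main__":
    rate = per_pin_rate_gbit_s(176.0, 256)
    print(f"implied per-pin rate: {rate} Gbit/s")
    print(f"128-bit bus at the same rate: {bandwidth_gb_s(128, rate)} GB/s")
```

Halving the bus width at the same per-pin rate halves the bandwidth, which matches the 88 GB per second Cerny cites for the rejected 128-bit design.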
"I think you can appreciate how large our commitment to having a developer friendly architecture is in light of the fact that we could have made hardware with as much as a terabyte [Editor's note: 1000 gigabytes] of bandwidth to a small internal RAM, and still did not adopt that strategy," said Cerny. "I think that really shows our thinking the most clearly of anything."
Familiar Architecture, Future-Proofed
So what does Cerny really think the console will gain from this design approach? Longevity.
Cerny is convinced that in the coming years, developers will want to use the GPU for more than pushing graphics -- and believes he has found a flexible and powerful way to let them do so. "The vision is using the GPU for graphics and compute simultaneously," he said. "Our belief is that by the middle of the PlayStation 4 console lifetime, asynchronous compute is a very large and important part of games technology."
Cerny envisions "a dozen programs running simultaneously on that GPU" -- using it to "perform physics computations, to perform collision calculations, to do ray tracing for audio."
But that vision created a major challenge: "Once we have this vision of asynchronous compute in the middle of the console lifecycle, the question then becomes, 'How do we create hardware to support it?'"
One barrier to this in a traditional PC hardware environment, he said, is communication between the CPU, GPU, and RAM. The PS4 architecture is designed to address that problem.
"A typical PC GPU has two buses," said Cerny. "There’s a bus the GPU uses to access VRAM, and there is a second bus that goes over the PCI Express that the GPU uses to access system memory. But whichever bus is used, the internal caches of the GPU become a significant barrier to CPU/GPU communication -- any time the GPU wants to read information the CPU wrote, or the GPU wants to write information so that the CPU can see it, time-consuming flushes of the GPU internal caches are required."
Enabling the Vision: How Sony Modified the Hardware
The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:
- "First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"
- "Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. You can then selectively mark all accesses by compute as 'volatile,' and when it's time for compute to read from system memory, it can invalidate, selectively, the lines it uses in the L2. When it comes time to write back the results, it can write back selectively the lines that it uses. This innovation allows compute to use the GPU L2 cache and perform the required operations without significantly impacting the graphics operations going on at the same time -- in other words, it radically reduces the overhead of running compute and graphics together on the GPU."
- Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."
"The reason so many sources of compute work are needed is that it isn’t just game systems that will be using compute -- middleware will have a need for compute as well. And the middleware requests for work on the GPU will need to be properly blended with game requests, and then finally properly prioritized relative to the graphics on a moment-by-moment basis."
This concept grew out of the software Sony created, called SPURS, to help programmers juggle tasks on the CELL's SPUs -- but on the PS4, it's being accomplished in hardware.
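The "volatile" bit from the second modification is easier to picture with a toy model. The sketch below is a software simulation only and says nothing about how the real hardware is built: the class and method names are hypothetical, the cache is modeled as a dictionary, and a line keeps whatever tag it received when it was first allocated.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CacheLine:
    data: int
    dirty: bool = False      # modified in cache, not yet in memory
    volatile: bool = False   # tagged as belonging to compute


class ToyL2Cache:
    """Hypothetical model of an L2 shared by graphics and compute."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.lines: Dict[int, CacheLine] = {}

    def read(self, addr: int, volatile: bool = False) -> int:
        line = self.lines.get(addr)
        if line is None:  # miss: fetch from memory and tag the new line
            line = CacheLine(self.memory[addr], volatile=volatile)
            self.lines[addr] = line
        return line.data

    def write(self, addr: int, value: int, volatile: bool = False) -> None:
        self.lines[addr] = CacheLine(value, dirty=True, volatile=volatile)

    def invalidate_volatile(self) -> int:
        """Drop clean volatile lines so compute re-reads fresh memory."""
        stale = [a for a, line in self.lines.items()
                 if line.volatile and not line.dirty]
        for addr in stale:
            del self.lines[addr]
        return len(stale)

    def write_back_volatile(self) -> int:
        """Write only dirty volatile lines back to memory."""
        written = 0
        for addr, line in self.lines.items():
            if line.volatile and line.dirty:
                self.memory[addr] = line.data
                line.dirty = False
                written += 1
        return written
```

The point of the model is that both operations skip every line not tagged volatile, so graphics data stays resident in the cache while compute synchronizes with system memory.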
The team, to put it mildly, had to think ahead. "The time frame when we were designing these features was 2009, 2010. And the timeframe in which people will use these features fully is 2015? 2017?" said Cerny.
"Our overall approach was to put in a very large number of controls about how to mix compute and graphics, and let the development community figure out which ones they want to use when they get around to the point where they're doing a lot of asynchronous compute."
Cerny expects developers to run middleware -- such as physics, for example -- on the GPU. Using the system he describes above, you can run at peak efficiency, he said.
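As an illustration of how game and middleware work might share the 64 compute queues, here is a minimal sketch. Cerny mentions "multiple levels of arbitration" but does not describe the policy, so the rule used here (highest priority first, lowest queue index breaking ties) is an assumption made for the example, as are all class and method names.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

NUM_COMPUTE_QUEUES = 64  # figure quoted by Cerny


class ComputeQueues:
    """Hypothetical arbiter over 64 command queues (policy is assumed)."""

    def __init__(self, priorities: Optional[List[int]] = None) -> None:
        if priorities is None:
            priorities = [0] * NUM_COMPUTE_QUEUES
        if len(priorities) != NUM_COMPUTE_QUEUES:
            raise ValueError("one priority per queue is required")
        self.priorities = list(priorities)
        self.queues: List[Deque[str]] = [
            deque() for _ in range(NUM_COMPUTE_QUEUES)
        ]

    def submit(self, queue_index: int, command: str) -> None:
        """Append a command to one of the queues."""
        self.queues[queue_index].append(command)

    def next_command(self) -> Optional[Tuple[int, str]]:
        """Pick the next command: highest priority, then lowest index."""
        candidates = [i for i, q in enumerate(self.queues) if q]
        if not candidates:
            return None
        best = max(candidates, key=lambda i: (self.priorities[i], -i))
        return best, self.queues[best].popleft()
```

A game could, for instance, keep its own physics jobs in one queue while a middleware library submits to another, leaving the arbiter to interleave them.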
"If you look at the portion of the GPU available to compute throughout the frame, it varies dramatically from instant to instant. For example, something like opaque shadow map rendering doesn't even use a pixel shader, it’s entirely done by vertex shaders and the rasterization hardware -- so graphics aren't using most of the 1.8 teraflops of ALU available in the CUs. Times like that during the game frame are an opportunity to say, 'Okay, all that compute you wanted to do, turn it up to 11 now.'"
Sounds great -- but how do you handle doing that? "There are some very simple controls where on the graphics side, from the graphics command buffer, you can crank up or down the compute," Cerny said. "The question becomes, looking at each phase of rendering and the load it places on the various GPU units, what amount and style of compute can be run efficiently during that phase?"
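The question Cerny poses, how much compute fits into each rendering phase, can be framed as simple bookkeeping. In the sketch below only the 1.8 teraflops figure comes from the interview. The per-phase utilization fractions are invented purely for illustration, and the phase names are hypothetical.

```python
from typing import Dict

PEAK_ALU_TFLOPS = 1.8  # figure quoted by Cerny

# Invented numbers for illustration only: fraction of ALU capacity
# that graphics work occupies during each phase of a frame.
HYPOTHETICAL_GRAPHICS_ALU_USE: Dict[str, float] = {
    "shadow_maps": 0.10,
    "gbuffer": 0.60,
    "lighting": 0.90,
}


def compute_headroom_tflops(graphics_alu_fraction: float) -> float:
    """ALU throughput left over for compute during a phase."""
    if not 0.0 <= graphics_alu_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return PEAK_ALU_TFLOPS * (1.0 - graphics_alu_fraction)


def best_phase_for_compute(usage: Dict[str, float]) -> str:
    """Phase in which graphics leaves the most ALU capacity idle."""
    return min(usage, key=lambda phase: usage[phase])


if __name__ == "__main__":
    for phase, used in HYPOTHETICAL_GRAPHICS_ALU_USE.items():
        print(f"{phase}: {compute_headroom_tflops(used):.2f} TFLOPS free")
```

With these made-up numbers the shadow map phase leaves the most headroom, which mirrors the example Cerny gives.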
Launch and Beyond
The benefits of this powerful hardware will be seen in the PlayStation 4's launch games. But Cerny maintains that, in the future, they'll shine through in totally different ways.
"The launch lineup for PlayStation 4 -- though I unfortunately can’t give the title count -- is going to be stronger than any prior PlayStation hardware. And that's a result of that familiarity," Cerny said. But "if your timeframe is 2015, by another way of thinking, you really need to be doing that customization, because your competition will be doing that customization."
So while it takes "weeks, not months" to port a game engine from the PC to the PlayStation 4 according to Cerny, down the road, dedicated console developers can grasp the capabilities of the PlayStation 4, customize their technology, and really reap the benefits.
"There are many, many ways to control how the resources within the GPU are allocated between graphics and compute. Of course, what you can do, and what most launch titles will do, is allocate all of the resources to graphics. And that’s perfectly fine, that's great. It's just that the vision is that by the middle of the console lifecycle, that there's a bit more going on with compute."
Freeing Up Resources: The PS4's Dedicated Units
Another thing the PlayStation 4 team did to increase the flexibility of the console is to put many of its basic functions on dedicated units on the board -- that way, you don't have to allocate resources to handling these things.
"The reason we use dedicated units is it means the overhead as far as games are concerned is very low," said Cerny. "It also establishes a baseline that we can use in our user experience."
"For example, by having the hardware dedicated unit for audio, that means we can support audio chat without the games needing to dedicate any significant resources to them. The same thing for compression and decompression of video." The audio unit also handles decompression of "a very large number" of MP3 streams for in-game audio, Cerny added.
At the New York City unveiling of the system, Cerny talked about PlayGo, the system by which the console will download digital titles even as they're being played.
"The concept is you download just a portion of the overall data and start your play session, and you continue your play session as the rest downloads in the background," he explained to Gamasutra.
However, PlayGo "is two separate linked systems," Cerny said. The other has to do with the Blu-ray drive -- to help with the fact that it is, essentially, a bit slow for next-gen games.
"So, what we do as the game accesses the Blu-ray disc, is we take any data that was accessed and we put it on the hard drive. And then if there is idle time, we go ahead and copy the remaining data to the hard drive. And what that means is after an hour or two, the game is on the hard drive, and you have access, you have dramatically quicker loading... And you have the ability to do some truly high-speed streaming."
To further help the Blu-ray along, the system also has a unit to support zlib decompression -- so developers can confidently compress all of their game data and know the system will decode it on the fly. "As a minimum, our vision is that our games are zlib compressed on media," said Cerny.
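The disc-to-hard-drive copying and the zlib-compressed media described above can be modeled together in a few lines. This is a toy sketch under stated assumptions: the class and function names are hypothetical, files are kept compressed on the simulated hard drive (the interview does not say which form is stored), and decompression here runs in software through Python's standard `zlib` module rather than in a dedicated hardware unit.

```python
import zlib
from typing import Dict


def make_disc(files: Dict[str, bytes]) -> Dict[str, bytes]:
    """Build a simulated disc whose files are zlib-compressed."""
    return {name: zlib.compress(data) for name, data in files.items()}


class DiscBackedAssets:
    """Toy model: copy disc data to a hard drive as the game reads it."""

    def __init__(self, disc: Dict[str, bytes]) -> None:
        self.disc = disc                        # name -> compressed bytes
        self.hard_drive: Dict[str, bytes] = {}  # assumed to stay compressed
        self.game_disc_reads = 0                # disc reads caused by the game

    def load(self, name: str) -> bytes:
        """Return decompressed data, caching the file on first access."""
        if name not in self.hard_drive:
            self.hard_drive[name] = self.disc[name]
            self.game_disc_reads += 1
        return zlib.decompress(self.hard_drive[name])

    def copy_during_idle(self, max_files: int = 1) -> int:
        """Copy up to max_files not-yet-cached files; return how many."""
        copied = 0
        for name in self.disc:
            if copied >= max_files:
                break
            if name not in self.hard_drive:
                self.hard_drive[name] = self.disc[name]
                copied += 1
        return copied

    def fully_copied(self) -> bool:
        """True once every disc file is also on the hard drive."""
        return len(self.hard_drive) == len(self.disc)
```

Once `fully_copied()` returns true, every later `load` is served from the simulated hard drive, which corresponds to the "after an hour or two" state Cerny describes.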
There's also another custom chip to put the system in a low-power mode for background downloads. "To make it a more green hardware, which is very important for us, we have the ability to turn off the main power in the system and just have power to that secondary custom chip, system memory, and I/O -- hard drive, Ethernet. So that allows background downloads to happen in a very low power scenario. We also have the ability to shut off everything except power to the RAMs, which is how we leave your game session suspended."
Sounds Good, But... Bottlenecks?
One thing Cerny was not at all shy about discussing is the system's bottlenecks -- because, in his view, he and his engineers have done a great job of devising ways to work around them.
"With graphics, the first bottleneck you’re likely to run into is memory bandwidth. Given that 10 or more textures per object will be standard in this generation, it’s very easy to run into that bottleneck," he said. "Quite a few phases of rendering become memory bound, and beyond shifting to lower bit-per-texel textures, there’s not a whole lot you can do. Our strategy has been simply to make sure that we were using GDDR5 for the system memory and therefore have a lot of bandwidth."
That's one down. "If you're not bottlenecked by memory, it's very possible -- if you have dense meshes in your objects -- to be bottlenecked on vertices. And you can try to ask your artists to use larger triangles, but as a practical matter, it's difficult to achieve that. It's quite common to be displaying graphics where much of what you see on the screen is triangles that are just a single pixel in size. In which case, yes, vertex bottlenecks can be large."
"There are a broad variety of techniques we've come up with to reduce the vertex bottlenecks, in some cases they are enhancements to the hardware," said Cerny. "The most interesting of those is that you can use compute as a frontend for your graphics."
This technique, he said, is "a mix of hardware, firmware inside of the GPU, and compiler technology. What happens is you take your vertex shader, and you compile it twice, once as a compute shader, once as a vertex shader. The compute shader does a triangle sieve -- it just does the position computations from the original vertex shader and sees if the triangle is backfaced, or the like. And it's generating, on the fly, a reduced set of triangles for the vertex shader to use. This compute shader and the vertex shader are very, very tightly linked inside of the hardware."
It's also not a hard solution to implement, Cerny suggested. "From a graphics programmer perspective, using this technique means setting some compiler flags and using a different mode of the graphics API. So this is the kind of thing where you can try it in an afternoon and see if it happens to bump up your performance."
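The idea of a "triangle sieve" can be sketched on the CPU. The code below is only an illustration of the culling step and is not Sony's implementation: it assumes positions have already been projected to 2D screen space, treats counter-clockwise winding as front-facing (a convention chosen for the example), and uses hypothetical function names.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def signed_area_2x(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of triangle abc; positive if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def triangle_sieve(positions: Sequence[Vec2],
                   indices: Sequence[int]) -> List[int]:
    """Return a reduced index list with back-facing triangles removed.

    Zero-area (degenerate) triangles are dropped as well.
    """
    kept: List[int] = []
    for i in range(0, len(indices) - 2, 3):
        i0, i1, i2 = indices[i], indices[i + 1], indices[i + 2]
        if signed_area_2x(positions[i0], positions[i1], positions[i2]) > 0.0:
            kept.extend((i0, i1, i2))
    return kept


if __name__ == "__main__":
    verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    # Same triangle twice: once front-facing, once with reversed winding.
    print(triangle_sieve(verts, [0, 1, 2, 0, 2, 1]))
```

Only the surviving indices would then be handed to the full vertex shader, which is where the savings Cerny describes would come from.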
These processes are "so tightly linked," said Cerny, that all that's required is "just a ring buffer for indices... it's the Goldilocks size. It's small enough to fit the cache, it's large enough that it won't stall out based on discrepancies between the speed of processing of the compute shaders and the vertex shaders."
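A ring buffer of the kind Cerny mentions can be sketched as a fixed-size queue of indices. This is a generic single-threaded model, not the hardware mechanism: the class name is hypothetical, and a full buffer simply reports failure where real hardware would presumably make the producer wait.

```python
from typing import List, Optional


class IndexRingBuffer:
    """Fixed-capacity FIFO of vertex indices (illustrative model only)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[int] = [0] * capacity
        self._head = 0    # position of the oldest stored index
        self._count = 0   # number of indices currently stored

    def push(self, index: int) -> bool:
        """Producer side (the compute pass). False means the buffer is full."""
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = index
        self._count += 1
        return True

    def pop(self) -> Optional[int]:
        """Consumer side (the vertex pass). None means the buffer is empty."""
        if self._count == 0:
            return None
        value = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return value
```

The capacity trade-off is the one Cerny describes: a small buffer stays cache-resident, while a larger one absorbs more of the speed mismatch between producer and consumer before `push` starts failing.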
He has also promised Gamasutra that the company is working on a version of its performance analysis tool, Razor, optimized for the PlayStation 4, as well as example code to be distributed to developers. Cerny would also like to distribute real-world code: "If somebody has written something interesting and is willing to post the source for it, to make it available to the other PlayStation developers, then that has the highest value."
A Knack for Development
There's another way Cerny is working to understand what developers need from the hardware.
"When I pitched Sony originally on the idea that I would be lead system architect in late 2007, I had the idea that I'd be mostly doing hardware but still keep doing a bit of software at the time," he said. "And then I got busy with the hardware."
That detachment did not last. "I ended up having a conversation with Akira Sato, who was the chairman of Sony Computer Entertainment for many years. And his strong advice was, 'Don't give up the software, because your value is so much higher to the process, whatever it is -- whether it's hardware design, the development environment, or the tool chain -- as long as you're making a game.'"
That's the birth of Knack, Cerny's PlayStation 4 game, which he unveiled during the system reveal in New York City. And it's his link to understanding the practical problems of developing for the PlayStation 4 in an intimate way.
From a Thanksgiving weekend reading technical documents through a difficult and complex engineering process and finally to the development of a big new IP launch for Sony -- you can't say Mark Cerny isn't a dedicated, passionate, and busy man.
"I have not been this busy in 20 years. It's nice. But, definitely, I'm very busy right now," he said, to laughter from everyone in the room.