Elder Scrolls 5: Skyrim Needs Voice Commands

What if Bethesda took the chance to offer the player a new level of immersion? What if players could actually activate the shouts by saying them aloud, using a microphone, rather than equipping and using them through a menu?

Maybe you’ve seen the Elder Scrolls 5: Skyrim trailer by now. Maybe you haven’t. If you haven’t taken the three minutes it takes, please go watch it now. The trailer inspired this post, so it’s best you see it first.

Okay, with that out of the way, I want to talk about game immersion and use that to make an argument for vocal interaction with video games. I want you to see the Skyrim trailer first, because I plan to use that game as a touchstone for this post.

Even if you’ve never played any of the Elder Scrolls games (or the new Fallout games, which are similar), if you’re into gaming at all you should know the basic idea – open-world role-playing, single player, as immersive as possible. Each new installment in the series brings a facelift, better AI, and improved mechanics, for the most part. But have we reached a place where those things alone aren’t enough for a truly new gaming experience?

Say what you want about graphics and AI upgrades, they’ve been improving long enough that we take these upgrades as par for the course. Better graphics help improve the experience, but each new level of graphical prowess isn’t enough anymore to improve immersion on the same level as the first treks into three-dimensional worlds and the first improvements over the 32-bit era of gaming. It’s been a while since gaming took a new step into immersion. This is despite 3D technologies, which are a good step, but these technologies are only graphical upgrades. Motion controls have been the only immersive experience upgrade in the last ten years, but they haven’t really been exploited in serious gaming experiences, not to mention that they are only available on consoles. The PC gamer is still sitting in a chair using a keyboard and mouse.

Now, since you’ve watched the Skyrim trailer by now, you’ll have noticed the shout the character gives right after the narration ends (around 1:10 in the video). Shouts are supposed to be an integral part of the game; from what I understand, the character gains shouts as the player slays dragons. Shouts give the player upgrades and new powers. This isn’t anything groundbreaking from a game mechanics standpoint. It’s just a player upgrade path, so there’s nothing really new there.

But what if Bethesda took the chance to offer the player a new level of immersion if the player wanted it? What if players could actually activate the shouts by saying them aloud, using a microphone, rather than equipping and using them through a menu, as the mechanic is sure to actually function? Wouldn’t that raise the level of immersion to a deeper place?

Now, using vocals in a game isn’t a new concept by any means. Games have used vocal commands in the past. SOCOM on the PlayStation 2 is the best example that comes to mind. The game shipped with a USB headset that the player could use to give commands to the NPC squad. SOCOM wasn’t the first game to use these commands, and it wasn’t the last, but voice control in general is not a common feature.

From my understanding, lack of good technology is the major hurdle. In SOCOM, players used commands to issue orders to their military squad mates. Commands like “fireteam bravo move to crosshair” had to be said into the headset to get teammates to respond. Although this worked for some players, many ran into one of two major problems. For some, the speech recognition was terrible, causing teammates to just respond with “I don’t understand you.” Others found that once in a hectic firefight, voice commands were too unnatural, formulaic, and convoluted to give quickly and efficiently.

Depending on the task, the error rate for speech recognition software can be as low as 1% and as high as 50%, with some military application studies noting a remarkable increase in errors when the voice is stressed. In a game like SOCOM, a military simulator of sorts, the voice would certainly be stressed during firefights. This, combined with the formulaic nature of the commands, made issuing commands difficult. I don’t know about you, but I would get sick of saying “Fireteam Bravo move to crosshair” over and over on a good day, let alone every time I wanted to reposition a squad a few feet to the left.

Although vocal commands seem like a natural fit for a squad shooter like SOCOM, the complexity of the commands certainly introduces errors, which ruins the point of the system. However, a fantasy game like Skyrim has more opportunities for simpler vocal commands, limiting error and increasing player immersion.

Games like Skyrim would most likely use speech recognition for several things, and the magic system is the obvious first choice. Chanting a spell through a microphone could be a powerful immersion tool. Fantasy games in particular are uniquely suited to using this effectively because the commands can be nonsense. There are certainly some syllable and consonant combinations that are more recognizable to speech recognition software than others.

While SOCOM had to use commands like “Fireteam Bravo, move to crosshair,” a fantasy magic system could use nonsense words that couldn’t be mistaken for anything else in the command list. I don’t know much about the technology, but certainly, if the command list was only 30 single-word spells, words could be picked with enough difference between them that the software could have a low margin for error: “Light” “Icelance” “Fireball” “Open” – fun ways to immerse you in a game, limited chance for errors.
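To make the idea of picking words that are different enough a little more concrete, here is a rough sketch of how a designer might screen a spell list for look-alike commands. It is only an illustration under a big assumption: it compares spelling with Python's standard difflib module as a crude stand-in for how similar two words sound, whereas a real recognizer would compare phonemes or audio. The function name confusable_pairs and the 0.7 threshold are made up for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(commands, threshold=0.7):
    """Return (word_a, word_b, similarity) for pairs that look too alike.

    Assumption: spelling similarity is only a rough proxy for how
    alike two words sound to a speech recognizer.
    """
    pairs = []
    for a, b in combinations(commands, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# The spell words from the post, plus a deliberately bad list for contrast.
spells = ["Light", "Icelance", "Fireball", "Open"]
risky = ["Light", "Night", "Fireball"]

print(confusable_pairs(spells))  # no pair crosses the threshold
print(confusable_pairs(risky))   # "Light" and "Night" get flagged
```

A check like this would only be a first pass. The final word list would still need testing against the actual recognizer and real players' voices.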

A magic system is obvious, but there are other places in a role-playing game where speech recognition would add important layers of immersion to a game. Animal mounts, for example. Being able to say “Woah!” and “Hyah!” to a horse to slow or increase speed would allow the game designers to add quirks to the gameplay that made riding different animals actually feel different. Some horses could be more spirited and need more talking to keep at the right speed.

Horses in games currently feel like vehicles. They may as well be motorcycles that only do what the player wishes them to do. The addition of voice control would allow a layer of abstraction between directly controlling the animal with the keyboard and having no control at all. This would make animals feel more like animals and less like extensions of the player. It could be a powerful technique and wouldn’t require speech recognition that was very powerful. Imagine having only your voice to slow your spooked horse as it sped toward a cliff. Yeah. You’d be immersed.

Another obvious aspect that could be improved by vocal commands would be simple additions to a conversation system. Certainly the ability to fully parse sentences is beyond the current technology level, but even the addition of “Yes” and “No” and other simple commands would be a welcome addition. If an NPC asked you to save her family, surely telling her “Yes” vocally would inspire more of a sense of immersion and give rise to more emotion than just clicking a choice in a menu. Beyond that, greetings could be important. Imagine being out in the world and seeing an NPC in the distance you’d like to talk to – you could yell a greeting to get the attention of the NPC, who could approach you as you approached him.

Next, the stealth system could be enhanced. The player could use voice noises to attract guards and peel them away from the group to take them out. Or the game could monitor noise levels when the player was trying to be stealthy. Loud noises could give away a player’s position and attract unwanted attention. The immersion would certainly be increased if players were forced to actually be quiet when trying to be stealthy in the game.
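Monitoring noise levels doesn't need speech recognition at all, only a loudness measurement. As a minimal sketch, assuming the game receives microphone audio as short chunks of samples normalized to the range -1.0 to 1.0, the check could be as simple as a root-mean-square level compared against a threshold. The function names and the 0.2 threshold are hypothetical, and a real game would need to calibrate for each player's microphone and room.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of a chunk of normalized samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gives_away_position(samples, threshold=0.2):
    """True if this chunk of microphone audio is loud enough to alert guards.

    Assumption: the threshold is an arbitrary illustrative value.
    """
    return rms_level(samples) > threshold


# Made-up sample chunks standing in for real microphone input.
whisper = [0.01, -0.02, 0.015, -0.01]
shout = [0.6, -0.7, 0.65, -0.5]

print(gives_away_position(whisper))
print(gives_away_position(shout))
```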

Certainly you can think of other ways simple voice commands could enhance a game. Feel free to leave them in the comments below. In the meantime, let’s move on past ideas for enhancements and talk a little about how the system could be implemented.

The thing about speech recognition software is that the larger and more complex the command list is, the greater the chance there will be like-sounding words, and the more likely errors are to occur. However, games are unique compared to other speech recognition applications – players don’t need access to the entire command list all the time like they would in something like a speech-to-text application. The chance for error can be limited by keeping the command list small and dynamically scaling the command list based on the activity the player is engaged in during the game.

For instance, when a player is not mounted on an animal, the animal commands could be removed from the available list of commands because there is no reason to say “Hyah!” and “Woah!” unless a player is on an animal. The same could be said for conversation commands like “Yes” and “No.” They aren’t needed out in the field when the player isn’t involved in a conversation, so they are removed from the list. Similarly, if the game doesn’t allow mounted combat, the magic commands can be removed from the list of recognizable commands when the player is mounted.

This scaling keeps the list small, matched to the activity the player is engaged in, and limits the possibility for error. This is important, since nothing breaks immersion more than when the technological limits of a system break open and reveal their guts.
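Here is a small sketch of what that scaling could look like in code. Everything in it is hypothetical: the Activity names, the command sets, and the interpret function are just illustrations built from the examples in this post, and it assumes a game without mounted combat. In practice the active list would be handed to the speech recognizer as its vocabulary; here it is simulated by filtering a word the recognizer has already produced.

```python
from enum import Enum, auto


class Activity(Enum):
    ON_FOOT = auto()
    MOUNTED = auto()
    CONVERSATION = auto()


# Hypothetical command lists, one per activity. Assumes no mounted combat,
# so spells are unavailable while riding.
COMMANDS_BY_ACTIVITY = {
    Activity.ON_FOOT: {"Light", "Icelance", "Fireball", "Open"},
    Activity.MOUNTED: {"Hyah!", "Woah!"},
    Activity.CONVERSATION: {"Yes", "No"},
}


def active_commands(activity):
    """Commands the recognizer should listen for right now."""
    return COMMANDS_BY_ACTIVITY[activity]


def interpret(heard, activity):
    """Return the command if it is valid for this activity, else None."""
    return heard if heard in active_commands(activity) else None


print(interpret("Hyah!", Activity.MOUNTED))     # accepted while riding
print(interpret("Hyah!", Activity.ON_FOOT))     # ignored on foot
print(interpret("Fireball", Activity.MOUNTED))  # ignored while riding
```

With a table like this, the recognizer never has to tell “Yes” apart from “Hyah!” because the two are never active at the same time.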

In conclusion, a system like this doesn’t have to have tons of commands or be overly complex to vastly increase immersion for the player, and certainly a system like this would do much more to increase immersion than the standard graphical facelift every game gets. I’m certain that, as awesome as Elder Scrolls 5: Skyrim is going to be, it would be much more immersive with some voice commands. When 11/11/11 rolls around and you’re playing the game, think about that.

I know I will be.


This post is reposted from my website,
