My most recent project, Slashcards, is a game about learning language -- and it's a local co-op action game played on gamepads. I took it as a given that real language learning means free text input. So, a central design challenge for Slashcards was to make a user-friendly and game-friendly gamepad onscreen keyboard.
Part 1: A survey of extant gamepad typing interfaces.
In an effort to avoid re-inventing the wheel -- and falling into old ruts -- I did some research on typing on consoles. Perhaps the most naive approach of all was to make a matrix of buttons where each button corresponded to a key on the keyboard. The user would then navigate through the matrix like any kind of on-screen GUI, going from GUI element to GUI element by pressing the d-pad or left stick in the corresponding direction. This has remained ubiquitous ever since 8-bit consoles. Here's The Legend of Zelda (NES, 1986):
...and here's the Xbox 360, 20 years later (around its launch, ~2006):
The Xbox 360 keyboard originally defaulted to an ABC layout. I guess ABC-order is defensible insofar as we all know the alphabet, so, given a letter, we would intuit the neighboring left and right letters. But what about the letters above and below? It's frustrating that, for example, P is to the left of L. Moreover using the wrap-around topology of the keyboard -- pressing left on the left edge to go to the right edge, and vice versa -- would be a convenience only available to those who carefully studied this unnatural layout.
Eventually -- perhaps from release, I don't recall -- you could change the layout of the Xbox 360 on-screen keyboard to a QWERTY layout, and leverage the familiarity every modern person has from thousands of hours of typing. Now you could hunt-and-peck like you were typing on your laptop as you would while you ate cereal or nudged some stir-frying onions with the other hand. This is the default layout these days for the vast majority of on-screen keyboards.
On the PSP, Sony offered a bizarre multi-step interface, whereby users hunt-and-peck a letter family on the left. Then a number of options, including auto-complete suggestions, are presented on the right. Finally, the user selects one of those options by again navigating across them. The result is a mixed bag. Letters that were far from each other in the above keyboards are closer in terms of button-press count, but previously adjacent letters might be five or six button presses apart in the PSP system. Maybe this compromise plus the autocomplete suggestions turns out to give more words per minute, but I remember my time typing on the PSP as an exercise in frustration where I had to relearn the system every time I encountered it. Either way, the bottom line is that any design that relies on autocomplete suggestions is not going to be appropriate for Slashcards.
Fortunately for fans of the PSP (and its successor, the Vita), the PSP eventually also offered a full keyboard layout (known as the "fullscreen keyboard" on the PSP).
A far better performing option is the approach taken by the Wii and the PS4 on-screen keyboards. They offer a QWERTY keyboard whereby the player can more or less point to a key to select it. (The PS4 requires players to tilt, not exactly point, but the result feels responsive and intuitive.)
Steam Big Picture mode has an interesting hierarchical keyboard. The user presses a direction with the left stick and selects one of the four action buttons to select a character.
The more you use this keyboard, the faster you'll type -- and the skill ceiling is far higher than the hunt-and-peck keyboards above. It also has the virtue of being alphabetical but without the compromise of arbitrary rows that computer-keyboard-lookalike layouts have. At every letter you can easily see if your next letter is 1) in the current button group, 2) counter-clockwise (previous to the current letter) or 3) clockwise (after the current letter).
This is the first system that really tempted me towards implementation in Slashcards. But it has two major disadvantages. One is that the direction-button combination for each key is totally new -- no QWERTY knowledge to leverage. And the second is the sheer size of the required graphic. I struggled mightily to compress it to a size that wouldn't be so demanding of screen real estate. I couldn't come up with a workable solution but you'll see that it has something in common with the Japanese input solution I devised.
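To make the direction-plus-button idea concrete, here is a minimal sketch of a daisy-wheel style lookup. The petal contents, the punctuation padding, the starting direction, and the button order are all assumptions for illustration, not Steam's actual layout; only the "alphabetical, direction then face button" structure comes from the description above.

```python
import string

# Assumption: eight petals of four characters each, filled alphabetically,
# with the last slots padded by punctuation chosen only for illustration.
CHARS = string.ascii_lowercase + ",.?!@_"          # 32 characters
PETALS = [CHARS[i:i + 4] for i in range(0, len(CHARS), 4)]
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # clockwise, assumed start
BUTTONS = ["X", "Y", "B", "A"]                      # assumed order within a petal


def chord_for(char):
    """Return the (stick direction, face button) pair that types `char`."""
    index = CHARS.index(char)
    return DIRECTIONS[index // 4], BUTTONS[index % 4]


def next_move(current, target):
    """Say where the next letter sits relative to the current petal."""
    here = CHARS.index(current) // 4
    there = CHARS.index(target) // 4
    if here == there:
        return "same petal"
    return "clockwise" if there > here else "counter-clockwise"


print(chord_for("a"))        # ('N', 'X')
print(chord_for("g"))        # ('NE', 'B')
print(next_move("g", "c"))   # counter-clockwise
```

Every character costs exactly two inputs here, which is where the higher skill ceiling comes from, but none of the chords map onto anything a QWERTY typist already knows.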
My first thought was to take the extant QWERTY hunt-and-peck keyboard that we see on the Xbox/PS and add another cursor. Players could hunt-and-peck with the left stick and right stick. This seemed obvious enough that I wondered if this had been attempted before -- sure enough, a Google search hit showed me that the approach is explored to some extent in a Microsoft research paper.
Their research finds study participants went from 5.8 words per minute on the single-stick hunt-and-peck QWERTY keyboard to 6.4 words per minute with a dual-stick split hunt-and-peck QWERTY keyboard. A 10% gain is still far from the kind of quantum leap I'm looking for.
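As a quick check of that figure, the relative gain can be computed from the two reported rates. The function and variable names here are just illustrative.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


single_stick_wpm = 5.8   # single-stick hunt-and-peck, as quoted above
dual_stick_wpm = 6.4     # dual-stick split keyboard, as quoted above

print(f"{percent_gain(single_stick_wpm, dual_stick_wpm):.1f}% faster")  # 10.3% faster
```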
Part 2: Iterating towards Slashcards' SQuEDSOS keyboard.
After researching extant gamepad text input methods, I found that none of these options would work to type in even a few letters in a real-time action game situation. (Moreover, I'm not counting on accelerometers or IR sensors for any Wii or PS4 fanciness.) If Slashcards is to be workable as an action game, I need a better solution. I was sure I'd at least need a dual-stick input method, so I began with the design described in the Microsoft paper.
Iteration 0: +Dual-stick, hunt-and-peck, divided keyboard
Dual stick input seemed a natural extension from single hunt-and-peck keyboard input, and I was glad that someone had given it a shot over the past thirty years. But I was disappointed to see that their results showed such a modest increase in input speed.
This design consists of dividing the keyboard in two and putting the cursors where your respective middle fingers would rest on the keyboard -- the left cursor defaulting to "d" and the right to "k". To type "q" would mean pressing left, left, up on the left stick and then a left shoulder button. "m" would be down and then left on the right stick, and then pressing a right shoulder button.
Like the single-cursor, hunt-and-peck keyboard, cursor positions are persistent. So once the left cursor is on “q”, it takes seven inputs to type “b” (left-stick-right, right, down, right, right, down, left-shoulder.)
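The input counts above can be reproduced with a small sketch. It assumes one stick press per cell moved, no wrap-around, and one shoulder press to confirm; the function names are made up for illustration.

```python
# Left half of the split QWERTY layout, one string per row.
LEFT_HALF = ["qwert", "asdfg", "zxcvb"]


def position(key, layout=LEFT_HALF):
    """Return (row, column) of `key` in the layout."""
    for row, keys in enumerate(layout):
        col = keys.find(key)
        if col != -1:
            return row, col
    raise ValueError(f"{key!r} is not on this half of the keyboard")


def inputs_to_type(cursor, target, layout=LEFT_HALF):
    """Stick presses to walk the cursor to `target`, plus one confirm press."""
    r0, c0 = position(cursor, layout)
    r1, c1 = position(target, layout)
    return abs(r1 - r0) + abs(c1 - c0) + 1


print(inputs_to_type("d", "q"))  # 4: left, left, up, left-shoulder
print(inputs_to_type("q", "b"))  # 7: six moves, then left-shoulder
```

Because the cursor is persistent, the cost of a letter depends on whatever was typed before it, so there is no fixed gesture to memorize for any given key.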
Iteration 1: +Elastic cursor
I played with having the cursor positions reset after an input. On the one hand, this behavior is less obvious than leaving the cursor where you left it. But in exchange for a slightly steeper learning curve, this choice provides the user with an opportunity to develop muscle memory for every letter. "w" is then left-stick-up, left-stick-left, left-shoulder every time, regardless of the previous letter. After a few minutes of use, I could feel myself getting faster. But typing whole words still felt like a tedious amount of input. If every letter is two to five inputs, a five-letter word ends up being around sixteen inputs.
When I tried mocking it up myself, I found that my own performance was already a bit better than that of Microsoft Research's test group. But the hunt-and-peck approach with its concomitant repeated directional inputs -- even with the keyboard divided -- won't reliably work in any kind of time-constrained, action game context.
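To put rough numbers on the elastic cursor, here is a sketch that counts inputs per letter when each cursor always starts from its home key. The right-half layout (including the punctuation keys) and the one-press-per-cell cost model are assumptions; the home keys "d" and "k" come from the design described earlier.

```python
# Each half: (rows of keys, home key the cursor snaps back to).
# The right-half punctuation is assumed from a standard QWERTY layout.
HALVES = {
    "left": (["qwert", "asdfg", "zxcvb"], "d"),
    "right": (["yuiop", "hjkl;", "nm,./"], "k"),
}


def find(key, rows):
    """Return (row, column) of `key` in `rows`, or None if absent."""
    for r, row in enumerate(rows):
        c = row.find(key)
        if c != -1:
            return r, c
    return None


def inputs_for_key(key):
    """Stick presses from the home key plus one shoulder press."""
    for rows, home in HALVES.values():
        target = find(key, rows)
        if target is not None:
            home_row, home_col = find(home, rows)
            return abs(target[0] - home_row) + abs(target[1] - home_col) + 1
    raise ValueError(f"{key!r} is not on the keyboard")


def inputs_for_word(word):
    """Total inputs for a word; each letter is independent of the last."""
    return sum(inputs_for_key(ch) for ch in word)


print(inputs_for_key("w"))       # 3: up, left, left-shoulder
print(inputs_for_word("water"))  # 15
```

Each letter now has a fixed cost, which is what makes muscle memory possible, but a short word still lands in the mid-teens for total inputs.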
Iteration 2: +Free selection (non-hunt-and-peck)
Wouldn’t it be nice if I could choose a letter with a single stick-movement? Dispensing with the stateful cursor position, I yoked cursor position directly to the stick position.
Under this system, on the left stick, leaving the stick idle would select “d”, pressing all the way to the left would select “a”, pressing to the upper-right would select “t”. The cardinal directions and their diagonals are easy enough…
...but that leaves the intermediate letters to be related to somewhat tricky intermediate positions. As gamers we've been conditioned to press the gamepad sticks to the edge almost 100% of the time. Even though it looks like S and F have a generous piece of the map, in practice, holding the joystick in those zones is incredibly finicky.
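A naive version of this mapping can be sketched as follows. It assumes the stick reports x and y in [-1, 1] within a circular range, stretches that circle out to a square, and then cuts the square into equal-sized cells; the stretching step and the function name are illustrative assumptions, not the game's actual code.

```python
import math

LEFT_HALF = ["qwert", "asdfg", "zxcvb"]


def naive_key(x, y, layout=LEFT_HALF):
    """Map a stick position (y pointing up) to a key using equal-sized cells."""
    biggest = max(abs(x), abs(y))
    if biggest == 0.0:
        return layout[1][2]  # idle stick rests on the home key, "d"
    # Stretch the circular stick range out to a square so that a full
    # diagonal push reaches the corner keys.
    stretch = math.hypot(x, y) / biggest
    sx, sy = x * stretch, y * stretch
    cols, rows = len(layout[0]), len(layout)
    col = min(cols - 1, max(0, int((sx + 1.0) / 2.0 * cols)))
    row = min(rows - 1, max(0, int((1.0 - sy) / 2.0 * rows)))
    return layout[row][col]


diagonal = math.sqrt(0.5)
print(naive_key(0.0, 0.0))            # d
print(naive_key(-1.0, 0.0))           # a
print(naive_key(diagonal, diagonal))  # t
print(naive_key(-0.4, 0.0))           # s, but only while held partway
```

In this sketch "s" is only selected while the stick sits between roughly 20% and 60% deflection to the left, which is exactly the kind of partial hold that feels finicky in practice.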
Iteration 3: +Optimized free selection
The difficulty here can largely be addressed by optimizing the mapping between stick and virtual keyboard. The guiding principle is to give each option as much perceived space as possible on the stick. Even though S and F above have their fair share of the map, their respective zones feel like incredibly small islands. So if we compare the controller mapping to the perceived mapping, we can see that the original, naive mapping is obviously, needlessly difficult:
And while it’s tempting to contrive a mapping where each option literally has equal area on this mapping, like above, it ignores obvious and massive optimizations the controller gives in context. For one, the neutral position needs zero area -- the stick reliably snaps back to (0.00, 0.00).
The cardinal directions are also essentially a zero-thickness line, wherein each has a coordinate of 1.00 (north, or “e” on the left stick, always has y = 1.00.)
In practice, I found that the other coordinate of the cardinal directions was always very close to 0.00. And indeed, all the keys around the edge of the selection area (QWERT-G-BVCXZ-A, to go around clockwise) had a distance from the center that was reliably greater than 0.9.
Testing and experimentation gave me the final values I used to divide up the mapping, and the end result looks something like this:
The result is a massively improved input rate--provided the user is willing to put up with an initial learning curve of this new interaction. I encountered testers who, having spent countless hours hunting-and-pecking over Xbox Live and whatnot, were initially quite frustrated by this system. But even in their ire, they were nevertheless actually typing faster than they had been.
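One way to express this kind of optimized mapping is to split the stick range by distance from center first, and only then by angle. The sketch below is not the tuned mapping from the game: the dead-zone size and the anchor bearings for each edge key are assumptions, and only the 0.9 edge radius and the clockwise key order (QWERT-G-BVCXZ-A) come from the observations above.

```python
import math

DEAD_ZONE = 0.2     # assumed: anything this close to center counts as neutral
EDGE_RADIUS = 0.9   # edge keys were reliably farther out than this

# Assumed anchor bearings in degrees, measured clockwise from north.
EDGE_KEYS = {
    "q": 315.0, "w": 337.5, "e": 0.0, "r": 22.5, "t": 45.0,
    "g": 90.0,
    "b": 135.0, "v": 157.5, "c": 180.0, "x": 202.5, "z": 225.0,
    "a": 270.0,
}


def angular_gap(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def optimized_key(x, y):
    """Map a left-stick position (y pointing up) to a key."""
    radius = math.hypot(x, y)
    if radius < DEAD_ZONE:
        return "d"                      # neutral stick
    if radius < EDGE_RADIUS:
        return "s" if x < 0 else "f"    # the whole inner ring belongs to S and F
    bearing = math.degrees(math.atan2(x, y)) % 360.0
    return min(EDGE_KEYS, key=lambda k: angular_gap(bearing, EDGE_KEYS[k]))


print(optimized_key(0.0, 0.0))    # d
print(optimized_key(-0.5, 0.1))   # s
print(optimized_key(0.0, 1.0))    # e
print(optimized_key(1.0, 0.0))    # g
```

The design choice is that S and F no longer compete with the edge keys for area: any partial push to the left or right lands on them, and everything at full deflection is resolved by angle alone.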
Iteration 4: +Quadrants
Far and away the most common errors were in the corners, such as NNE and NE. My anecdotal suspicion is that gamers are used to gross input, be it little taps to line up sniper sights or the directional inputs for fighting game power-moves. Therefore I wanted to accommodate an input gesture less precise but more reliably repeatable.
My solution for this was to divide each split-keyboard half into quadrants.
The user would first choose a quadrant by pressing in one of the cardinal directions and then (optionally) turn to another direction to select the key within the quadrant.
Pressing right on the left stick would select the “FTGB” quadrant. From there, moving up would select “T”; moving down, “B”; pressing all the way to the right, “G”; and a stick position not on the right edge of the joystick, “F”.
Therefore typing “B” would reliably be one motion consisting of flicking the left stick right, then down. In practice, that basically feels like rubbing just below the center of the right side of the stick edge, and it’s endlessly, reliably repeatable.
Fighting the quadrants can be frustrating, however. If you accidentally select the right quadrant and want to get down to the bottom, sliding down from the right edge won’t work; the stick needs to be relaxed, or the cursor needs to otherwise return to the center first. Therefore some users, whose grip on the gamepad sticks is lighter and more precise, will prefer iteration 3.
That said, my limited test group performed best on the quadrant keyboard. And here’s the split quadrant elastic dual stick on-screen keyboard (SQuEDSOSK) in context:
(You can see how I've taken steps to keep the on-screen keyboards from occluding too much -- more work to be done, there, to be sure.)
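For readers who want to experiment with the quadrant idea, here is a minimal sketch of the selection logic for the left stick. Only the right-hand "FTGB" quadrant is taken from the description above; the other three quadrant tables, the thresholds, and the class name are assumptions made so the example is complete.

```python
import math

DEAD_ZONE = 0.25   # assumed: below this the stick counts as relaxed
EDGE = 0.9         # assumed: at or beyond this the stick is on the edge
TURN = 0.5         # assumed: sideways deflection that counts as a turn

# inner: off the edge, edge: pushed fully out,
# pos/neg: turned toward up/right or toward down/left.
QUADRANTS = {
    "right": {"inner": "f", "edge": "g", "pos": "t", "neg": "b"},  # from the article
    "left":  {"inner": "s", "edge": "a", "pos": "q", "neg": "z"},  # assumed mirror
    "up":    {"inner": "e", "edge": "e", "pos": "r", "neg": "w"},  # assumed
    "down":  {"inner": "c", "edge": "c", "pos": "v", "neg": "x"},  # assumed
}


class QuadrantSelector:
    """Tracks which quadrant is locked in until the stick is relaxed."""

    def __init__(self):
        self.quadrant = None

    def update(self, x, y):
        """Feed one stick sample (y pointing up); return the highlighted key."""
        if math.hypot(x, y) < DEAD_ZONE:
            self.quadrant = None        # relaxing the stick unlocks the quadrant
            return "d"
        if self.quadrant is None:       # lock onto the dominant direction
            if abs(x) >= abs(y):
                self.quadrant = "right" if x > 0 else "left"
            else:
                self.quadrant = "up" if y > 0 else "down"
        keys = QUADRANTS[self.quadrant]
        horizontal = self.quadrant in ("left", "right")
        along, across = (x, y) if horizontal else (y, x)
        if across > TURN:
            return keys["pos"]
        if across < -TURN:
            return keys["neg"]
        return keys["edge"] if abs(along) >= EDGE else keys["inner"]


stick = QuadrantSelector()
print(stick.update(0.5, 0.0))    # f
print(stick.update(1.0, 0.0))    # g
print(stick.update(0.7, -0.7))   # b
print(stick.update(0.0, -1.0))   # b: still locked in the right quadrant
print(stick.update(0.0, 0.0))    # d: relaxed, quadrant released
print(stick.update(0.0, -1.0))   # c: now the down quadrant
```

The fourth sample shows the frustration described in Iteration 4: sliding along the rim never leaves the locked quadrant, so reaching the bottom keys requires letting the stick return to center first.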
Try it out!
If you'd like to try Slashcards: Learn Japanese, you can give the pre-release preview a spin by downloading a build from this itch.io page: https://bigblueboo.itch.io/slashcards-learn-japanese.
And why stop at English? If you're interested, keep an eye out for a part 3, where I'll talk about the development of the Japanese-specific kana keyboard.