Yoot Saito is someone who makes a point of doing things differently. Whether it's the pinball-cross-tactics gameplay of Odama or the gross-but-lovable voice-activated virtual pet in Seaman, Saito's intent has been to surprise players with new kinds of interactions.
Committing your career to genre-bending games is certainly not the easy route. This summer marks 20 years since the original Japanese release of Seaman on Sega's venerated Dreamcast. The game used a microphone peripheral, through which players interacted with an odd and often grouchy sea-dwelling creature that had the face and voice of a man.
Despite its weirdness, and the marketing challenges that came with it, Seaman was a big seller, moving over one million units across Dreamcast and PlayStation 2 versions.
Two decades after the game's release, Saito answers our questions about Seaman, reminiscing about the successes and difficulties of making such a strange game, and explaining his propensity for leaning in a different direction than other game designers.
What’s your process for coming up with new, strange ideas?
My creative process for new games that I’m passionate about creating is fairly simple in nature. I often use objects, things, and places that most traditional game creators wouldn’t consider using. With Tower, you see one side of a skyscraper from above and observe the people in it almost like you are looking at an ant farm. With Seaman, the base idea was: what if my pet at home could talk? What would they say? In Odama I wanted to see how it would feel to command a force of thousands of soldiers as a general. So basically, I wanted people to feel what it would be like to experience something they normally couldn’t, using games as an art form for conveying that feeling.
Also, I don’t like to paint within the lines of a pre-existing genre like RPGs, treasure-hunting, shooting, etc. I want to make my own thing. That is an incredibly motivating part of making games for me. However, it takes a lot of energy to craft and present something “new” to people. It always takes longer than you expect to make something truly original and there are lots of pitfalls along the way that you never expect. And once you are in the unknown there is no one-size-fits-all solution to those sorts of problems. It requires a lot of time and energy from a producer to solve those issues and the risk is quite large, but that’s what makes it worth doing.
At any point did you feel creatively lost or stuck during the development of Seaman? If so, what did you get stuck on?
When creating Seaman, there is one creative wall I came up against that really sticks out in my memory. Right before finalizing the game we did a test play run with some random people and they all ended up saying things that I never thought they would in that specific situation or part of the conversation. That was right when we were about to begin the Seaman publicity cycle. The Sega marketing rep held an event at an aquarium in Tokyo and asked people to try talking to the fish in the aquarium. That was the experiment.
The people that participated turned towards the microphone but didn’t know what to say, so they did what people constantly do in those situations. They babbled on and on into a very long sentence. I didn’t build Seaman out to understand long conversations so he was unable to understand what was being said. So he would revert to the standard word pathing of “Can you say that again?” or “Huh, did you say something?”, effectively asking a question of the people. After multiple times of Seaman not understanding the conversation and asking the participants to repeat themselves, they started to get upset or just didn’t like it and went home. The things people were saying were so different from what I had designed Seaman to do that the team was totally distraught.
The folks that visited the aquarium didn’t know what to say into the microphone so they started saying things like “why is this freaky-looking fish thing here in the first place” and ultimately created overly long sentences that Seaman couldn’t understand. Since we didn’t anticipate that, Seaman was stuck in a “Can you say that again?” loop.
To make matters worse, the game was going to be released soon thereafter. After thinking about a wide variety of solutions I decided to solve the problem through human behavior and understanding rather than the logical designs of a computer program. So basically Seaman would say things like “You talk too long, I don’t understand” and “If you don’t word it more simply/shortly, I don’t want to talk to you” and complain to the end-user. If they didn’t follow those instructions, Seaman would swim away into the back of the aquarium.
This idea ended up being one of the things people really liked about the game. As soon as the end-users realized they couldn’t use long sentences, they began talking to Seaman as if he were a baby, saying simple and easy-to-understand phrases like "hello" and "sorry." Even better was the fact that they spoke slowly and clearly. This led to a new style of game within Sega and helped us avoid the image that voice-recognition games were bad and just didn’t work. On the other hand, Seaman quickly became known as a creature that was pretty selfish and hard to connect with. (laughs)
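The behavior Saito describes here, complaining about long sentences and eventually swimming away, can be pictured with a small sketch. This is only an illustration, not the game's actual logic: the word-count check, the `MAX_WORDS` and `MAX_WARNINGS` thresholds, and the `Creature` class are all assumptions made for the example, and the real game worked on spoken audio rather than text.

```python
# Hypothetical sketch: names and thresholds are illustrative assumptions,
# not taken from the game.
MAX_WORDS = 4      # assumed limit on how long an utterance may be
MAX_WARNINGS = 2   # assumed number of complaints before the creature leaves


class Creature:
    def __init__(self) -> None:
        self.warnings = 0
        self.present = True

    def hear(self, utterance: str) -> str:
        """React to one utterance; long ones draw a complaint, then a retreat."""
        if not self.present:
            return ""  # the creature has already swum away
        if len(utterance.split()) > MAX_WORDS:
            self.warnings += 1
            if self.warnings > MAX_WARNINGS:
                self.present = False
                return "(swims away to the back of the tank)"
            return "You talk too long, I don't understand."
        # A short, clear utterance resets the creature's patience.
        self.warnings = 0
        return "(responds)"
```

The point of the design is that the limitation is pushed back onto the speaker as a piece of character: the player adapts by speaking in short phrases, and the recognizer only ever has to handle short phrases.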
What was it like working on the Dreamcast in the 90s?
I think you can sum up everything about the Dreamcast when it made its debut in the 90s as an incredibly unique experience. Sega was chasing after their rivals Sony and Nintendo as best they could. However, people at Sega loved doing “fun” and “interesting” things. So instead of playing it safe and being conservative, they went on the attack and did as many creative and crazy things as they could. That’s the sort of mindset that allowed a game as unique and unusual as Seaman to be born. They got behind it and pushed hard for it on the promotional front, allowing it to become quite the hit.
So for me, the Dreamcast is a great game console with many unique features, but more than that, it reminds me of those days when Sega was doing so many creative and wonderful things.
Why did you decide Seaman would be so blunt and, oftentimes, rude?
There are two basic reasons why Seaman is such a rude character. The first is that I was tired of cutesy characters and wanted to do something really different. The second is the reason I previously mentioned: that type of personality helped increase the chances of Seaman understanding what the end-user was saying.
How did it make you feel when, while in development, people would tell you the concept was "gross" or "creepy"?
I wanted Seaman to be weird and gross so I had no problem with people saying so. That was by design. In order to make Seaman feel original and to have truly unique characteristics there are things that Seaman had to do to really stand out:
- NOT cute
- Look into the end-users’ world through the TV
- Focus on the real world instead of some fantasy world
I felt that if I could properly achieve these three points, I’d have something no other game had and could present people with a truly unique experience. That was the goal and what I really wanted to do as a creator. It’s still the inner fire that pushes me to do unique and original things.
Do you think big game companies have become too "safe" since 1999, and do you think a major publisher like Sega would publish a game like Seaman here, 20 years later?
Recently the game industry, especially the consumer game industry, involves projects that cost a huge amount of capital, and it has therefore shifted to a Hollywood movie business model where you have to place safe bets such as sequels to survive. It’s probably the only way to be able to recoup such a huge amount of up-front dev risk. I do hope that the equivalent of what “The Blair Witch Project” did for movies will happen in games.
I know that Sega is a company full of people that like to do interesting things. And there are lots of new people who continue to carry that spirit on. So if another game design like Seaman comes along I’m sure Sega would be the type of company to try their best to make it a success.
How much additional pressure was there on you to make Seaman a hit, as Sega was competing with PlayStation? What was that like?
No, I don’t think that Sega felt pressure from the fact that PlayStation had taken an early lead. Or at least on the creator side of things in the studio, we were merely focused on how to do the most creative things with the new hardware. Of course, I’m sure the sales team and upper management felt very differently and did in fact feel a lot of pressure, but the creative side were just like a bunch of children excited to play with their newest toy.
What was your process for designing conversations in Seaman? What tools did you use, and what were your guiding principles when writing the script?
The entire conversation structure and design was handled by me alone. That’s because most of the conversation is based on my daily observations on life. And actually, it’s my voice in the Japanese version as Seaman. So it was easy for me to be “in-character” when playing as Seaman.
If we had multiple people build out Seaman’s character, conversation, and thoughts, Seaman would have been all over the place. So that basically means Seaman is me. And although that’s a fairly simple character design, I think that simplicity allowed us to make our deadline even when things got difficult. When building out the script, there were some rules. If Seaman was too harsh with his words, it would have insulted or disappointed the end-user. So we made Seaman say things that a sour yet loveable old lady might say. They may be harsh words but you can tell there is a warmth and love behind them, to the point where eventually you appreciate that tonality.
When it was localized by Sega of America, I merely described the high-level concept to them and they ran with it. I have no idea whether those “rules” are conveyed in the English version or not.
With all the voice tech available today, why aren’t voice-centric games more popular (or perhaps the question is, why aren’t more devs making them)?
I think this is a really good question so allow me to take the liberty of going deep down a rabbit hole to answer it. In games involving voice recognition, there are two main parts. One is direction and the other is navigation.
Direction is used as a form of leading a player down the scenario path you have set up and any type of word can be a type of direction device. Navigation on the other hand only occurs when the player inputs a pre-determined command or input such as “left” or “forward”. So Seaman is a game based on navigation. So basically, once the player inputs the necessary command or input that the creator has pre-determined, you’ll “jump” to the next part of the conversation or dialogue.
It’s a pretty simple design. But in order to create the feeling of reality, we needed to create 40,000 different types of Seamen. Seaman isn’t really understanding what each player is saying. It’s just that we have prepared that many different answers for the wide variety of things a player may say, and the system then jumps to the naturally correct continuation of the dialogue, which makes it seem like Seaman understands.
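The "jump" structure described above can be pictured as a table of dialogue nodes, each listing the pre-determined words that move the conversation on. What follows is a minimal sketch under stated assumptions, not the game's implementation: the `Node` type, the `SCRIPT` contents, and the `step` function are hypothetical, the sample lines are invented for illustration, and text stands in for the recognized speech. Only the "Can you say that again?" fallback is taken from the interview.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Node:
    line: str                                             # what the creature says here
    jumps: Dict[str, str] = field(default_factory=dict)   # keyword -> next node id


# Hypothetical three-node script; the real game prepared far more responses.
SCRIPT: Dict[str, Node] = {
    "greet": Node("What do you want?", {"hello": "job_q", "sorry": "forgiven"}),
    "job_q": Node("Do you have a job?", {"yes": "greet", "no": "greet"}),
    "forgiven": Node("Fine. Don't do it again.", {"hello": "job_q"}),
}

FALLBACK = "Can you say that again?"


def step(node_id: str, heard: str) -> Tuple[str, str]:
    """Return (next node id, spoken line) for one recognized utterance."""
    node = SCRIPT[node_id]
    for word in heard.lower().split():
        key = word.strip(".,!?")
        if key in node.jumps:
            next_id = node.jumps[key]
            return next_id, SCRIPT[next_id].line
    # No pre-determined keyword was heard: stay put and ask again.
    return node_id, FALLBACK
```

Nothing in this sketch understands language. The impression of understanding comes entirely from how many nodes and keywords the author has prepared in advance, which is the trade-off Saito describes.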
Actually, games are a little like languages. In card games you have limited cards that you can play in response to another player. However, human communication doesn’t have the same limits as a card game or an RPG. Communication can be pretty expansive. Trying to create a card game with that many possibilities would be extremely challenging for a person…near impossible really. So we have no other choice than to wait for AI to help bridge that impossible gap.
However, even if AI-based communication gets better and better, I have doubts about it being a good base for a game. Basically, games are fairly simple and straightforward. The more complicated they get, the harder it is to create proper "win-lose conditions." That’s the core reason why that kind of a game is challenging. By the way, I am actually creating an AI engine based on Japanese language and words.
What kind of 2019 tech would you use to make a new Seaman-style game today?
In order to create a new Seaman in this exact day and age there is one absolutely essential technology that we need. That is an engine that is capable of organically continuing a conversation without needing a script. You would call this AI of course.
The original Seaman required all of the script and conversation to be pre-recorded as a voice file by a voice actor. And once you’ve heard all the voice, there’s nothing left to do. Basically the game has what you would call an “ending”. However, when you consider what you can already do now with everyone carrying around a mobile phone, you’d expect Seaman to be able to observe and comment on the user’s current activity and constantly comment on that…almost infinitely. That’s why you need an AI engine at this point to be able to achieve that kind of design. It’s also why I went out to gather capital to start creating one such engine and continue to work on it.
So my focus and interest is no longer games but rather a semi-permanent way to continue to speak Japanese or rather an engine that can break apart the incredibly complicated and nuanced language that is Japanese and then re-combine it into meaningful conversational responses. If I can achieve that goal, I’d imagine other complicated languages could be translated into an AI engine as well.
What did you learn from seeing how people interacted with Seaman?
Yeah, I learned very important lessons. You get insight about people through a sample size of over a million and watch as the conversations develop. It’s been a very important lesson for me. One of the biggest discoveries was that as all those people were pointed at the mic and doing their best to find a conversational connection with Seaman, most of them weren’t speaking grammatically correct language. It made me feel that if you rely on textbook rules to create voice recognition AI, you’ll never be able to capture the organic flow of real-world dialogue, and without that you’d never be able to create a proper engine.
And that is where I had another big discovery: melody. Communication isn’t only about words. Melody plays a significant role, and it’s potentially even more important than following exact word order and grammatical structure. And actually that’s the way it is in Japanese already. Word order is very loose. Japanese is like Yoda speaking (laughs). So I realized creating an engine that could recognize melody would be a more efficient way to solve the AI voice recognition problem than trying to adhere to sentence structure. So at my "Seaman AI Research Lab," I’m building out a language recognition engine based on what I call "melody language."
What’s your fondest memory of the development of Seaman?
When the English version of Seaman was released, I was very curious about how it would be received. When Leonard Nimoy introduces himself in the game using his real name, seeing all the end-users smile was really great, because we achieved the same thing in the Japanese version. I wonder why that is? Maybe because having an actor introduce a different character but saying their real name was somewhat fourth-wall-breaking and just something that you’d expect from a drama and not a game. It was exactly like one of the characteristics I wanted – for Seaman to speak to the end-user from inside the TV, looking out into the world. So the atypical, weird, and gross Seaman ended up breaking all the right rules and tropes in both English and Japanese!