For the next part in Gamasutra’s ‘Tooling Around’ feature, which profiles and interviews middleware and tools developers about their products, today’s interview is with Marie-France Caouette, VP of client relations at Di-O-Matic, developer of the 3D lip-syncing tool Voice-O-Matic.
The product is a modeling plug-in designed to automate the lip-synchronization process. While it is only available for 3ds Max at this point, the Montreal-based company has revealed to Gamasutra that Maya- and XSI-compatible versions are in development. Caouette explains that using Voice-O-Matic is simply a case of importing a recorded audio file into the plug-in, and “its powerful algorithms will automatically generate timing and lip position data”. The plug-in analyzes “the phonemes within an audio file and automatically creates standard key frames, allowing animators to quickly tweak the results”.
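The pipeline described here — audio in, phonemes detected, standard keyframes out — can be illustrated with a toy sketch. This is our own simplified illustration, not Di-O-Matic’s code; the phoneme-to-viseme table and the input timings are invented for the example.

```python
# Illustrative sketch of a phoneme-to-keyframe pipeline (not the plug-in's
# actual algorithm). A hypothetical table maps phonemes to visemes (mouth
# shapes); real tools use far richer mappings.
PHONEME_TO_VISEME = {
    "AA": "open", "IY": "wide", "UW": "round",
    "M": "closed", "B": "closed", "F": "teeth_on_lip",
}

def phonemes_to_keyframes(timed_phonemes, fps=30):
    """Convert (phoneme, start_seconds) pairs into (frame, viseme) keyframes."""
    keyframes = []
    for phoneme, start in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "rest")  # unknown -> rest pose
        frame = round(start * fps)                       # seconds -> frame number
        keyframes.append((frame, viseme))
    return keyframes

print(phonemes_to_keyframes([("M", 0.0), ("AA", 0.1), ("M", 0.3)]))
# -> [(0, 'closed'), (3, 'open'), (9, 'closed')]
```

In the real plug-in, each resulting key lands on an animatable channel of the character rig, where animators can then tweak it by hand.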
We spoke to Caouette recently, and asked about Di-O-Matic, the evolution of Voice-O-Matic, and its use in the industry.
When and why was Di-O-Matic formed?
Di-O-Matic, Inc. was founded in October 2000 by Laurent M. Abecassis, and is conveniently located in Montreal, where many CG companies like Softimage, Discreet Logic, Matrox, Kaydara and ToonBoom are based. After spending years in production, Laurent realized that many artists, animators and technical directors needed tools to simplify, accelerate and improve their character animations.
What were the aims and goals of the company at this time?
The goal of the company is still the same today: to develop animation technologies to help artists bring CG characters to life more easily.
How did you realize the need for a product like Voice-O-Matic?
One of the first products we released was Facial Studio, aimed at automating the modeling of 3D heads. It is fully compatible with almost all 3D applications through our exporter and allows one to use photos to rapidly create a fully textured head with morph targets ready for animation.
It is used by many game developers to recreate actors or sports team members from photos. It was a logical development move for Di-O-Matic to get into the automation of lip synchronization after having worked at automating the facial modeling and texturing process.
What was the development time on the product, and what challenges did you run into in preparing the product for industry use?
One of the key development targets we set when we started our initial research on Voice-O-Matic back in 2002 was for it to fit almost any type of character setup, without being limited to a specific language or a specific number of mouth positions. As every character is unique, we wanted Voice-O-Matic to work with virtually any character in any language, with no special requirements for a character to work with the tool.
How has it developed over the time you've been producing it?
We started developing Voice-O-Matic in 2002; we released the first version back in 2003 -- since then we have never stopped improving the tool.
How have you acted on feedback to improve Voice-O-Matic?
Acting on customer feedback is at the core of Di-O-Matic’s development effort. After releasing the first version of Voice-O-Matic in May 2003, we received a great deal of feedback and requests from our clients to improve the lip-sync quality as well as the setup approach for the characters to animate. These changes were successfully completed in V2, among other significant improvements. In order to push the limits of automated lip-sync to accommodate game developers, we also released two additional 3ds Max plug-ins, Pose-O-Matic and vomBatch, both of which extend what Voice-O-Matic offers.
Pose-O-Matic allows one to rig a complex bones-based facial setup through a slider-based interface, so animators can work with either the sliders or the bones directly in 3D space. The key advantage is that Pose-O-Matic lets bones-based facial setups, which are generally preferred by game developers due to game engine requirements, work perfectly with Voice-O-Matic, allowing automatic lip-sync on much more complex rigs than regular morph targets, fully exportable to a game engine.
vomBatch adds batch processing to Voice-O-Matic. Once you have a character set up with Voice-O-Matic, you can batch process thousands of audio files automatically with vomBatch. Several of our game customers have created lip-sync animations for multiple characters this way, in several languages, with minimal user input once the character was properly set up, exporting the results directly to their game engine.
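The workflow Caouette describes — one character setup reused across thousands of audio files — boils down to a simple batch loop. The sketch below is hypothetical: `generate_lipsync` stands in for the plug-in call and is not a real API, and the file naming is our own invention.

```python
# Hypothetical sketch of a vomBatch-style workflow: apply one configured
# character setup to every audio file in a directory.
from pathlib import Path

def generate_lipsync(character_setup, audio_file):
    # Placeholder for the plug-in's analysis step; here it just names the
    # animation clip it would produce.
    return f"{audio_file.stem}_{character_setup}.anim"

def batch_process(character_setup, audio_dir):
    """Run lip-sync generation over every .wav file in a directory."""
    results = []
    for audio in sorted(Path(audio_dir).glob("*.wav")):
        results.append(generate_lipsync(character_setup, audio))
    return results
```

The point of the design is that the expensive, human part (setting up the character) happens once, after which per-file processing needs no further input.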
How does the product work on a technical level?
Voice-O-Matic truly works like having your own lip-sync assistant. It is not restricted in any way, as it simply works with any animatable Bezier tracks. Once your scene is ready to create lip-sync animations using whatever animation technique you wish - any number of predefined morph targets, or a bones-based facial rig optionally rigged with Pose-O-Matic - you are ready to use Voice-O-Matic, since no special setup is required on your characters. The setup takes from five to 25 minutes depending on the number of mouth positions and the languages you wish to use, and the good news is that once it is done you can re-use the setup any time you wish to do more lip-sync on that specific character.
On the technical side of things, in the case of the 3ds Max version of Voice-O-Matic, the core speech recognition algorithm is connected to a user interface developed entirely in MAXScript, allowing experienced 3ds Max scripters to easily modify the behavior of the tool to accommodate specific production needs.
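Since Voice-O-Matic keys any animatable track, the underlying idea can be sketched as keyframes on a channel plus interpolation between them. The sketch below is our own illustration, not the plug-in’s implementation, and it uses linear rather than Bezier interpolation for brevity.

```python
# Minimal sketch of an animatable channel: a sorted list of (frame, value)
# keys evaluated by interpolation. A lip-sync tool writes the keys; the host
# application evaluates the track every frame.

def evaluate_track(keys, frame):
    """Linearly interpolate a sorted list of (frame, value) keys."""
    if frame <= keys[0][0]:
        return keys[0][1]          # clamp before the first key
    if frame >= keys[-1][0]:
        return keys[-1][1]         # clamp after the last key
    for (f0, v0), (f1, v1) in zip(keys, keys[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return v0 + t * (v1 - v0)

# Morph-target weight for a hypothetical "open mouth" channel keyed at
# frames 0, 5 and 10: closed, fully open, closed again.
keys = [(0, 0.0), (5, 1.0), (10, 0.0)]
print(evaluate_track(keys, 2.5))  # halfway toward fully open -> 0.5
```

Because the tool only writes ordinary keys, any rig whose controls expose such tracks — morph-target weights or bone transforms alike — can be driven without special setup, which matches the claim above.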
How important was it for you to integrate multi-language support, and what challenges are there in doing this?
As mentioned above, it was a key development target that we defined when we started our initial research. One of the biggest challenges we have faced in supporting multiple languages is properly explaining English phoneme pronunciation to a non-English speaker. Providing audio cue samples within the Voice-O-Matic interface has allowed many non-English-speaking customers to improve their results, as they could actually hear what a specific phoneme sounds like.
What are some of the more notable examples of the product’s use?
CG characters from the DC Comics franchise Justice League - Batman, Superman and Wonder Woman - have been animated using Voice-O-Matic, as have several games featuring Marvel’s Spider-Man, Venom and Hulk, as well as the critically acclaimed Rockstar hit Bully. Besides video games, Voice-O-Matic has also been used in many CG fields, including full CG feature film and TV series productions.
Who is currently using the product?
Leading production studios around the globe rely on Voice-O-Matic to create lip-sync animations for their games because it truly supports lip-sync in all languages. Amongst our customers in game development, we are happy to count Sega, Snowblind Studios, Left Field, Gas Powered Games, Eurocom, Beenox, Activision, Treyarch, Blur Studio, as well as Rockstar Vancouver and Rockstar North, to name a few.
What do you see as the next evolution of Voice-O-Matic?
We would like to take this opportunity to announce exclusively to Gamasutra readers that we are currently developing Voice-O-Matic for Maya as well as for XSI. We invite you to visit our website for more information.