Sony Computer Entertainment demonstrated libface, its facial recognition library for the PlayStation Eye, at the 2009 CESA Developers Conference (CEDEC) in Yokohama, explaining the basic concepts of image recognition and processing technology and how they are applied in the library.
Photo processing is generally used for retouching or editing photos in software such as Adobe Photoshop. In video games, however, it is typically used to find certain characteristics in a photo.
In the demonstration, they showed a close-up photo of a red flower in an open field. Using convolution filters, they showed how an image can be softened (low-pass), sharpened (high-pass), or edge-detected (Sobel/Laplacian); a pyramid filter was also shown to demonstrate noise reduction.
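The convolution filtering described above can be sketched in a few lines. This is a minimal illustration using NumPy, not libface's actual implementation: the `convolve2d` helper, the box-blur kernel, and the tiny test image are all assumptions for demonstration purposes.

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 2D convolution (valid mode): slide the kernel over the image
    and sum elementwise products at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * flipped)
    return out

# Low-pass (softening): a 3x3 box blur averages each neighborhood.
box_blur = np.full((3, 3), 1.0 / 9.0)

# Edge detection: a Sobel kernel responding to horizontal gradients.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# A tiny test image: flat left half (0), bright right half (9).
img = np.zeros((5, 5))
img[:, 3:] = 9.0

blurred = convolve2d(img, box_blur)  # smooth transition at the boundary
edges = convolve2d(img, sobel_x)     # strong response only at the step
```

Swapping in a Laplacian kernel, or subtracting the blurred image from the original, gives the sharpening (high-pass) variant mentioned in the demo.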
Next, they showed three images and ran a comparison match to see which two photos were identical. When two photos match, the screen turns essentially all black, giving a difference value of 0. All of the above dealt with still images; the same techniques are also used to track movement, demonstrated by a girl standing in front of the PlayStation Eye and moving her hands.
Think of a sequence of photos as frames captured every fraction of a second. A comparison match is run on each pair of consecutive frames, and only the differences appear on screen, while the immobile portions stay black. Pyramid and convolution filters can then be used to reduce noise or prepare the image for edge detection.
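The frame-differencing idea can be illustrated as follows. This is a toy sketch assuming 8-bit grayscale frames; the `motion_mask` function, the threshold value, and the synthetic "hand" frames are all invented for illustration.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=10):
    """Absolute per-pixel difference between consecutive frames;
    pixels below the threshold (static background) come out black (0)."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return np.where(diff >= threshold, 255, 0).astype(np.uint8)

# Two synthetic 8x8 grayscale frames: a bright 2x2 "hand" moves one pixel right.
frame_a = np.zeros((8, 8), dtype=np.uint8)
frame_a[3:5, 2:4] = 200
frame_b = np.zeros((8, 8), dtype=np.uint8)
frame_b[3:5, 3:5] = 200

mask = motion_mask(frame_a, frame_b)
# Static regions stay black; only the pixels that changed light up.
```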
In addition to comparison matching, pattern matching can also be integrated: in each frame, the system performs a local search, trying to find the same pattern within a limited area. This time a circular mascot was added to the application, and the girl used her hands and arms to hold the mascot and carry it around. In the demo, a close-up shot of the mascot was shown on a different part of the monitor to visualize the local search.
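A local pattern search of this kind can be sketched as template matching restricted to a small window around the last known position. The sum-of-squared-differences score, the `local_match` helper, and the synthetic "mascot" pattern are assumptions for illustration, not libface's algorithm.

```python
import numpy as np

def local_match(frame, template, center, radius):
    """Search a small window around `center` for the best template match
    using sum of squared differences (lower = better)."""
    th, tw = template.shape
    cy, cx = center
    best_score, best_pos = float("inf"), None
    for y in range(max(0, cy - radius), min(frame.shape[0] - th, cy + radius) + 1):
        for x in range(max(0, cx - radius), min(frame.shape[1] - tw, cx + radius) + 1):
            patch = frame[y:y+th, x:x+tw].astype(float)
            score = np.sum((patch - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (y, x)
    return best_pos

# Synthetic frame with a distinctive 3x3 "mascot" pattern placed at (5, 6).
mascot = np.arange(9, dtype=float).reshape(3, 3)
frame = np.zeros((12, 12))
frame[5:8, 6:9] = mascot

# Last known position was (4, 5); a local search nearby re-finds it.
pos = local_match(frame, mascot, center=(4, 5), radius=3)  # -> (5, 6)
```

Searching only near the previous position, rather than the whole frame, is what keeps per-frame tracking cheap.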
As for facial recognition, the technology may be familiar to many, as it has become a standard feature in digital cameras. The process of facial recognition can be divided into four steps. The first, which also takes the most processing time, is face detection: a detection box of at least 20x20 pixels sweeps across the entire frame.
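The sweeping detection box can be sketched as a sliding-window loop. The classifier here is a deliberately toy stand-in (a brightness test), since the article does not describe Sony's actual detector; `sweep_detect`, the stride, and `is_bright` are all illustrative assumptions.

```python
import numpy as np

def sweep_detect(image, box=20, stride=4, score_fn=None):
    """Slide a box-by-box detection window across the whole image,
    scoring each position with a classifier; return accepted positions."""
    hits = []
    for y in range(0, image.shape[0] - box + 1, stride):
        for x in range(0, image.shape[1] - box + 1, stride):
            if score_fn(image[y:y+box, x:x+box]):
                hits.append((y, x))
    return hits

# Toy stand-in classifier: a "face" is an unusually bright window.
is_bright = lambda w: w.mean() > 128

img = np.zeros((60, 60))
img[20:40, 20:40] = 255  # one bright 20x20 region
detections = sweep_detect(img, box=20, stride=4, score_fn=is_bright)
```

The nested loop makes clear why this step dominates processing time: the number of windows grows with image area, and a real detector also repeats the sweep at multiple scales.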
Detection handles differences in age and nationality, as well as the orientation and direction of the face in the frame. There is no limit to the number of faces it can detect, although more faces mean more processing time. The second step is to find the face parts on each individual's face, usually four of them: the left and right eyes, the nose, and the mouth.
The third step is alignment, which looks for 50 different points capturing the details and characteristics of the individual's face. Lastly, both attribute recognition (specific age, facial expression, and so on) and face recognition are processed. To speed up face detection, the photo can be reduced in scale, and a local search can be performed instead of an overall search, using tracking, block matching, background subtraction, and other methods.
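The scale-reduction speedup can be illustrated with a simple image pyramid step. The `downscale` helper (block averaging) is an assumption for illustration; the point is only that a half-resolution image has a quarter of the window positions to sweep.

```python
import numpy as np

def downscale(image, factor=2):
    """Halve resolution by averaging factor-by-factor blocks; the detector
    then sweeps far fewer window positions on the smaller image."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor
    return image[:h, :w].reshape(h // factor, factor,
                                 w // factor, factor).mean(axis=(1, 3))

img = np.arange(64, dtype=float).reshape(8, 8)
small = downscale(img)  # 8x8 -> 4x4: about 4x fewer pixels to scan
# A coarse hit on `small` maps back to a small region of `img`,
# which can then be searched locally at full resolution.
```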
With libface, one or more SPUs can be used for face processing. For example, with the parameter set to 47 pixels, detection takes 58 milliseconds on one SPU; with multiple SPUs, the same process can be reduced to 15 milliseconds.
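The multi-SPU speedup comes from splitting the detection sweep into independent bands of window positions. The sketch below uses Python threads purely as a stand-in for SPUs (real gains require genuinely parallel hardware; threads here only show the work partitioning); `detect_rows`, `detect_parallel`, and the brightness classifier are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def detect_rows(image, rows, box=20):
    """Score one band of window positions (stand-in classifier:
    bright windows count as faces)."""
    hits = []
    for y in rows:
        for x in range(0, image.shape[1] - box + 1, 4):
            if image[y:y+box, x:x+box].mean() > 128:
                hits.append((y, x))
    return hits

def detect_parallel(image, workers=4, box=20):
    """Split the sweep's row positions into bands and run them concurrently,
    analogous to spreading detection work across multiple SPUs."""
    all_rows = list(range(0, image.shape[0] - box + 1, 4))
    bands = [all_rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda rows: detect_rows(image, rows, box), bands)
    return sorted(h for band in results for h in band)

img = np.zeros((60, 60))
img[20:40, 20:40] = 255
hits = detect_parallel(img)
```

Because each band is independent, the partitioned sweep returns the same detections as a serial one, which is what makes this kind of work-splitting safe.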
Alternatively, the parameter can be set to 80 pixels to more easily capture faces within the space. Adjusting the parameter also improves detection with respect to the distance between the user and the PS Eye.
Some applications, such as avatar-linked facial recognition and pattern recognition, were briefly shown. With the former, the system tracks the alignment of each user's face, so if the user smiles, the avatar on screen smiles as well.
The latter is used in mini-games such as "Smile Competition", where users smile in front of the camera and whoever scores higher wins the match. Although the application focuses on facial recognition, it could also be used to detect specific objects through higher-level detection algorithms.