[A researcher shares the secrets of psychophysiological research into players' mental states, outlining several possible techniques and both the pitfalls and potential data that can be gained from applying these in a game test environment.]
As game research and testing develop, there has been an increasing interest in different methodologies for assessing games and gameplay. One such area is the use of psychophysiological measures, such as heart rate or electrodermal activity, to assess players' engagement and emotional response.
This article will discuss several of the main measures currently in wide use in psychophysiology, along with their advantages and disadvantages, and will close with a general discussion of the usefulness of psychophysiological measures.
What is Psychophysiology?
Psychophysiology is a method for studying the signals provided by the body in an attempt to gain insight and understanding into what psychological processes are underlying or related to those body signals.
In other words, it is using the human body to answer the question "Whatcha thinkin'?" In particular for game research and testing, it can be useful for assessing emotion and mental workload.
Also, since psychophysiology offers game testers and researchers access to data from players without having to go through subjective channels, such as those provided by questionnaires, it also offers a somewhat unbiased assessment of player reactions.
However, since psychophysiological signals often require quite a bit of interpretation, there is still plenty of room for observation (interpretation) bias on the game tester's or researcher's part.
One big advantage that psychophysiological measures do have is that they offer access to emotions and body signals that players themselves may be unaware of, and what's more, they can be recorded automatically and continuously without stopping or pausing gameplay.
What Does Psychophysiology Measure?
Before getting into the specifics of the most popular psychophysiological measurements, it is important to quickly cover some basic biological ground. First of all, psychophysiology relies on a view of cognition that asserts human cognition as arising from the whole body. This is called "embodied cognition", and unlike alternative views of cognition, it does not propose some kind of separate "mind" where thinking and feeling occur. Embodied cognition, rather, assumes that our cognitions and feelings arise from a wide system of bodily reactions, and are not simply confined to our brain, for example.
In other words, our cognition is affected by, reflected in, and in fact arises out of our whole body. Now, to use psychophysiological methods you do not necessarily have to buy into this view of embodied cognition, but it is necessary to understand that this is the assumption from which psychophysiology hangs.
Given a view of cognition as embodied, it is important that the body has a good way to communicate, manage and maintain itself. This is handled through the nervous system, which can be split into two parts. The first is the central nervous system (CNS), made up of the brain and the spinal cord. This is the executive control system of the body. As such it is well-protected, and quite difficult to access if you want to measure what is going on in there.
The CNS is much easier to access if you are a demigod.
The other part of the nervous system, the peripheral nervous system, comes out from the spine and handles the day-to-day running of the rest of the body. This means it is much easier for us to access and get measurements of, and it is with this system that most widely-used psychophysiological measures interact. The part of the peripheral nervous system of most interest here, the autonomic nervous system, is in turn split into two parts: the parasympathetic system, which handles the general maintenance of the body and relaxation, and the sympathetic system, which is more for handling emergency reactions and excitement.
This means that the peripheral nervous system can be used to measure emotion. In particular, given the typical two-dimension view of emotion, it is particularly useful for measuring Arousal (High, generally sympathetic activity versus Low, generally parasympathetic activity) but is less useful when it comes to Valence (Pleasant emotions versus Unpleasant emotions).
The familiar two-dimension view of emotion.
An Aside: Feelings and Emotions
As an aside, it is worth noting that there is a growing body of evidence in psychology that has led to a view of feelings and emotions as two different things, despite their generally interchangeable use in everyday language.
This view states that emotions are the body states we have that psychophysiology can detect, for example an increased heart rate meaning excitement, whereas feelings are the conscious perception of the emotional states. That is to say, feelings are when you feel an emotion. The upshot of this is that it may be possible to have an emotion, but not feel it -- yet still have it affect your behavior in some way.
Or to put it in terms of game testing, a player's emotion can be measured through the heart rate monitor, but only when you provide them with a questionnaire (or another subjective measure) can you be sure that you are tapping what that player feels. As mentioned earlier, this means that you may be able to get information from psychophysiology that players themselves do not necessarily consciously have access to.
Specific Psychophysiological Measures
Electroencephalography (EEG)
When it comes to trying to access the central nervous system, EEG is one of the easiest measurement tools for game testing and research to turn to. This is because unlike higher resolution PET scans or fMRI measurement, EEG does not require participants to be placed, lying down and still, in large, expensive (and magnetic) machinery.
Rather, EEG operates through the use of electrodes on a player's skull that measure the electrical impulses generated by the brain.
EEG setups range from full electrode caps, which take an hour or more to attach and are capable of measuring specific activations in certain brain regions, to relatively simple headbands which are capable of only general brain wave analysis.
Thankfully, when it comes to game research, this latter, less intrusive and less expensive form of EEG does provide quite workable measures of engagement and emotion by measuring various different frequencies of brain activity (or brain waves).
In terms of these frequencies, the bands of interest are usually the:
- Alpha band (8-14 Hz) that reflects calm mental work
- Beta band (14-30 Hz) that reflects focused, engaged mental work
- Delta band (1-4 Hz) that reflects sleep, relaxation and fatigue
- Theta band (4-8 Hz) that reflects emotions and sensations
So if someone is playing a game and EEG records an increase in Beta wave activity, then you can assume that the player is actively engaged in some kind of mental work.
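As a rough illustration of what this kind of general brain wave analysis involves, the sketch below estimates how much of a signal's power falls into each of the bands listed above. It is only a sketch under stated assumptions: the signal is synthetic, the 128 Hz sampling rate is made up, and the function names (`band_powers`, `dominant_band`) are hypothetical rather than taken from any EEG headset's software. It uses a naive Fourier transform from the Python standard library, which is fine for a short demo but far too slow for real recordings.

```python
import cmath
import math

# Frequency bands in Hz, as listed in this article.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14), "beta": (14, 30)}


def band_powers(samples, sample_rate):
    """Return the summed spectral power per band using a naive DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove any constant offset
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * sample_rate / n  # frequency of this DFT bin in Hz
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
    return powers


def dominant_band(samples, sample_rate):
    """Return the name of the band holding the most power."""
    powers = band_powers(samples, sample_rate)
    return max(powers, key=powers.get)


# Synthetic two-second "recording" (assumed 128 Hz sampling rate):
# a strong 20 Hz (Beta) component plus a weaker 10 Hz (Alpha) component.
SAMPLE_RATE = 128
signal = [math.sin(2 * math.pi * 20 * t / SAMPLE_RATE)
          + 0.3 * math.sin(2 * math.pi * 10 * t / SAMPLE_RATE)
          for t in range(2 * SAMPLE_RATE)]

print(dominant_band(signal, SAMPLE_RATE))  # prints "beta"
```

In practice you would compare each band's power against the same player's baseline recording rather than reading anything into the absolute numbers.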
The Star Wars Force Trainer relies on reading Beta waves via a simple EEG setup.
However, there are several disadvantages to EEG. The first is that it is relatively expensive compared to other measures, especially if you want to go for the full electrode cap, and is quite time-consuming and invasive to set up and use. For example, with a full electrode cap setup, each electrode must be very specifically placed, with the addition of conducting gel, which is usually applied using a needle to ensure good coverage. While the needle is not used to pierce the skin, it can still be somewhat unpleasant, and I have known participants with an aversion to needles to faint during this process.
Furthermore, as with all of these measures, EEG is somewhat prone to producing artifacts if players move too much or speak (speaking activates areas of the brain, of course). Another commonality across all of the measures I will mention is that there are considerable individual differences in psychophysiology, which means that baseline measures must always be taken. This is especially important for EEG, since some individuals do not produce any activity in the Alpha band at all (but are otherwise normal).
Finally, EEG can be difficult to interpret. For example, if you detect increased Delta activity, it could be that your game is relaxing and therefore enjoyable. On the other hand, it may be that it is boring and tiring. Similarly, increased Beta band activity may indicate your game is engaging, or perhaps that the player is disengaged and thinking about a particularly hard day they had at work.
Electromyography (EMG)
EMG is all about detecting the activation of muscles through the use of electrodes, which are attached to the relevant muscle (or muscles). So again, like EEG, (and like most of the measures I am mentioning) this method relies on detecting electric current. However, unlike EEG, EMG is a direct indication of activation in the peripheral nervous system.
EMG can be applied to basically any muscle -- for example, the muscles of the upper back could be examined to test tension or stress. But of particular interest in game research and testing is generally facial EMG. This is where electrodes are attached to specific facial muscles that are said to be related to negative or positive emotional reactions.
Specifically these are muscles in the:
- Brow (Corrugator supercilii) that register negative emotion (unpleasant valence)
- Cheeks (Zygomaticus major) that register positive emotion (pleasant valence)
- Area around the eyes (Orbicularis oculi) that are said to register expressions of enjoyment and "genuine pleasure" (whatever that is)
This makes facial EMG one of the few physiological measures that can actually tap the valence axis of the typical two-axis view of emotions. Furthermore, the sensitivity of facial EMG means that changes in these muscles that could otherwise be missed from direct observation, or facial analysis software, can be detected and used.
Working all three muscles.
However, once again EMG has its disadvantages. First of all, you still have to deal with individual differences, so baselines are required (although this is less of a problem with EMG than it is with other measures). Then there is the intrusive nature of electrodes being on a player's face, combined with wires hanging off them.
This may actually limit natural movement of the face, and since it gives an indication to players that their facial muscles are being recorded, they may themselves produce unnatural responses -- perhaps overemphasizing facial movements in an attempt to assist your data collection (something that may even occur subconsciously).
Finally, and this will start to sound like a broken record (or a corrupt MP3 player), there is always the possibility of artifacts in your data caused by electrical interference or non-target events -- excessive body movement, for example, or your players talking.
Electrodermal Activity (EDA)
Electrodermal Activity (EDA) is also known as Galvanic Skin Response or Skin Conductance and, as the "electro" prefix hopefully gives away, involves measuring changes in electric current on the skin -- specifically, changes caused by the activation of sweat glands.
EDA is typically recorded by attaching electrodes to two fingers (or toes), and is probably most famous for its use in "lie detector" tests. Since it uses only two electrodes, EDA is less expensive and somewhat easier to set up than other physiological methods. However, care should be taken that the digits to which the electrodes are attached are not moved much during data recording -- something that can obviously be a problem if controllers have to be manipulated easily.
In terms of what EDA measures, it is seen as reacting to emotional arousal and mental workload, and gives very distinctive "spikes" in response to emotional stimuli and workload. This means that EDA can be handy for looking at specific events during gameplay -- although it can also be averaged over time and examined.
A single EDA response, showing latency, the response, and the recovery period.
However, as the graph above shows, there can be quite a delay between a game event occurring and the EDA response -- usually between one and five seconds. There is also a recovery period in EDA that must pass before any further response can be registered. This is of course a problem if you have lots of events going on in your game, as some may be missed, and due to the time lag it may not be clear exactly which event an individual EDA response relates to.
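One way to keep this lag in view during analysis is to list, for each EDA response, every logged game event that falls inside the one-to-five-second latency window before it. The sketch below does that and flags responses with more than one candidate as ambiguous. The event log, the response onset times, and the function name `candidate_events` are all hypothetical, and detecting the response onsets in the raw signal is assumed to have been done already.

```python
def candidate_events(response_onset, events, min_latency=1.0, max_latency=5.0):
    """Return names of events that happened 1-5 s before an EDA response onset."""
    return [name for time, name in events
            if min_latency <= response_onset - time <= max_latency]


# Hypothetical game log: (time in seconds, event name).
events = [(10.0, "enemy_spawn"), (12.5, "player_hit"), (30.0, "level_up")]

# Hypothetical onset times of detected EDA responses, in seconds.
response_onsets = [14.0, 33.0, 50.0]

for onset in response_onsets:
    matches = candidate_events(onset, events)
    if not matches:
        label = "no logged event in window"
    elif len(matches) == 1:
        label = matches[0]
    else:
        label = "ambiguous: " + ", ".join(matches)
    print(onset, label)
```

Here the response at 14 seconds could belong to either of two events, which is exactly the situation described above when a game has a lot going on.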
Furthermore, EDA is quite a sensitive and noisy signal, which means that it suffers from specificity problems -- in other words an increase in EDA may be because a player was talking, moving too much, is engaged in your game, or is thinking about how cute you are (or a combination of these factors).
Cardiovascular Measures
Cardiovascular measures are related to your heart, and are mainly about looking at rhythms and how they change. These measures are heart rate, inter beat interval, heart rate variability, and blood pressure.
The first three measurements are again all about measuring an electrical signal, and are taken by attaching electrodes to the chest, although nowadays there are many heart rate monitoring belts and high-tech T-shirts designed for athletes (available on the commercial fitness market) that can easily be used as well.
Blood pressure on the other hand requires a cuff to measure, much like at your doctor's office, and does not involve a measurement of electric current.
To go into a bit more detail, heart rate is the number of heartbeats you have per unit of time (say per minute), and this will typically initially increase with increased workload and emotional arousal.
Inter beat interval is basically the same measure as heart rate, but refers to the time between beats and therefore tends to initially decrease with increased effort and emotional arousal. This is because there are more beats (heart rate is increasing) therefore the time that passes between those beats is shorter (so inter beat interval decreases).
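Because the two measures are reciprocals of one another, converting between them is a one-line calculation. The following sketch assumes inter beat interval is expressed in milliseconds and heart rate in beats per minute; the function names are hypothetical.

```python
MS_PER_MINUTE = 60000.0


def heart_rate_bpm(ibi_ms):
    """Convert an inter beat interval in milliseconds to beats per minute."""
    return MS_PER_MINUTE / ibi_ms


def inter_beat_interval_ms(bpm):
    """Convert a heart rate in beats per minute to an interval in milliseconds."""
    return MS_PER_MINUTE / bpm


print(heart_rate_bpm(1000))  # 60.0 -- one beat per second
print(heart_rate_bpm(750))   # 80.0 -- shorter interval, higher heart rate
```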
With relatively cheap and easy heart rate measurement in every sporting goods store and gym, we have come a long way from this (image from Wikipedia).
Heart rate variability is a little more complicated in that it is not measured directly, but is rather the variability of the inter beat interval. In other words, it is derived from measurements of inter beat interval over time. This again initially decreases with increased effort and emotional arousal, and it is generally seen as a more sensitive measure of changes in workload than inter beat interval and heart rate -- especially if the 0.1 Hz frequency band is examined.
However, it is also quite vulnerable to any artifacts in the underlying data source, and can be relatively complex to calculate. That said, if you are recording inter beat interval then you should also attempt to calculate heart rate variability as it increases the richness of your data. It is therefore important that whatever device you use does provide you with information on inter beat interval, something that some of the mass market exercise belts do not do.
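To give a feel for how heart rate variability is derived from inter beat intervals, the sketch below computes two widely used time-domain summaries: SDNN (the standard deviation of the intervals) and RMSSD (the root mean square of successive differences). These are a simpler stand-in for the frequency-band analysis mentioned above, not a replacement for it, and the interval values are invented for illustration.

```python
import math
import statistics


def sdnn(ibis_ms):
    """Standard deviation of the inter beat intervals (sample SD), in ms."""
    return statistics.stdev(ibis_ms)


def rmssd(ibis_ms):
    """Root mean square of the differences between successive intervals, in ms."""
    diffs = [later - earlier for earlier, later in zip(ibis_ms, ibis_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Invented inter beat intervals in milliseconds.
resting = [800, 850, 780, 860, 790, 840]  # longer, more variable intervals
task = [700, 710, 695, 705, 700, 708]     # shorter, steadier intervals

print("resting:", round(sdnn(resting), 1), round(rmssd(resting), 1))
print("task:   ", round(sdnn(task), 1), round(rmssd(task), 1))
```

With this made-up data both summaries come out lower during the task than at rest, which is the direction described above. A single mis-detected beat would distort both numbers, so artifact checking has to come first.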
One of the great things about cardiovascular measures is that you can often see changes in variables with the naked eye. For example here you can clearly see an increase in heart rate and a decrease in inter beat interval when moving from a resting to a task completion state.
The change in heart rate and inter beat interval can be clearly seen when the task begins.
Finally, blood pressure simply measures the pressure that your blood is under, and therefore how hard your heart is working. This also initially increases with arousal, but it is a less common measurement due to its intrusiveness, in that you need a cuff to restrict blood flow and then release it again.
There are of course some caveats to using cardiovascular measures. The first is that I have said that the changes above are initial reactions. This is because after about 20 minutes or so on a task, the trends I have described above tend to reverse due to the body's natural defense mechanisms attempting to return the body state to normal. This means that after long periods of emotional arousal or workload, heart rate actually starts to decrease and inter beat interval increases. This can cause problems with your data if you are not aware of it.
The other two problems are ones I have mentioned previously with other measures: specificity and individual differences. Some people may have particularly high or low natural heart rates, or irregular rhythms -- again this can be overcome through the use of individual resting measurements which are then used as comparison points for each individual.
As an aside, cardiovascular measures can also sometimes get you in a bit of a stressful situation, where you may detect irregular rhythms or particularly high blood pressure in your participants which may indicate underlying heart problems that they themselves may not be aware of. You have to use your own judgment when dealing with this. However, I would personally lean towards calmly advising any such participant that it is probably nothing (some people just have irregular patterns) but suggesting that they may want to have a checkup at their doctor's office.
Finally, there are many things that can increase or decrease cardiovascular measures -- yes, perhaps your game is really exciting and demanding at this exact moment, but it is also possible that your player just took a deep breath or yawned (which causes your heart rate to increase so as to distribute the oxygen in that breath around the body). Artifacts are also a problem, so again, no talking, and be careful that there isn't interference from excessive movement or other electrical sources.
Respiration
The final measure I want to discuss is respiration. Respiration refers to the measurement of breathing, and is closely linked to the cardiovascular system. However, I want to mention it separately because it is an often-overlooked metric. This is probably due to the fact that it is not as sensitive as some of the others mentioned here. However, it is relatively easy and cheap to measure, requiring only a respiration belt (or one of the high-tech T-shirts used to measure information from the heart will also work).
It should be noted, though, that respiration belts (and the belts used to measure heart rate and inter beat interval) work best when next to your skin on your chest. Therefore, even though respiration belts and heart rate measuring belts are easy to use, they are somewhat intrusive in terms of people often having to show experimenters at least part of their bare upper bodies -- a situation that can particularly be an issue if the player being tested is of the opposite sex to the tester.
However, respiration is also important to measure because breathing has a strong effect on other physiological measures such as EDA and the cardiovascular measures. This means that if these other measures are being used then respiration should also be recorded, if only to control for its effect.
In terms of gaming, the general effect of workload on respiration is to cause an increase in respiration. However, engagement in a game or activity can also cause periods where players hold their breath, as this is a natural preparatory action that is part of the "fight or flight" response set up by the sympathetic nervous system -- certainly an ex-girlfriend of mine used to shout at me to "remember to breathe!" whenever I would play particularly tense games.
Respiration is of course also vulnerable to artifacts (again no talking or excessive movement), although because it is not as sensitive as other measures, and is not measuring an electrical current, this is somewhat less of a problem.
Issues Related to, And Benefits of, Using Psychophysiological Measures
There are of course other measures I could have covered, such as recording blood cortisol, pupil dilation, skin temperature, or eye tracking, but the measures I have covered are probably some of the most common ones used in psychophysiological research. Finally, though, I would like to just wrap up by summarizing the benefits and issues related to the use of psychophysiological measures.
Because I don't want to end on too much of a down note, I will start with the issues related to the use of psychophysiological measures. There are a few of these, and you may have picked them up already but they are worth repeating.
The first is the problem of inference -- this is related to working out exactly what the psychophysiological measurement you have taken actually means. This is problematic because there is a many-to-one (or one-to-many) relationship between most cognitive states and physiological responses.
In other words, an increase in heart rate, for example, can be caused by many different factors and may not be related to gameplay experience at all. The upshot of this is that you, as the researcher, must infer what a psychophysiological measurement means.
This problem can be somewhat overcome by not using psychophysiological measurements in isolation. Rather, they should be used alongside subjective questionnaires and objective game data metrics.
For example, if you find an increase in heart rate, an increase in facial EMG in the cheeks, and an increase in ratings of fun on the Game Evaluation Questionnaire in response to your game mechanic, then you are on much stronger ground than if you had just one of those measures.
However, talk-as-you-play type subjective measures should obviously be avoided as they will produce artifacts in your data. This combination of measures also helps with detection of artifacts, by highlighting conflicts where players may be reporting that they are having fun, but psychophysiological measures may not be reacting or are negative, which could indicate a potential measurement problem.
The next two issues are those of specificity and generality. These factors are related to inference, in that psychophysiological measurements are sensitive to many different things, but also change across individuals, situations, tasks and times. Always taking baseline measurements and comparing within subjects, rather than between, can address the problem with individual differences.
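A minimal sketch of what comparing within subjects can look like is shown below: each player's task-period mean is expressed as a change from that same player's resting baseline, so players are compared on how much they changed rather than on raw values. The player names and heart rate samples are invented, and `change_from_baseline` is a hypothetical helper.

```python
from statistics import mean


def change_from_baseline(baseline_samples, task_samples):
    """Mean task value minus the same player's mean resting value."""
    return mean(task_samples) - mean(baseline_samples)


# Invented heart rate samples in beats per minute.
players = {
    "player_a": {"baseline": [60, 62, 61], "task": [70, 72, 71]},
    "player_b": {"baseline": [85, 84, 86], "task": [90, 91, 89]},
}

changes = {name: change_from_baseline(data["baseline"], data["task"])
           for name, data in players.items()}

for name, change in changes.items():
    print(name, change)
```

In this made-up data, player_b has the higher raw heart rate during the task, yet player_a shows the larger change from rest. Looking only at raw values would have suggested the opposite conclusion about who reacted more strongly.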
However, more seriously, these issues also potentially mean that the results you see in your test environment may be quite different in a home gaming setup or for the next game you make (even if it is a sequel to your last one that uses many of the same mechanics).
Again, this raises the importance of correct inference and the use of psychophysiological measures to complement other subjective and objective measurements. Also, as research in psychophysiology and games advances, it's possible that the connection between certain gameplay elements and certain reactions in physiology may become clearer.
The fourth issue is also somewhat of a benefit. This is that many psychophysiological measures are good at detecting the workload a player is under, and their emotional arousal. But, with the exception of facial EMG, they are not particularly useful for detecting the pleasantness of a player's emotion (valence). This means that you may know that your players are experiencing an intense emotion, but not know if that is a good thing or not.
The final three issues are those of expense, artifacts, and intrusiveness. While the price of much of the technology used to measure psychophysiology is decreasing, it does still typically require specialized equipment and software, and, perhaps more seriously given the time limits that often exist in game development, can also cost a lot of time to both set up and analyze. Furthermore, as mentioned many times above, there is always the potential for artifacts in your data -- which could lead to your data being biased or the masking of useful effects.
Sadly, not this type of artifact.
Finally, wiring someone up is a somewhat intrusive thing to do, and with all of the measures mentioned above extensive movement and talking should be avoided as they produce artifacts. Or, if they cannot be avoided, they need to be noted down and recorded in order to be controlled for. Being asked to hold your fingers still so that EDA can be collected is also intrusive in its own way, although thankfully people do quite quickly adapt to being wired up and can often put it pretty much out of their mind.
Also, as time goes on, the technology for measuring psychophysiology is also advancing and its intrusiveness is decreasing -- for example I have heard talk of next generation heart rate measuring sensors that do not have to be attached to the body and can simply be placed in a chair to collect data from anyone who sits down.
This article may seem a little too negative. This is mainly because I want to make it clear to anyone reading that psychophysiological methods are not a silver bullet, even if they do provide nice objective quantifiable data.
However, to finish up, I would like to again note some of the main benefits of psychophysiological methods. First of all, psychophysiological measures can be fully automated in terms of their recording, and can also be pretty much recorded continuously. This means they can be directly related to gameplay events as they happen. This is quite a big advantage over many subjective measures that require you to either stop and start a play experience, add additional load to a gameplay experience, or wait until the end before you can collect data.
Secondly, the other big benefit of psychophysiological measures is that they detect emotions or reactions in players that they themselves may not be aware are present (or detect them before they enter awareness). This can be a great help, especially if players are having problems expressing exactly why they dislike or like a particular feature.
Although it must also be asked: if players are unaware of these emotions, then can they influence their behavior and their enjoyment of the game? The science around this is currently unclear, although there is a growing body of psychological research that does suggest that unconscious body states (emotions) can have a meaningful impact on human behavior.