Affective Benchmarking of Movies Based on the Physiological Responses of a Real Audience

Abstract—We propose an objective study of the emotional impact of a movie on an audience. An affective benchmarking solution is introduced that relies on a low-intrusive measurement of the well-known ElectroDermal Activity (EDA). A dedicated processing of this biosignal produces a time-variant and normalized affective signal related to the significant excitation variations of the audience. Besides the new methodology, the originality of this paper stems from the fact that the framework has been tested on a real audience during regular cinema shows and a film festival, for five different movies and a total of 128 audience members.

I. INTRODUCTION

Knowing the affective state of an audience watching a movie has many potential applications in video content creation and distribution. A movie director may be interested in such information to validate artistic choices and/or adapt the editing to optimize the emotional flow. It may also help a content distributor select the best target countries or populations. Nevertheless, obtaining the real-time emotional state of an audience in an objective and low-intrusive manner is not easy. One direct method, already used by some studios, would be to collect a self-assessment from each viewer, minute by minute. This approach is clearly too intrusive and quite subjective, and it may bias the reporting by distracting the subject from the movie. Face analysis through an adapted camera could be an alternative, but the associated algorithms [1] may be very sensitive to the user environment (low lighting conditions) and are not always suited to natural “close-to-neutral” facial expressions.

Another approach, adopted in this paper, is based on the recording of physiological signals. Many biological signals are known to vary with the emotional state of a subject. Typical examples are the ElectroDermal Activity (EDA), the Heart Rate Variability (HRV) and the facial surface ElectroMyoGram (EMG). The correlation between such signals and the affective state is well established in the literature [2], [3], and the devices used to record such biodata are becoming more and more compact and non-obtrusive [4], [5]. Furthermore, although evaluating the interest of such signals in the context of movie viewing is not new [6], a growing number of recent works try to correlate the user’s emotion with the video content more accurately. For example, in [7], physiological signals (including EDA) are correlated with audio features extracted from the video stimulus. In [8], the authors developed a user-independent emotion recognition method to recover affective tags for videos using electroencephalogram (EEG), pupillary response and gaze distance. In [9], the EDA signal is correlated with the arousal axis in a context of affective film content analysis, with a special focus on narrative coherency. In [10], a real-time approach to emotional state detection is presented in which the authors focus on affective events (i.e., fast increases of the emotional arousal) during video viewing and try to guess the associated valence using classification tools.

However, as far as we know, such works do not really answer the three following questions in the context of movie viewing: i) When did the audience significantly react? ii) How much did the audience react to each movie highlight? and iii) What is the emotional impact of one movie compared to another? In this paper we propose an original affective benchmarking solution for movies that may answer these three questions. It is based on a dedicated processing of the EDA that produces a time-variant and normalized affective signal (termed “Affective Profile”) reflecting the significant arousal variations of the whole audience over time in a quantitative and comparable way. In addition, and for the first time, this framework is evaluated on a real audience during regular cinema shows and a film festival, for five different movies and a total of 128 audience members. The methodology and the obtained results are presented in the following sections.

II. AFFECTIVE BENCHMARKING SOLUTION

A. EDA as an arousal sensor

Obtaining the mean reaction of a whole audience over time is closely related to studying the arousal fluctuations of this audience during the show (cf. the bi-dimensional “Valence / Arousal” representation of emotion [11]). The ElectroDermal Activity (EDA) measures the local variations of the skin electrical conductance in the palm area, which is known to be highly correlated with the user’s affective arousal ([2], [12]). This signal embeds high-level human-centered information and may be used to provide a continuous and unconscious feedback about the level of excitation of the end-user. More specifically, each shift in the arousal flow of a given user is correlated with slow (∼ 2 s) phasic changes in his/her EDA, consisting of a fast increase up to a maximum value (the higher the emotional impact, the higher the peak), followed by a slow return to the initial state.

B. Individual Affective Profile

For a given audience member, a time-variant and normalized affective signal, termed “Individual Affective Profile” (IAP), may thus be obtained directly from the EDA as described in Figure 1. After a first low-pass filtering to remove possible artifacts, the EDA is differentiated and truncated to positive values in order to highlight the relevant phasic changes. The signal is then temporally filtered and sub-sampled using overlapping time windows to obtain a time resolution of 30 s, which is sufficient to analyze the emotional flow and to account for possible delays in the user’s reaction during the movie. To remove the user-dependent part related to the amplitude of the EDA derivative (which may vary from one subject to another), the obtained signal, termed $p$, is normalized (area under the curve equal to one) so that it may be interpreted as a probability of reaction over time for the current audience member. The resulting signal, termed $p_n$, is called the “Individual Affective Profile”, whereas the normalization factor, termed $x_n$, is called the “Individual Affective Intensity”.

[Figure 1 flowchart: Raw EDA signal → Low-pass filtering → S(t) → Numerical derivation → S’(t) → Thresholding → S’+(t) → Compute the mean value p[i] of S’+ on every time window Wi → p[i] → Normalize p by its individual affective intensity xn → Individual affective profile pn]

Fig. 1. Description of the different steps of the algorithm to compute an Individual Affective Profile from the EDA.

C. Mean Affective Profile

However, taken alone, $p_n$ and $x_n$ only reflect the reaction of one given user. Such quantities are thus strongly tied to the user’s sensitivity and may also be corrupted by user-specific noise (motion artifacts or the user’s external reactions, for instance). To obtain more relevant information about the arousal variations of a whole audience during a movie, a “Mean Affective Profile” (MAP) $\bar{p}_n$ is computed by averaging the individual affective profiles $p_n^i$, $1 \le i \le N$, of every audience member (where $N$ is the size of the audience) and by scaling this quantity according to:

$$\bar{p}_n = \frac{\bar{x}_n}{N} \sum_{i=1}^{N} p_n^i \quad \text{where} \quad \bar{x}_n = \frac{1}{N} \sum_{i=1}^{N} x_n^i \qquad (1)$$
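The processing chain of Figure 1 and the scaling of Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors’ implementation. Several choices are assumptions made only for illustration: the moving-average low-pass filter, the 60 s windows with a 30 s hop (the text only states overlapping windows and a 30 s resolution), the discrete reading of “area under the curve equal to one” as a unit sum, and all function and parameter names.

```python
from statistics import fmean


def moving_average(signal, width):
    """Centered moving average used as a simple low-pass filter (assumed filter)."""
    half = width // 2
    return [
        fmean(signal[max(0, k - half):k + half + 1])
        for k in range(len(signal))
    ]


def individual_affective_profile(eda, fs=32, win_s=60, hop_s=30, smooth_s=1.0):
    """Return (p_n, x_n) for one viewer from a raw EDA trace sampled at fs Hz.

    win_s, hop_s and smooth_s are illustrative values, not taken from the paper.
    """
    s = moving_average(eda, max(1, int(smooth_s * fs)))   # S(t)
    ds = [(b - a) * fs for a, b in zip(s, s[1:])]         # S'(t), per second
    ds_pos = [max(0.0, d) for d in ds]                    # S'+(t): rising phases only
    win, hop = int(win_s * fs), int(hop_s * fs)
    last_start = max(1, len(ds_pos) - win + 1)
    # Mean of S'+ on overlapping windows W_i, one value every hop_s seconds.
    p = [fmean(ds_pos[start:start + win]) for start in range(0, last_start, hop)]
    x_n = sum(p)                                          # individual affective intensity
    p_n = [v / x_n for v in p] if x_n > 0 else [0.0] * len(p)
    return p_n, x_n


def mean_affective_profile(profiles, intensities):
    """Eq. (1): average the normalized profiles and scale by the mean intensity."""
    n = len(profiles)
    x_bar = fmean(intensities)                            # mean affective intensity
    n_windows = min(len(p) for p in profiles)
    mean_profile = [
        x_bar / n * sum(p[k] for p in profiles) for k in range(n_windows)
    ]
    return mean_profile, x_bar
```

Because every individual profile sums to one, the mean profile returned by this sketch sums to the mean intensity, which is the property that lets values be compared between movies.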

With this scaling, $\bar{p}_n$ takes into account not only the arousal fluctuations of the audience over time but also the “Mean Affective Intensity” (MAI) of those reactions, contained in the scaling factor $\bar{x}_n$. By observing the variations of $\bar{p}_n$, one can compare different moments of a movie in an affective way and, since those quantities are scaled by the mean affective intensity, one may also compare those values between several movies (see Figure 3).

D. Relevant Affective Parts

However, when $N$ is small, the strong inter-subject variability (see Figure 3) makes the interpretation of $\bar{p}_n$ trickier, since its values may not be statistically significant. A statistical strategy is thus adopted to discriminate between the background noise (no or very little affective reaction) and the relevant information of the MAP. More precisely, each time sample $k$ is considered as a random variable $P_k$ associated with the latent unknown distribution of $p_n^i[k]$, $1 \le i \le N$. The 10% of variables with the lowest mean are considered as the background noise (other approaches based on video meta-data may also be adopted). Then, for every random variable $P_k$, the mean p-value $e_k$ is computed by averaging the p-values of the bilateral (two-sided) Mann-Whitney-Wilcoxon test performed between $P_k$ and each variable of the background noise (null hypothesis: the distributions of both groups are equal). Each $P_k$ with an associated $e_k$ lower than 5% is considered significantly different from the background noise and is added to the “Relevant Affective Parts” (RAP).
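The selection of the Relevant Affective Parts can be sketched as follows. This is an illustration under stated assumptions, not the authors’ code. The paper does not say which variant of the Mann-Whitney-Wilcoxon test was used, so the sketch swaps in the tie-corrected normal approximation without continuity correction, which is only rough for audiences of a dozen viewers. The rounding of the 10% noise set and all names are also hypothetical.

```python
import math
from statistics import fmean


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney-Wilcoxon p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2            # mean of the 1-based ranks i+1..j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0                            # all pooled values are identical
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))


def relevant_affective_parts(profiles, noise_fraction=0.10, alpha=0.05):
    """profiles[i][k] is the profile value of viewer i in time window k.

    Returns the indices k kept as RAP and the list of mean p-values e_k.
    """
    n_windows = len(profiles[0])
    samples = [[p[k] for p in profiles] for k in range(n_windows)]   # one P_k per window
    by_mean = sorted(range(n_windows), key=lambda k: fmean(samples[k]))
    n_noise = max(1, round(noise_fraction * n_windows))
    noise = by_mean[:n_noise]                 # windows with the lowest mean
    e = [
        fmean([mann_whitney_p(samples[k], samples[j]) for j in noise])
        for k in range(n_windows)
    ]
    rap = [k for k in range(n_windows) if e[k] < alpha]
    return rap, e
```

A window that belongs to the noise set is compared with itself, which yields a p-value of one, so noise windows are never selected as RAP in this sketch.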

III. EVALUATION IN THEATER

To validate the proposed framework, and for the first time in such a context, experiments were conducted on a real audience during regular cinema shows and during special shows at a film festival, the “Festival du Film Britannique de Dinard”. More precisely, for two French movies presented at the Festival de Cannes (2012), namely “De rouille et d’os” by Jacques Audiard (movie #1 - drama involving a handicapped young girl) and “Le grand soir” by Benoît Delépine and Gustave Kervern (movie #2 - light comedy with an off-kilter humor), the EDAs of 3 × 14 audience members were recorded during three different regular cinema shows in a partner theater named “Le Sévigné” (Cesson-Sévigné, France). People were offered a free ticket for their participation, and the ages as well as the genders of the audience (42 audience members per movie) were reasonably balanced.

In a very similar way, other experiments were conducted during the “Festival du Film Britannique de Dinard” in partnership with the organization committee (Dinard, France). For three other movies, namely “Now is good” by Ol Parker (movie #3 - upsetting drama involving a young girl with cancer), “Life in a day” by Kevin MacDonald (movie #4 - documentary aggregating real-life records of multiple participants) and “I, Anna” by Barnaby Southcombe (movie #5 - absorbing thriller), the EDAs of respectively 24, 12 and 12 persons were also recorded, with fairly balanced ratios in terms of ages and genders.

In both capture sessions, each subject was free to choose his or her position in the room (no control of the placement), whose temperature was regulated by the air-conditioning system of the theater, which in theory prevents large temperature variations. A consumer BodyMedia Armband sensor [13], [7] was placed on the palm area of each participant (Figure 2) to record the skin conductance at a 32 Hz rate. The entire recording process was synchronized and controlled, and the EDA was directly stored in a memory embedded in each sensor (no wireless transmission). The sensor placed on the fingers was very well accepted by the users, who reported not having been disturbed by its presence during the film.

At the end of each show, a simple questionnaire was submitted to the participants. They were especially asked to report the highlights of the movie they considered the most relevant in terms of emotional impact. From those post-hoc questionnaires, combined with a real-time vocal annotation of the movie (made in the projectionist space) by three different persons who had already watched the movies twice, a “ground truth” of the highlights of each film was built.

Fig. 2. Bodymedia sensor placed on the fingers.

[Figure 3: three stacked panels for movie #1 (box plot of the IAPs, mean p-value trace, MAP). Highlights annotated on the figure: Ali is meeting his sister again (emotional scene); Stéphanie’s accident (shocking scene); Ali is shaking his son violently (shocking scene); The dog is put in kennels and the son is extremely sad (shocking scene); Scene of violent freefight (shocking scene); Love scene with sport coach (love scene); First love scene between Ali and Stéphanie (« opé ») (love scene); Moving scene between Stéphanie and the son (emotional scene); Son’s drowning (shocking scene); Sister’s layoff (shocking scene).]

Fig. 3. Affective benchmarking solution applied to movie #1. (top) Box plot of all “Individual Affective Profiles”. Each box represents the distribution of the IAPs of all users at a given time. The central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. (middle) Mean p-value of the bilateral Mann-Whitney-Wilcoxon test. Time samples with a p-value under the red line (p = 0.05) are considered as “Relevant Affective Parts”. (bottom) “Mean Affective Profile”. Each bar represents the mean arousal of the audience at a given time sample. Red horizontal lines are drawn above “Relevant Affective Parts”. The highlights of the movie considered by the audience as the most relevant in terms of emotional impact are also superimposed.

IV. RESULTS AND DISCUSSION

The IAPs, MAPs and RAPs were computed for the five movies, and the results are shown in Figures 3, 4 and 5. For all movies, one can observe a Mean Affective Profile whose variations and peaks are quite well synchronized with the highlights identified from the audience and the vocal annotation. Those peaks are significantly different from the “background affective noise”, as can be observed on the associated p-value traces (Figure 3), and their amplitudes are consistent with the intensities of the events. This is especially the case for the “drowning event” of movie #1 but also for the ends of movies #3 and #5, which were reported as very shocking, emotional or thrilling highlights by all audience members. Those events consistently have a very high mean value and are also identified as RAPs. On the contrary, non-highlight parts have a lower mean value and are generally not identified as RAPs.

Beyond this pointwise analysis, the MAPs also make it possible to analyze the temporal and dynamic aspects of the emotional arousal during each movie. One can first observe that the values of the different MAPs tend to increase during the movie. This is especially the case for movies #1, #3 and #5, where a progressive increase of the emotional tension up to a final event can be noticed. Those quantitative observations are consistent with the qualitative nature of movie creation: it is often the goal of a creator to make the audience more and more involved in the story, and the computed MAPs allow a quantitative verification of this intent. In line with this latter observation, one can also notice a quite high peak in almost every movie around second 1500. This peak may also illustrate a cinematographic technique well known to directors, which consists in “waking up” the audience after a certain duration from the beginning (around one third of the whole movie duration) to make them “engaged into the movie”.

[Figure 4: two panels, movie #1 (top) and movie #2 (bottom). Highlights annotated for movie #1: Ali is meeting his sister again (emotional scene); Stéphanie’s accident (shocking scene); The dog is put in kennels and the son is extremely sad (shocking scene); Ali is shaking his son violently (shocking scene); Scene of violent freefight (shocking scene); Love scene with sport coach (love scene); First love scene between Ali and Stéphanie (« opé ») (love scene); Moving scene between Stéphanie and the son (emotional scene); Sister’s layoff (shocking scene); Son’s drowning (shocking scene). Highlights annotated for movie #2, all labeled “funny scene”: Mother’s birthday; « Not » is fidgeting in front of people eating at restaurant; The mother is pushing her strange trolley with the baby inside; « Not » is Jean-Pierre is trying to make discovering with his brother his banker that he hired has been ruined by his wife [two superimposed labels]; The parking attendant is discussing with the father; Scene with the hanged person; Jean-Pierre is trying to convince a former colleague to take part in his « revolution ».]

Fig. 4. “Mean Affective Profiles” computed during regular movie shows. Movies #1 (top) and #2 (bottom). Each bar represents the mean arousal of the audience at a given time sample. Red horizontal lines are drawn above “Relevant Affective Parts”. The highlights of the movie considered by the audience as the most relevant in terms of emotional impact are also superimposed.

As described above, the computed MAPs make it possible to quantify pointwise as well as dynamic tendencies of the audience reaction during a movie. Furthermore, due to their normalized nature, they also give a way to quantitatively compare the reaction levels between different movies. For instance, when comparing the MAPs of movies #1 to #5, it appears that movies #2 and #4 tend to have a lower mean intensity than the others. Such an observation is consistent with the global comments from the critics: movies #1, #3 and #5 are upsetting dramas or an absorbing thriller, whereas movies #2 and #4 are respectively a light comedy with an off-kilter humor and an “experimental” documentary.

Regarding the scattering of the IAPs (Figure 3), even if a quite high variability may be noticed, the behavior of the median values seems to confirm the trend already observed on the mean values and gives a rough idea of the inter-subject dependency. One can however point out a higher variability for the high mean (or median) values of the IAPs. This behavior may be partly explained by the facts that: i) when nothing happens in the movie, the audience as a whole does not react, whereas ii) when an event occurs in the movie, some of the audience members react significantly whereas others remain insensitive. Variability is thus higher for larger RAPs, but those parts remain statistically different from the background noise according to the associated p-values.

Finally, a last remark can be made regarding the interest of using the EDA to quantify sustained states. In such situations, one could think that the EDA only reacts to quite strong and precise events and thus that nothing would be noticeable in the computed MAPs for sustained parts of the movie, which are supposed to produce a long and slightly progressive increase of the subject’s arousal. In fact, the EDA does not react once but repeatedly, and regular peaks may be observed in such situations. Those peaks are thus detectable in the positive-truncated derivative and appear in the associated MAPs, as one can especially observe at the thrilling end of movie #5, where the intensity remains high for more than 300 seconds.

V. CONCLUSION AND FUTURE WORKS

In this paper we have proposed a new solution to affectively benchmark a movie in terms of arousal variations. It is based on the measurement of the EDA and a dedicated processing that produces a time-variant and normalized affective signal. This signal, termed “Affective Profile”, is easily usable by a content creator interested in studying a movie in an objective and affective manner, or by studios and distributors interested in marketing analysis and commercial predictions. The framework has been applied, for the first time in such a context, to a real audience during regular cinema shows and a film festival, for five different movies and a total of 128 audience members. The results obtained in such an ecological context are consistent with the explicit reporting of the participants and show that such a solution may offer new possibilities to content creators and distributors to analyze and compare their movies in an affective way.

A deeper validation on more movies now has to be carried out, as well as a refinement of the analysis on sub-categories of the audience (by gender, age, ...). Correlations between audience members’ reactions could also be an interesting path to investigate. Furthermore, in this paper, the questions of when and how much an audience reacts have been addressed, but the problem of how this same audience reacts (positively or negatively) has been left aside. This question of the valence of the reactions is thus another interesting research direction that we will have to consider in the future.

[Figure 5: three panels, movie #3 (top), movie #4 (middle) and movie #5 (bottom), with the following annotated highlights.
Movie #3: First kiss with wig (shocking); Mushroom consumption (funny); Catheter removal (shocking); Adam is discovering Tess’s cancer and is seeking her in the forest (emotional); Arrested in the supermarket (thrilling); Motorbiking with Adam (thrilling); Adam is introduced to the father + Abortion (confusing); On the beach with Adam (romantic); Yes/No with mother in hospital (emotional); Tess’s nose is bleeding (shocking); Tess’s blackout and hospitalization (shocking); Ice-skating + No abortion (thrilling); Discovery of Tess’ tags (thrilling); Adam is jumping through the window + love scene (romantic); Tess’s dad is very sad and is crying (emotional); Tess’s death description (emotional); Robbery in the supermarket (thrilling); Blackout close to the fire (shocking).
Movie #4: Discussion with a drunk guy (funny); A guy is eating an egg with a chick inside (disgusting); A gay man is announcing his homosexuality to his grandmother (emotional); Cow in the abattoir (disgusting); A funny priest is re-marrying a couple (funny); An urban climber is climbing buildings and a bus (impressive); Lighting and flying balls are launched in the sky (emotional).
Movie #5: The policeman is chasing Anna close to the phone box (thrilling); Discovering of the scene where the crime happened (thrilling); The policeman is chasing Anna in his car during the night (thrilling); Anna is walking very fast with the stroller and is pushing an empty swing (troubling); Involuntary speed-dating + First meeting between Anna and the policeman (romantic); George’s son is chased by the police (thrilling); Call between Anna and the policeman during the night (romantic); Anna is lying to the policeman about their first meeting (troubling); The policeman is discovering the empty room (thrilling); Sexual scene between Anna and Georges + Anna is upset in the elevator (thrilling); The baby has a crash (thrilling); Suicide attempt (thrilling); Anna with her knife in the kitchen (thrilling); The police is trying to arrest Anna (thrilling); First speed-dating (schemer).]

Fig. 5. “Mean Affective Profiles” computed during the “Festival du Film Britannique de Dinard”. Movies #3 (top), #4 (middle) and #5 (bottom). Each bar represents the mean arousal of the audience at a given time sample. Red horizontal lines are drawn above “Relevant Affective Parts”. The highlights of the movie considered by the audience as the most relevant in terms of emotional impact are also superimposed.

REFERENCES

[1] M. Pantic and L. Rothkrantz, “Automatic Analysis of Facial Expressions: The State of the Art,” IEEE PAMI, vol. 22, no. 12, pp. 1424–1445, 2000.

[2] P. Lang, “The emotion probe,” Am. Psychol., vol. 50, no. 5, pp. 372–385, 1995.

[3] R. Calvo and S. D’Mello, “Affect Detection: An Interdisciplinary Review of Models, Methods, and their Applications,” IEEE Trans. on Affective Computing, vol. 1, no. 1, pp. 18–37, 2010.

[4] R. Matthews, N. McDonald, P. Hervieux, P. Turner, and M. Steindorf, “A Wearable Physiological Sensor Suite for Unobtrusive Monitoring of Physiological and Cognitive State,” in Conf. Proc. IEEE EMBS, 2007, pp. 5276–5281.

[5] R. Fletcher, K. Dobson, M. Goodwin, H. Eydgahi, O. Wilder-Smith, D. Fernholz, Y. Kuboyama, E. Hedman, M. Poh, and R. Picard, “iCalm: Wearable Sensor and Network Architecture for Wirelessly Communicating and Logging Autonomic Activity,” IEEE Trans. Inf. Technol. Biomed., vol. 14, no. 2, 2010.

[6] H. Kleinkopf, A Pilot Study of Galvanic Skin Response to Motion Picture Violence. Texas Tech University, 1975.

[7] S. Rothwell, B. Lehane, C. Chan, A. Smeaton, N. O’Connor, G. Jones, and D. Diamond, “The CDVPlex biometric cinema: sensing physiological responses to emotional stimuli in film,” in ICPA. Citeseer, 2006, pp. 1–4.

[8] M. Soleymani, M. Pantic, and T. Pun, “Multi-modal emotion recognition in response to videos,” IEEE Trans. on Affective Computing, vol. 3, no. 2, pp. 211–223, 2011.

[9] L. Canini, S. Gilroy, M. Cavazza, R. Leonardi, and S. Benini, “Users’ response to affective film content: A narrative perspective,” vol. 1, no. 1, 2010, pp. 1–6.

[10] J. Fleureau, P. Guillotel, and Q. Huynh-Thu, “Physiological-based affect event detector for entertainment video applications,” IEEE Trans. on Affective Computing, vol. 99, no. PrePrints, 2012.

[11] J. Russell, “A circumplex model of affect,” J. Pers. Soc. Psychol., vol. 39, no. 6, pp. 1161–1178, 1980.

[12] W. Boucsein, Electrodermal Activity (Second Edition). Springer, 2012.

[13] J. Jakicic, M. Marcus, K. Gallagher, C. Randall, E. Thomas, F. Goss, and R. Robertson, “Evaluation of the sensewear pro armband to assess energy expenditure during exercise,” Med. Sci. Sports Exerc., vol. 36, no. 5, pp. 897–904, 2004.