
A Review of Computable Expressive Descriptors of Human Motion

Caroline Larboulette, University of South Brittany, Vannes, France, [email protected]
Sylvie Gibet, University of South Brittany, Vannes, France, [email protected]

ABSTRACT

In this paper we present a review of computable descriptors of human motion. We first present low-level descriptors that compute quantities directly from the raw motion data. We then present higher-level descriptors that use the low-level ones to compute boolean, single-value or continuous quantities that can be interpreted, automatically or manually, to qualify the meaning, style or expressiveness of a motion. We provide formulas inspired by the state of the art that can be applied to 3D motion capture data.

Author Keywords

Expressive descriptors, human motion, motion capture, movement quality

ACM Classification Keywords

I.4.8 Image Processing and Computer Vision: Scene Analysis: Motion

INTRODUCTION

Human motion research is an increasingly active field covering a broad range of areas including motion analysis, motion recognition and motion synthesis. It therefore contributes to a wide range of applications such as Human-Computer Interaction and ergonomics, but also the film and game industries. In recent years a large amount of motion capture data has become freely available, providing rich material for the study of movement. However, these databases contain motions as various as locomotion, sport movements, theatrical gestures or everyday life movements. Few motion examples allow the study of expressive movements, i.e. movements conveying sense and expressiveness that reflect not only the intention of the captured subject, but also their state of mind and their emotion. Human movements can be seen as a combination of multiple elements which intrinsically associate meaning, style and expressiveness. The meaning can be characterized by a set of signs that can be linguistic elements or significant actions. This is the case when expressing narrative scenarios in theatrical situations [7], or constructing novel utterances in sign languages.

The style includes the identity of the subject, determined by the morphology of the skeleton, the gender, the personality, and the way the motion is performed according to some specific task (e.g., moving in a graceful or in a jerky way). The expressiveness characterizes all the nuances that are superimposed on motion, guided by the emotional context of the action or by a willful intent. Most of the time, it is very difficult to separate all these components. The different performances show some variability that can be observed in the raw motion data. These modulations may subsequently be identified and quantified to characterize movements. There is no consensus on which descriptors best characterize motion quantities and qualities.

Early works that attempted to describe and categorize movements and gestures are linguistic or psychological studies. For gestures conveying a specific meaning, called semiotic gestures, taxonomies have been proposed. They define semantic categories, i.e. classes that can be described and discriminated by verbal labels [20], [24]. Movement notation systems provide efficient tools for coded descriptions of movements that characterize both structural properties and expressive qualities of the movements. They include (i) verbal labels [12], (ii) structural notations such as the Laban Movement Analysis (LMA) defined for dance choreography [22], [4], [23], or (iii) linguistic notations such as those used in sign languages [11]. LMA identifies semantic components that describe the structural, geometric and dynamic properties of human motion. This theory comprises four major components: Body, Space, Shape and Effort. The Body and Space components describe how the human body moves, either within the body or in relation to the 3D space surrounding the body. The Shape component describes the shape morphology of the body during the motion, whereas the Effort component focuses on the qualitative aspects of the movement in terms of dynamics, energy and intent. LMA has been largely used in computer animation [9], [32], [28], for motion segmentation [5] and for motion indexing [3].

Other studies exist on body movements for affective expression. Two recent surveys review the literature on affect recognition and generation from body posture and movement [19], [21]. While these studies identify key contributions for computational models of motion, there is still a lack of information for a consistent and quantifiable set of motion measures usable by the motion research community. However, there have been a few attempts to propose computable measures of movements. Some of them compute descriptors inspired by LMA and have resulted in the EyesWeb Expressive Gesture Processing Library [6].

Using the EyesWeb system, 2D quantitative features based on body movement (quantity of movement and contraction index of the body) and gesture expressiveness (velocity, acceleration and fluidity of the hand movement) have been extracted from videos, the expressive cues being used to discriminate emotions [8]. Extending the EyesWeb library, Glowinski et al. [16] have extracted expressive features inspired from [29] for the head and both hands. Pelachaud [26] has defined perceivable expressiveness in motion, characterized by six dimensions, which have been designed for communicative behaviors and are applied to the animation of a virtual character. In the same line of research, movement qualities have been defined for specific dance characterization from 2D video capture [2], [1].

In all these studies, the data is extracted from sensors and capture devices that are rather imprecise and low-dimensional. Indeed, tracking body positions from video does not provide the high-dimensional time series data that is required to capture and analyze the whole 3D body motion and to understand very precise gestures. Furthermore, rapid gestures have to be captured at high frequency rates to be able to analyze fine details of the movement, i.e. small and rapid variations characterizing expressiveness. Therefore, motion capture databases have started to be used to extract significant expressive motion features. Dealing with large mocap databases, Müller et al. [25] have defined various geometric features describing geometric relationships between specific points of the skeleton for given key-frames. These binary features are used for content-based motion retrieval. Exploiting full-body motion capture, sets of computational features based on LMA theory have been proposed [17], [18]. These features are quantified as boolean values associated to the time frames at which the motion qualities start and end. In contrast, Samadani et al. [27] implement the Effort and Shape descriptors for hand-arm movements as continuous functions and evaluate them by a correlation analysis with manual annotations.

In this paper, we review the main quantitative motion descriptors characterizing the expressiveness of full-body motion data. We do not pretend to provide an exhaustive set of descriptors, but we describe and illustrate descriptors that have been used for various computational studies on motion. Extending the works based on Laban Effort quantifications [18], [27], we define low-level motion features computed from raw motion data, and formalize and categorize the high-level descriptors that can be found in the state of the art. These descriptors may reveal different properties of human motion, and some of them can also be automatically or manually interpreted (from perception or notation systems). They correspond therefore to different levels of description, from signal-based to symbolic-based levels, the level of interpretation depending on the knowledge of the movements.

MOTION REPRESENTATION

A movement is traditionally represented by a sequence of joint configurations of the skeleton over a period of time. According to the goal of the application, each configuration at a time $t_i$ may be defined by:

• a set of m joint positions:
$$x(t_i) = \{x_1, x_2, ..., x_m\}(t_i) \quad (1)$$
where $x_k$ are the relative positions associated to the k-th joint, with $1 \leq k \leq m$;

• a set of m joint angles:
$$x(t_i) = \{q_1, q_2, ..., q_m\}(t_i) \quad (2)$$
where $q_k$ are the relative rotations associated to the k-th joint, with $1 \leq k \leq m$.

Each pose $x(t_i)$ is encoded by a $3 \times m$ dimensional vector for the position representation or a $4 \times m$ dimensional vector for the quaternionic rotation representation. A motion can thus be represented as a sequence of n poses $X = \{x(t_1), x(t_2), ..., x(t_n)\}$ of dimensionality $(3|4) \times m \times n$. The motion can be sampled at regular intervals of time, or at irregular intervals if we consider the extraction of key-frames that represent the most relevant frames in a motion clip [30].

We propose in the following sections a classification of the motion descriptors according to an increasing level of interpretation, from low-level descriptors to high-level descriptors. These descriptors can be defined at various temporal and spatial resolutions: the temporal resolution may be limited to a single frame or may cover a sequence of frames; the spatial resolution may represent one joint (e.g., end-effector joints), a set of joints (e.g., the hand-arm system) or the whole body (postures).

LOW-LEVEL MOTION DESCRIPTORS

Low-level motion descriptors are either kinematic or dynamic quantities characterizing the evolution of the motion trajectories over time, or geometrical features characterizing one skeleton posture (or pose) at a given time or over a short period of time.

Kinematic / Dynamic Descriptors

Kinematic and dynamic quantities can be directly derived from the motion representation. For example, motion velocity, acceleration, jerk and curvature define motion trajectories that directly express properties of the motion. In the remainder of the paper, we will use bold characters to represent 3D vectorial quantities and non-bold characters to represent the scalar values of the corresponding quantities (i.e. their norm).

• Duration: records the total lasting time of the movement.

• Velocity: computes the instantaneous velocity of one joint k (i.e. the rate of change of its position):
$$\mathbf{v}^k(t_i) = \frac{\mathbf{x}^k(t_{i+1}) - \mathbf{x}^k(t_{i-1})}{2\,\delta t} \quad (3)$$
The speed of one joint k (scalar absolute value) is defined as the magnitude of its velocity:
$$v^k(t_i) = \sqrt{v^k_x(t_i)^2 + v^k_y(t_i)^2 + v^k_z(t_i)^2} \quad (4)$$

• Acceleration: computes the instantaneous acceleration of one joint k. It can be estimated by the following equation:
$$\mathbf{a}^k(t_i) = \frac{\mathbf{x}^k(t_{i+1}) - 2\,\mathbf{x}^k(t_i) + \mathbf{x}^k(t_{i-1})}{\delta t^2} \quad (5)$$
The scalar value is computed as the norm of the acceleration vector:
$$a^k(t_i) = \sqrt{a^k_x(t_i)^2 + a^k_y(t_i)^2 + a^k_z(t_i)^2} \quad (6)$$
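To make the finite-difference scheme concrete, here is a minimal NumPy sketch of equations (3) to (6) (our illustration, not code from the paper), assuming a joint trajectory stored as a float array of shape (n, 3) sampled at a fixed time step dt:

```python
import numpy as np

def velocity(traj, dt):
    """Central-difference velocity, Eq. (3); traj has shape (n, 3)."""
    v = np.zeros_like(traj)
    v[1:-1] = (traj[2:] - traj[:-2]) / (2.0 * dt)
    return v

def acceleration(traj, dt):
    """Second-order central difference, Eq. (5)."""
    a = np.zeros_like(traj)
    a[1:-1] = (traj[2:] - 2.0 * traj[1:-1] + traj[:-2]) / dt**2
    return a

# Scalar speed and acceleration magnitude, Eqs. (4) and (6):
# speed = np.linalg.norm(velocity(traj, dt), axis=1)
# acc   = np.linalg.norm(acceleration(traj, dt), axis=1)
```

The boundary frames, where the stencil does not fit, are left at zero here; another convention (e.g., one-sided differences) could be substituted.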

Figures 1, 2 and 3 show the acceleration curves in each dimension x, y, z, as well as the norm, for a 300-frame motion and 3 joints: the root, the left hand and the right hand.

Figure 1. Acceleration $a^{root}_x(t_i)$, $a^{root}_y(t_i)$, $a^{root}_z(t_i)$ and scalar $a^{root}(t_i)$ of the Root joint (Pelvis).

Figure 2. Acceleration $a^{lh}_x(t_i)$, $a^{lh}_y(t_i)$, $a^{lh}_z(t_i)$ and scalar $a^{lh}(t_i)$ of the Left Hand joint.

Figure 3. Acceleration $a^{rh}_x(t_i)$, $a^{rh}_y(t_i)$, $a^{rh}_z(t_i)$ and scalar $a^{rh}(t_i)$ of the Right Hand joint.

• Jerk: represents the rate of change of the movement acceleration. This feature describes the movement smoothness. For one joint k, it can be computed by:
$$\mathbf{j}^k(t_i) = \frac{\mathbf{x}^k(t_{i+2}) - 2\,\mathbf{x}^k(t_{i+1}) + 2\,\mathbf{x}^k(t_{i-1}) - \mathbf{x}^k(t_{i-2})}{2\,\delta t^3} \quad (7)$$
The scalar value is computed as the norm of the jerk vector:
$$j^k(t_i) = \sqrt{j^k_x(t_i)^2 + j^k_y(t_i)^2 + j^k_z(t_i)^2} \quad (8)$$

• Curvature: measures how fast a curve changes direction at a given point. For a joint k, it can be computed from the cross product of its acceleration with its velocity:
$$C^k(t_i) = \frac{\|\mathbf{a}^k(t_i) \times \mathbf{v}^k(t_i)\|}{v^k(t_i)^3} \quad (9)$$
The radius of curvature can be computed as $R^k(t_i) = 1/C^k(t_i)$.
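A sketch of equations (7) to (9) under the same assumptions as above (boundary frames left at zero, and the speed floored to avoid division by zero for near-static frames):

```python
import numpy as np

def jerk(traj, dt):
    """Third-order central difference, Eq. (7); traj has shape (n, 3)."""
    j = np.zeros_like(traj)
    j[2:-2] = (traj[4:] - 2.0 * traj[3:-1]
               + 2.0 * traj[1:-3] - traj[:-4]) / (2.0 * dt**3)
    return j

def curvature(v, a, eps=1e-9):
    """Eq. (9): ||a x v|| / ||v||^3 per frame; v and a have shape (n, 3)."""
    speed = np.linalg.norm(v, axis=1)
    return np.linalg.norm(np.cross(a, v), axis=1) / np.maximum(speed**3, eps)
```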

• Quantity of Motion (QoM): is a weighted average of the speeds of a set K of representative joints in the body. It can be expressed by the following equation:
$$qom(t_i) = \frac{\sum_{k \in K} w_k \, v^k(t_i)}{\sum_{k \in K} w_k} \quad (10)$$
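A one-line sketch of Eq. (10), assuming the per-joint speeds of Eq. (4) have already been computed:

```python
import numpy as np

def quantity_of_motion(speeds, weights):
    """Eq. (10): weighted average of joint speeds.

    speeds:  array of shape (n_frames, n_joints), v^k(t_i) per joint
    weights: array of shape (n_joints,), one weight w_k per joint
    """
    return speeds @ weights / weights.sum()
```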

Geometric Descriptors

Some spatial features are used to characterize geometrical aspects of the body, relative to itself or to the environment. They are directly computed from the positions and orientations of the joints. Examples are bounding shapes (see figure 4), Euclidean distances or joint angles.

Figure 4. Bounding Shapes: left, bounding box; right, bounding sphere centered on the Root joint.

• Bounding Box: computes the 3D bounding box (−x, +x, −y, +y, −z, +z) of the body, i.e. the rectangular parallelepiped enclosing the body.

• Bounding Sphere: computes the bounding sphere (center, radius) of the body. The sphere may or may not be centered on the root.

• Bounding Ellipsoid: computes the bounding ellipsoid (center, $r_a$, $r_b$, $r_c$) of the body. It is tighter than the bounding sphere, yielding a better volume approximation, but is more expensive to compute.

• Convex Hull: computes the convex hull of the body, i.e. the smallest convex set that contains all joints, which gives a tighter bounding shape but is also more expensive to compute [17].

• Displacement: computes the distance of a joint or effector k relative to a position l that may be the root of its limb, the center of mass or the ground (e.g., the projection of the root). For example, [18] computes the displacement of root-feet extremities, hand-shoulder and head-root; [13] computes the distances between both hands and between both feet.
$$Displacement^k(t_i) = \|\mathbf{x}^k(t_i) - \mathbf{x}^l(t_i)\| \quad (11)$$

• Rotation: computes the angular displacement $\mathbf{v}$ that brings the orientation $\mathbf{q}^k$ of a joint k to the orientation $\mathbf{q}^l$ of a joint l:
$$\mathbf{v}^{exp}_{kl}(t_i) = \mathbf{q}^k(t_i) \cdot \mathbf{q}^l(t_i)^{-1} \quad (12)$$

HIGH-LEVEL MOTION DESCRIPTORS

High-level descriptors use low-level features to calculate kinematic, spatial or physical quantities that can give rise to an interpretation by movement experts or perceptual evaluations, and are optionally characterized by semantic labels or discrete variables. We have expressed those descriptors as continuous features in time [27]. The calculations can be made at various levels of spatial discretization (one joint k, a set of joints or full-body poses [17]) or temporal discretization (each frame, a set of key-frames or a movement chunk).

Since most research uses a classification according to LMA, we have also chosen this classification to present the descriptors related to the body, the body shape, the space where the body evolves and the effort expressive descriptors. Descriptors based on other theories, such as Gallaher [14], are described in an additional subsection at the end.

Body Descriptors

The Body component describes the spatial characteristics of the motion and determines which body parts are moving. The balance, support and center of mass displacement are also contained in this component.

• Center of Mass (CoM): computes a weighted average of the positions of representative joints in the body. It can be expressed by the following equation [18]:
$$com(t_i) = \frac{\sum_{k \in K} w_k \, \mathbf{x}^k(t_i)}{\sum_{k \in K} w_k} \quad (13)$$
The center of mass can be computed for a limb or for the entire body. A set of joints must be chosen, as well as appropriate normalized weights $w_k$ for each joint. Figure 5 shows the 3D position of the CoM for an example of movement. We have chosen anthropometric values inspired by Dempster [10]: we used the root (49.7%), each shoulder (2.8%), each elbow (1.6%), each hand (0.6%), each thigh (10%), each knee (4.65%), each foot (1.45%) and the head (8.1%). Note that a more accurate CoM computation technique, or a different set of joints / weights, may be used. A sketch of this computation is given after the next descriptor.

Figure 5. 3D position of the Center of Mass $com(t_i)$.

• Action Presence: detects if the displacement of a body segment between successive frames crosses a threshold $\varepsilon$ (computed on a joint k) [18]:
$$Action^k(t_i) = \|\mathbf{x}^k(t_i) - \mathbf{x}^k(t_{i-1})\| > \varepsilon \;?\; 1 : 0 \quad (14)$$
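As referenced above, a sketch of Eq. (13) using the Dempster-inspired weights quoted in the text; the joint names and their ordering are our assumption:

```python
import numpy as np

# Dempster-inspired weights quoted in the text (joint order is hypothetical).
JOINTS = ["root", "lshoulder", "rshoulder", "lelbow", "relbow", "lhand",
          "rhand", "lthigh", "rthigh", "lknee", "rknee", "lfoot", "rfoot",
          "head"]
WEIGHTS = np.array([0.497, 0.028, 0.028, 0.016, 0.016, 0.006, 0.006,
                    0.10, 0.10, 0.0465, 0.0465, 0.0145, 0.0145, 0.081])

def center_of_mass(positions):
    """Eq. (13): weighted average of joint positions.

    positions: array of shape (n_joints, 3) for one frame, in JOINTS order.
    """
    return (WEIGHTS[:, None] * positions).sum(axis=0) / WEIGHTS.sum()
```

The weights above sum to 1, so the normalization is a no-op here, but keeping it makes the function robust to a different weight set.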

• Center of Mass Displacement: computes the displacement of the center of mass of the body from its rest position [18]:
$$com\text{-}disp(t_i) = \|com(t_i) - com(t_{rest})\| \quad (15)$$

• Balance: is a boolean value that indicates the relative location of the CoM with respect to the support polygon [18]. After computing the bounding shape, we determine if the projection of the CoM on the ground lies within the area defined by the projection of the bounding shape:
$$Balance(t_i) = proj(com(t_i)) \in proj(\text{B-Shape}(\mathbf{x}^k(t_i))) \;?\; 1 : 0 \quad (16)$$
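A sketch of Eq. (16), simplified to a bounding-box support shape, with the y axis assumed vertical and the x-z plane as the ground:

```python
import numpy as np

def balance(com, joints):
    """Eq. (16) with a bounding box as B-Shape (our simplification):
    check that the ground projection of the CoM lies inside the ground
    projection of the body's axis-aligned bounding box.

    com:    array of shape (3,), center of mass at one frame
    joints: array of shape (n_joints, 3), joint positions at the same frame
    """
    lo, hi = joints.min(axis=0), joints.max(axis=0)
    return bool(lo[0] <= com[0] <= hi[0] and lo[2] <= com[2] <= hi[2])
```

A proper support polygon (e.g., the hull of the feet contact points) would be tighter; the bounding box is used here only to keep the sketch short.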

• Support: identifies the current body segment which is being used to support the body weight, i.e. the body part which is in contact with the ground [18].

Space Descriptors

The Space component describes the movement in relation to the 3D space surrounding the body. This corresponds to the description of pathways, of the space where the action takes place, or of spatial patterns of the movement.

• Distance Covered: measures the total distance covered by the projection of the Root joint on the floor over a time period [3].

• Area Covered: measures the area covered by the projection of the Root joint on the floor over a time period [3].

• Hip Height: calculates the distance between the Root joint and the ground. It gives information on whether the performer kneels, jumps or falls [3].

Shape Descriptors

The Shape component describes how the body changes shape during movement. It may be characterized by the deformation of the skeleton's bounding shape over time, or by the relative distances between effectors, giving an indication of the configuration of the body. In LMA, it is composed of 3 subcomponents: Shape Flow, Shape Directional and Shaping.

• Bounding Volume (Shape Flow): computes the volume of the 3D bounding shape of the body. Several options are possible to enclose the body in a bounding shape: box, sphere, ellipsoid or convex hull [17].
$$ShapeV(t_i) = Volume(\text{B-Shape}(\mathbf{x}^k(t_i))) \quad (17)$$
The Bounding Volume descriptor can be computed on the entire body or on body parts. An example of the volume variation of an animation is shown on figure 6.

Figure 6. Bounding Volume descriptor, computed on the entire body, with a bounding box and a bounding sphere centered on the root.
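A sketch of Eq. (17) for two of the possible bounding shapes, using SciPy's ConvexHull for the hull volume:

```python
import numpy as np
from scipy.spatial import ConvexHull

def bounding_box_volume(joints):
    """Eq. (17) with a box: volume of the axis-aligned box enclosing all
    joints; joints has shape (n_joints, 3)."""
    extent = joints.max(axis=0) - joints.min(axis=0)
    return float(np.prod(extent))

def convex_hull_volume(joints):
    """Eq. (17) with the convex hull of the joints (tighter, more costly)."""
    return float(ConvexHull(joints).volume)
```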

To express the Shape feature, Fourati et al. [13] define a Directional Changes descriptor which uses a 2D bounding box surrounding the body in each of the three main directions, and then computes the mean of the global postural changes along these dimensions.

• Shape Directional: determines the shape of the path along which a movement is executed. Two opposing dimensions are considered: Arc-like and Spoke-like. As this descriptor closely associates geometrical and kinematic properties of the motion, one way to compute it is to calculate the average curvature of the movement in a 2D plane obtained through PCA analysis [27]. A similar measure can be computed in 3D. For the k-th part of the body and a movement of length T we have:
$$ShapeD^k(T) = \frac{1}{T} \sum_{i=1}^{T} C^k(t_i) \quad (18)$$
and for a segment of p joints (with normalized $\alpha_k$):
$$ShapeD(T) = \sum_{k \in K} \alpha_k \, ShapeD^k(T) \quad (19)$$
A computation of the Shape Directional descriptor for various time intervals is shown on figure 7.

Figure 7. Shape Directional descriptor computed on the entire body, with weights $\alpha_k$ set to 1 and time windows of 1, 10 and 20 frames.

• Shaping: characterizes the way the shape of the body evolves during the movement and occupies the space along three directions: vertical, front-back and left-right. It is qualified by six elements: Lengthening/Shortening, Widening/Narrowing and Bulging/Hollowing. The movement of the body is analyzed by projecting it onto 3 planes: frontal, horizontal and sagittal. Fourati et al. [13] provide 2 descriptors: the Global Body Shape refers to the global openness of the whole body in the three planes, and the Local Body Shape characterizes the shape of a set of segments. Alaoui et al. [1] define the notion of Verticality, which measures the ratio of the height to the width of the body silhouette.

• Extensiveness: computes the maximum distance between the center of mass and the set of end extremities of the body (hands, feet, shoulders, head) [1] (see the sketch after this list):
$$Extens._m(t_i) = \max_{k \in K} \alpha_k \, \|com(t_i) - \mathbf{x}^k(t_i)\| \quad (20)$$
The weighted sum can also be considered:
$$Extens._w(t_i) = \sum_{k \in K} \alpha_k \, \|com(t_i) - \mathbf{x}^k(t_i)\| \quad (21)$$

• Arms Shape: measures the distance between the hand-elbow arm system and the body center [13].

• Elbow Flexion: computes the local minima (< 60°) and maxima (> 60°) of the elbow flexion angles [13].

• Shoulder Angle: computes the angle between the axis defined by the arm and the vertical axis of the torso [1].

• Hands Relationship: computes the 3D distance between the hands (mean and variation) [13]. See figure 8.

• Feet Relationship: computes the 3D distance between the feet. It is used to estimate the mean stride length (i.e. openness) and the mean of feet postural changes (i.e. step duration) [13], [1]. See figure 8.

Figure 8. Hands Relationship and Feet Relationship.
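A sketch of Eqs. (20) and (21), assuming the CoM and the extremity positions of one frame are given:

```python
import numpy as np

def extensiveness(com, extremities, alphas=None):
    """Eqs. (20)-(21): max and weighted-sum distances between the CoM and
    the body extremities (hands, feet, shoulders, head).

    com:         array of shape (3,)
    extremities: array of shape (n, 3)
    alphas:      array of shape (n,), normalized weights (uniform if None)
    """
    if alphas is None:
        alphas = np.full(len(extremities), 1.0 / len(extremities))
    d = alphas * np.linalg.norm(extremities - com, axis=1)
    return d.max(), d.sum()
```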

Effort Descriptors

The Effort component focuses on the quality of motion in terms of dynamics, energy and expressiveness. It comprises four sub-categories (Weight, Time, Space and Flow) which vary continuously in intensity between opposing poles.

• Weight Effort: refers to physical properties of the motion, the two opposing poles being Strong (powerful, forceful) and Light (gentle, delicate, sensitive). According to [17], it is estimated by computing the sum of the kinetic energy of the joints composing the body part. The Weight Effort descriptor is then extracted by estimating the maximum energy over a time interval. For each joint k, $E^k(t_i) = v^k(t_i)^2$, and for p joints composing an effector, we have:
$$E(t_i) = \sum_{k \in K} E^k(t_i) = \sum_{k \in K} \alpha_k \, v^k(t_i)^2 \quad (22)$$
$$Weight(T) = \max E(t_i), \; i \in [1, T] \quad (23)$$
In this formulation, a motion characterized by a high Weight feature is Strong, whereas a motion with a low Weight feature is Light. The $\alpha_k$ are the normalized weights associated to each joint k (depending on biomechanical properties). In [17], the Weight is computed using 5 joints (root, left finger, right finger, left toe and right toe), the root, the fingers and the toes having different $\alpha_k$ values. The Weight can also be characterized by the deceleration of the motion over a time interval [18] (Strong: large deceleration, Light: little or no deceleration), the peak of the kinetic energy providing a more precise measure as it represents the maximum absorption of the kinetic energy [27]. A computation of the Weight Effort descriptor for various time intervals is shown on figure 9.

Figure 9. Weight Effort descriptor computed on the entire body, with weights $\alpha_k$ set to 1 and time windows of 1, 10 and 20 frames.
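A sketch of Eqs. (22) and (23), assuming precomputed joint speeds and normalized weights:

```python
import numpy as np

def weight_effort(speeds, alphas, t0, t1):
    """Eqs. (22)-(23): maximum over [t0, t1) of the weighted kinetic-energy
    proxy E(t_i) = sum_k alpha_k * v^k(t_i)^2.

    speeds: array of shape (n_frames, n_joints), joint speeds (Eq. (4))
    alphas: array of shape (n_joints,), normalized weights
    """
    energy = (speeds[t0:t1] ** 2) @ alphas
    return float(energy.max())
```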

• Time Effort: represents the sense of urgency. Two opposing dimensions have been defined to represent this notion: Sudden (urgent, quick) and Sustained (stretching the time, steady). The Time feature can be estimated by calculating the sum of the accelerations over time for a body part, either using directly the sum of the accelerations of representative joints (root, fingers and toes) over a time interval [17], or using the sum of the absolute values of the accelerations [27], [18]. For the k-th joint of the body and a movement of length T this is computed as:
$$Time^k(T) = \frac{1}{T} \sum_{i=1}^{T} a^k(t_i) \quad (24)$$
and for a segment of p joints:
$$Time(T) = \sum_{k \in K} \alpha_k \, Time^k(T) \quad (25)$$
A computation of the Time Effort descriptor for various time intervals is shown on figure 10.

Figure 10. Time Effort descriptor computed on the entire body, with weights $\alpha_k$ set to 1 and time windows of 1, 10 and 20 frames.
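A sketch of Eqs. (24) and (25), assuming precomputed acceleration magnitudes (Eq. (6)):

```python
import numpy as np

def time_effort(acc_norms, alphas, t0, t1):
    """Eqs. (24)-(25): per-joint mean acceleration magnitude over [t0, t1),
    combined as a weighted sum over the joints.

    acc_norms: array of shape (n_frames, n_joints), magnitudes a^k(t_i)
    alphas:    array of shape (n_joints,), normalized weights
    """
    per_joint = acc_norms[t0:t1].mean(axis=0)  # Time^k(T)
    return float(per_joint @ alphas)           # Time(T)
```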

• Space Effort: defines the directness of the movement, which is related to the attention to the surroundings. Two opposite poles are considered: Direct (focused and toward a particular spot) and Indirect (multi-focused and flexible). We use here the quantification defined in [27], which is itself directly inspired from [18]. For the k-th joint of the body, and a movement of length T, this is computed as:
$$Space^k(T) = \frac{\sum_{i=2}^{T} \|\mathbf{x}^k(t_i) - \mathbf{x}^k(t_{i-1})\|}{\|\mathbf{x}^k(T) - \mathbf{x}^k(t_1)\|} \quad (26)$$
and for a segment of p joints:
$$Space(T) = \sum_{k \in K} \alpha_k \, Space^k(T) \quad (27)$$
A computation of the Space Effort descriptor for various time intervals is shown on figure 11. Note that this measure has no sense for a very small time interval (i.e. frame to frame, time window of 2).

Figure 11. Space Effort descriptor computed on the entire body, with weights $\alpha_k$ set to 1 and time windows of 2, 10, 20 and 50 frames.

Hachimura et al. [17] propose a different measure that captures the relationship between the movement of the face and the body movement. It is obtained through an inner product of the face direction (defined by the normal vector of the triangle formed by the head and both shoulders) and the root velocity:
$$Space(t_i) = x^{face}_x(t_i) \, v^{root}_x(t_i) + x^{face}_y(t_i) \, v^{root}_y(t_i) + x^{face}_z(t_i) \, v^{root}_z(t_i) \quad (28)$$
Aristidou et al. [3] propose a similar idea but use different angles.
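A sketch of Eq. (26) for a single joint; the ratio is close to 1 for a Direct movement and grows for an Indirect one:

```python
import numpy as np

def space_effort(traj, t0, t1, eps=1e-9):
    """Eq. (26) for one joint: ratio of the traveled path length to the
    straight-line distance between the first and last positions.

    traj: array of shape (n_frames, 3), one joint trajectory
    """
    seg = traj[t0:t1]
    path = np.linalg.norm(np.diff(seg, axis=0), axis=1).sum()
    net = np.linalg.norm(seg[-1] - seg[0])
    return float(path / max(net, eps))
```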

• Flow Effort: defines the continuity of the movement. The two opposing dimensions are Free (fluid, released) and Bound (controlled, careful and restrained). It is computed as the aggregated jerk over time [18]. For the k-th part of the body, and a movement of length T, it is computed as:
$$Flow^k(T) = \frac{1}{T} \sum_{i=1}^{T} j^k(t_i) \quad (29)$$
and for a segment of p joints:
$$Flow(T) = \sum_{k \in K} \alpha_k \, Flow^k(T) \quad (30)$$
A computation of the Flow Effort descriptor for various time intervals is shown on figure 12.

Figure 12. Flow Effort descriptor computed on the entire body, with weights $\alpha_k$ set to 1 and time windows of 1, 10 and 20 frames.
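A sketch of Eqs. (29) and (30), structurally identical to the Time Effort computation but fed with jerk magnitudes (Eq. (8)):

```python
import numpy as np

def flow_effort(jerk_norms, alphas, t0, t1):
    """Eqs. (29)-(30): per-joint mean jerk magnitude over [t0, t1),
    combined as a weighted sum over the joints.

    jerk_norms: array of shape (n_frames, n_joints), magnitudes j^k(t_i)
    alphas:     array of shape (n_joints,), normalized weights
    """
    per_joint = jerk_norms[t0:t1].mean(axis=0)  # Flow^k(T)
    return float(per_joint @ alphas)            # Flow(T)
```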

Other Descriptors

Some descriptors are not based on LMA and are used to compute a different kind of expressiveness, such as style. Wallbott [29] proposes movement quality judgments that include Movement Activity, Expansiveness/Spatial Extension, and Movement Dynamics/Energy/Power. Inspired by [29] and [14], Pelachaud [26] has defined several expressiveness dimensions, including (i) Spatial Extent (related to Extensiveness), (ii) Temporal Extent (related to Time Effort/Velocity) applied to arms and head, (iii) Fluidity (related to Jerk (smoothness)), (iv) Power (related to Weight Effort), (v) Overall Activation (related to QoM), and (vi) Repetition, characterizing the repetition of a movement's stroke. Similarly, Glowinski et al. [16] have extended the EyesWeb library by extracting new computational features for the head and the two hands. These features include (i) Energy, characterizing the quantity of motion and kinetic energy (related to QoM and Weight Effort), (ii) Spatial Extent and Smoothness/Jerkiness (related to Jerk), and (iii) Symmetry and Forward/Backward leaning of the head, used for characterizing emotion in a video corpus of portrayed emotions.

Alaoui et al. [1] distinguish (i) spatial descriptors including Verticality (one characteristic of Shaping), Extension (similar to Extensiveness), Legs Opening (included in Extensiveness), and Weight Transfer (an approximation of Support), and (ii) temporal descriptors including Periodicity, Increase/Decrease, and Motion Quantity (related to QoM).

We describe hereafter the descriptors that have not been defined previously:

• Periodicity: is computed as the mean of the coefficient of auto-correlation (i.e. the correlation between a signal and its values at different points in time) on each extremity of the 4 sub-bounding boxes [1]. A sketch of the auto-correlation coefficient is given after this list.

• Repetition: computes the repetition of a stroke of a movement [26].

• Movement Phase: computes the product of the speed with the curvature, which is useful to segment 3D hand movements [15]:
$$MovementPhase^{hand}(t_i) = v^{hand}(t_i) \cdot C^{hand}(t_i) \quad (31)$$
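As referenced in the Periodicity item, a sketch of the auto-correlation coefficient at a given lag for one 1D extremity signal (the choice of lag and of signal is left to the application):

```python
import numpy as np

def autocorrelation(signal, lag):
    """Auto-correlation coefficient at a given lag for a 1D signal,
    e.g. the vertical position of one extremity over time."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    denom = (x * x).sum()
    return float((x[:-lag] * x[lag:]).sum() / denom) if denom > 0 else 0.0
```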

DISCUSSION AND CONCLUSION

The most important motion descriptors used in the literature have been presented in this paper. Most of them are re-formalized in order to give a unified view. In previous work they are computed in various ways depending on the choice of sensors and capturing techniques. They may be extracted from 2D or 3D data, captured through computer vision techniques or motion capture, with varying conditions of precision, frequency rate and noise. It is therefore very difficult to compare the techniques and computational measures. Furthermore, a large variety of affective body movements are studied, ranging from communicative or functional movements performed in everyday life or in the context of a specific task, to artistic movements (dance, music, theatrical movements).

A major drawback of the existing work is the choice of a set of descriptors according to the observed properties of the affective movements. Most of the time, this set is manually selected, thus introducing a bias in the method, since the descriptors are not independent of the movement properties. A question that naturally arises is the ability to provide generic descriptors for the high-dimensional motion data, and to automatically extract the minimal set of descriptors that covers the studied movements with their modulations. Another closely linked challenge is to build expressive movement databases that can be exchanged, making possible the benchmarking of different sets of generic descriptors.

For describing the expressive qualities of the movements, researchers are looking for a consistent and reduced representation of the high-dimensional motion data. This consistency and dimensional reduction are generally expressed in terms of (i) the level of discretization, (ii) the spatial reduction of the skeleton and (iii) the temporal segmentation.

The level of discretization ranges from boolean values to continuous values. These quantified values may be of arbitrary ranges. One prior challenge is to determine an optimized range of values for each descriptor, both for geometrical and kinematic / dynamic descriptors, and to evaluate the sensitivity of the results to the discretization of the descriptors. Perceptual evaluations might help to find relevant quantifications depending on the application.

For spatial reduction, it is very important to define on which body part(s) the measures should be estimated. The way the body is split into sub-articulated chains or effectors for calculating the different descriptors may significantly modify the results. In addition, the weighting of the joints or body parts may also impact the results and interpretations. For example, the hand system can constitute a channel in itself which conveys more expressiveness than other body parts. Another challenge therefore consists in determining the optimal splitting of the body, as well as the optimal weights associated to the body parts or joints, for a given application.

The motion descriptors are computed over a time window whose size may condition the computed descriptors (see for example figure 9). In particular, the size of the window may depend on the movement speed or take into account the notion of action. When a sliding window is used, the offset also has an influence. To reduce the motion data temporally, key-frame extraction methods have been developed for motion capture sequences, using layered curve simplification [30], clustering [31] or non-uniform sampling [15].

An interesting approach to deal with the high dimensionality of motion data is to compute the descriptors in a low-dimensional representation, by using linear reduction methods such as Principal Component Analysis, or non-linear reduction methods such as multidimensional scaling. Some works already use these techniques for isolated descriptors (e.g. [16], [27] for Shape Directional). This could be extended more generally to larger motion databases in 3D.

REFERENCES

1. Alaoui, S., Bevilacqua, F., Pascual, B., and Jacquemin, C. Dance interaction with physical model visuals based on movement qualities. International Journal of Arts and Technology (IJART) 6, 4 (2013).

2. Alaoui, S. F., Caramiaux, B., Serrano, M., and Bevilacqua, F. Movement qualities as interaction modality. In Proceedings of the Designing Interactive Systems Conference (June 2012), 761–769.

3. Aristidou, A., and Chrysanthou, Y. Feature extraction for human motion indexing of acted dance performances. In Proc. of the Intl. Conference on Computer Graphics Theory and Applications (Jan. 2014), 277–287.

4. Bartenieff, D. Effort-Shape Analysis of Movement: The Unity of Expression and Functions. Arno Press Inc., New York, 1972.

5. Bouchard, D., and Badler, N. I. Semantic segmentation of motion capture using Laban Movement Analysis. In Proc. of the Intl. Conference on Intelligent Virtual Agents (Sept. 2007), 37–44.

6. Camurri, A., Mazzarino, B., and Volpe, G. Analysis of expressive gesture: The EyesWeb expressive gesture processing library. In Gesture-Based Communication in Human-Computer Interaction (2004), 460–467.

7. Carreno-Medrano, P., Gibet, S., Larboulette, C., and Marteau, P.-F. Corpus creation and perceptual evaluation of expressive theatrical gestures. In Proceedings of the International Conference on Intelligent Virtual Agents (Aug. 2014), 109–119.

8. Castellano, G., Villalba, S. D., and Camurri, A. Recognizing human emotions from body movement and gesture dynamics. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction (Sept. 2007), 71–82.

9. Chi, D., Costa, M., Zhao, L., and Badler, N. The EMOTE model for effort and shape. In SIGGRAPH (2000), 173–182.

10. Dempster, W. T. Space requirements for the seated operator. Tech. Rep. 55-159, Wright-Patterson Air Force Base, Ohio, July 1955.

11. Duarte, K., and Gibet, S. Heterogeneous data sources for signed language analysis and synthesis: The SignCom project. In Proc. of the Intl. Conference on Language Resources and Evaluation (May 2010).

12. Ekman, P., and Friesen, W. The repertoire of nonverbal behavior: Categories, origins, usage and coding. Semiotica 1:1 (1969), 49–98.

13. Fourati, N., and Pelachaud, C. Toward new expressive movement characterizations. In Proceedings of Motion in Games, Posters (2012).

14. Gallaher, P. Individual differences in nonverbal behavior: dimensions of style. 133–145.

15. Gibet, S., and Marteau, P. Analysis of human motion, based on the reduction of multidimensional captured data - application to hand gesture compression, segmentation and synthesis. In Proc. of the Intl. Conf. on Articulated Motion and Deformable Objects (2008), 72–81.

16. Glowinski, D., Dael, N., Camurri, A., Volpe, G., Mortillaro, M., and Scherer, K. R. Toward a minimal representation of affective gestures. T. Affective Computing 2, 2 (2011), 106–118.

17. Hachimura, K., Takashina, K., and Yoshimura, M. Analysis and evaluation of dancing movement based on LMA. In IEEE Intl. Workshop on Robot and Human Interactive Communication (Aug. 2005).

18. Kapadia, M., Chiang, I., Thomas, T., Badler, N. I., and Kider Jr., J. T. Efficient motion retrieval in large motion databases. In Symposium on Interactive 3D Graphics and Games (Mar. 2013), 19–28.

19. Karg, M., Samadani, A., Gorbet, R., Kühnlenz, K., Hoey, J., and Kulic, D. Body movements for affective expression: A survey of automatic recognition and generation. T. Affective Computing 4 (2013), 341–359.

20. Kendon, A. Gesticulation and speech: two aspects of the process of utterance. In The Relation Between Verbal and Nonverbal Communication (1980), 207–227.

21. Kleinsmith, A., and Bianchi-Berthouze, N. Affective body expression perception and recognition: A survey. T. Affective Computing 4 (2013).

22. Laban, R. The Mastery of Movement. Plays, Inc., 1971.

23. Maletik, V. Body, Space, Expression: The Development of Rudolf Laban's Movement and Dance Concepts. Mouton de Gruyter, 1987.

24. McNeill, D. Hand and Mind: What Gestures Reveal about Thought. The University of Chicago Press, Chicago, IL, 1992.

25. Müller, M., Röder, T., and Clausen, M. Efficient content-based retrieval of motion capture data. In ACM Transactions on Graphics, vol. 24:3 (2005), 677–685.

26. Pelachaud, C. Studies on gesture expressivity for a virtual agent. Speech Communication, special issue in honor of Björn Granström and Rolf Carlson, 51 (2009), 630–639.

27. Samadani, A., Burton, S., Gorbet, R., and Kulic, D. Laban Effort and Shape analysis of affective hand and arm movements. In Humaine Association Conference on Affective Computing and Intelligent Interaction (Sept. 2013), 343–348.

28. Torresani, L., Hackney, P., and Bregler, C. Learning motion style synthesis from perceptual observations. In Advances in Neural Information Processing Systems (2006), 1393–1400.

29. Wallbott, H. Bodily expression of emotion. 879–896.

30. Xiao, J., Zhuang, Y., Yang, T., and Wu, F. An efficient keyframe extraction from motion capture data. In Proc. of the Intl. Conference on Advances in Computer Graphics (June 2006), 494–501.

31. Zhang, Q., Yu, S., Zhou, D., and Wei, X. An efficient method of key-frame extraction based on a cluster algorithm. Journal of Human Kinetics 39:1 (2013), 5–14.

32. Zhao, L., and Badler, N. Acquiring and validating motion qualities from live limb gestures. Graphical Models 67, 1 (2005), 1–16.