Preparation: Learning about M3D

What is the Multimodal Multidimensional (M3D) gesture labeling system?

The Multimodal Multidimensional (M3D) gesture labeling system is a thorough and holistic annotation system for audiovisual corpora. 

Learning about M3D: What are the main properties of the gesture labeling system? 


In this training program, we will present M3D, the Multimodal Multidimensional labeling system for the annotation of gestures, which has been developed in collaboration between three gesture labs: the Language and Cognition Research Group at UOC; the Speech Communication Group at MIT; and the Prosodic Studies and Gesture Group at UPF.

In this first video, we will explain the core properties of M3D and the new approach it brings to gesture studies. We will explain WHY we use M3D before going into HOW to use it to annotate gestures.

We know that spoken language is a multimodal system in which multiple communication modalities come together to express meaning. These modalities include the prosody of speech, such as intonation, but also hand, face, and body gestures. These modalities work together to express different pragmatic meanings and intentions. Thus, if I say “DAVE IS COMING”, I am expressing the belief that Dave will attend our meeting and that I am convinced about it. By contrast, if I say “DAVE IS COMING?”, I am expressing something very different: my disbelief or incredulity about the fact that Dave will be coming. I can also add a driving gesture to the first assertion and say “DAVE IS COMING!” to indicate that he will use his car to come. If we look more closely, in all these examples we see facial expressions, body movements, and speech working together toward an intended communicative goal. From a timing perspective, we also see how gestures tend to be time-aligned with prosodic prominence in speech, showing the profound interconnection between prosodic and gestural prominences.

What are gestures?

In M3D, we use the term “gesture” or “gestural movements” to refer to all body movements that work together with speech to convey the speaker’s communicative intent. So movements like scratching yourself or touching your hair do not usually function as gestures. The M3D annotation system has the goal of providing a holistic system of gesture annotation for audiovisual corpora that takes into account all the dimensions of gesture. Even though M3D takes all communicative body movements into account, the current training program will introduce the system by focusing on hand gestures.

What is the main contribution of M3D to gesture analysis? 

For decades, gesture researchers have largely followed one of the two main gesture classification systems available today. First, we have the system developed by Adam Kendon, in which gesture categories reflect how a gesture conveys meaning in speech. Kendon proposed a basic divide between gestures with a referential function, where gestures make reference to the content of the utterance, and gestures with a pragmatic function, where gestures relate to features of an utterance’s meaning that are not part of its referential meaning. Second, we have the system developed by David McNeill, in which the four main categories of gesture (iconic, metaphoric, deictic, and beat) are distinguished based either on their referential status or on their relationship with strong or prominent syllables in speech. For example, beat gestures are described as gestures with no semantic meaning that typically associate with prosodic prominence.

M3D presents an alternative to these two categorical approaches by taking the view that gestures can be multidimensional: they can convey both pragmatic and referential meanings simultaneously, and at the same time can associate with prominent syllables in speech. Currently, we have plenty of evidence that most gesture types, and not only beat gestures, associate with prosodic prominence in speech. Here are some examples that show the multidimensionality of gestures. If I say a sentence like “It was JOHN who did it, and not Mary”, we see that this pointing gesture has both a referential meaning, which is the deictic component of signaling the referent, and a contrastive focus component, which conveys a pragmatic meaning of contrast. In the following video example, we will see another clear illustration of how a pointing gesture is multidimensional.
You will see how, in addition to the deictic or referential component (pointing to something earlier), the gesture also has a pragmatic meaning component, in that it helps organize the timing structure of the discourse. Let’s take a look [see example]. In this example, we saw how the speaker metaphorically points to elements that occurred earlier in her discourse, so the gesture conveys both a referential and a pragmatic meaning. Further, the pointing gesture is very beat-like and associates with the prominent syllable "earlier".

These examples illustrate one of the main motivations for developing M3D: the need for an annotation method that allows for a dimensionalized approach to the analysis of gestures. Thus, instead of proposing a closed set of gesture categories based on the semantic or pragmatic meanings conveyed by the gesture, or on its association with prosodic prominence in speech, M3D adopts a “dimensionalized” approach and allows labelers to code multiple aspects of gesture in a non-mutually-exclusive way. In other words, labelers do not have to categorize a gesture based on one single aspect, like conveying iconic meaning or associating with speech prominence, but rather may “tag” multiple aspects simultaneously (gestures may or may not convey referential meaning, may or may not convey pragmatic meaning, and may or may not associate with prominence). Such an approach gives labelers much more flexibility in addressing different research questions, while staying true to established approaches in the field. All in all, we see that the M3D proposal bridges the approaches set forth by the two “pioneers” of gesture studies (Kendon and McNeill) and allows for a more flexible and transparent analysis of gesture. Annotating in such a way will reveal a more holistic picture of multimodal communication.
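The difference between a categorical scheme and a dimensionalized one can be sketched as a simple data structure. The following Python snippet is only a hypothetical illustration of the idea, not part of the M3D specification: each gesture carries several independent yes/no tags instead of belonging to exactly one category.

```python
from dataclasses import dataclass

# Hypothetical sketch of a dimensionalized annotation record: rather than
# forcing a gesture into one mutually exclusive category (iconic, metaphoric,
# deictic, or beat), the labeler tags several independent aspects at once.
@dataclass
class GestureTags:
    referential: bool = False  # refers to the content of the utterance?
    pragmatic: bool = False    # conveys pragmatic meaning (e.g. contrast)?
    prominent: bool = False    # associated with a prosodic prominence?

# The pointing gesture in the "It was JOHN" example carries all three
# aspects at the same time, which a single closed category cannot express.
john_pointing = GestureTags(referential=True, pragmatic=True, prominent=True)
print(john_pointing)
```

Under this view, McNeill's beat gesture would simply be a gesture tagged as prominent but neither referential nor pragmatic, rather than a separate category of its own.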

What are the main dimensions of gesture in M3D?

M3D considers three independent dimensions of gesture: the form dimension, the prosodic dimension and the meaning dimension. The form dimension describes the physical aspects of hand gesture movements like the handshape or the trajectory, that is, how the hand moves in space. The prosodic dimension of gesture describes how such gestural movements are grouped together to form gesture units, or how they are organized in time into different gestural phases. Finally, the meaning dimension describes both the referential and non-referential or pragmatic meanings a gesture may convey.

Final summary

In this video, we have presented the motivation for developing M3D, which is the need to annotate dimensions rather than categories. In sum, M3D was created to enable the coding of the different dimensions of a gesture in an independent way. The basic M3D dimensions are the form dimension, the prosodic dimension, and the meaning dimension. For more information, check out our M3D Manual and the Training Program, which contains video tutorials and exercises. We hope that you enjoy learning about M3D! Thanks for watching!

Good to know

Some aspects of gesture that need to be coded can be quite subjective in nature, as the labeler must interpret what the speaker means to represent, say, or do with their gesture. For this reason, it is virtually impossible to say that there is only one absolutely true and correct annotation for any given gesture.


If you would like to know more about that, click on the button below. 

Useful Resources

M3D Resources (Template, Manual, M3D-TED corpus): https://osf.io/ankdx/ 

Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.

McNeill, D. (1992). Hand and mind: What gestures reveal about thought. University of Chicago Press.

McNeill, D. (2005). Gesture and thought. University of Chicago Press.