Frequently Asked Questions 

Gesture Units 


Q: Can you have more than one gesture (one stroke) within a G-Unit?

A: Yes! A Gesture Unit refers to moments where gesturing is taking place (specifically any moments while the hands are not in rest), so a single Gesture Unit may contain a single gesture (stroke) or a grouping of gestures (multiple strokes). The length of a G-Unit can vary a lot.


Q: Does a Gesture Unit only end when the hands are totally relaxed by the speaker’s side? 

A: No! A Gesture Unit may end when the hands return to any position that is considered rest. You can think of this position as a “home base” where the hands go when they are not gesturing. For example, the speaker's hands may be held up in front of the body, or the arms may be folded across the chest. We would call this a partial rest position.


Q: The speaker returns to a rest position, but then immediately starts gesturing again. Should I consider this a single G-Unit or two separate G-Units? 

A: You should see a return to rest and perceive a complete stop in gesture production. If you do not clearly perceive this “break” in the gestural chain of events, you may wish to use the 300 ms rule: when looking frame by frame, if the hands remain in rest position for at least 300 milliseconds, we suggest separating the sequence into two Gesture Units. If the rest is shorter than that and you do not clearly perceive a pause, it may be annotated as a single G-Unit.
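
Because the 300 ms rule is applied while stepping frame by frame, it can help to convert a frame count into milliseconds. The following is only a minimal sketch: the function names are hypothetical, the frame rate must be taken from your own video, and the threshold is a parameter so that you can adapt it to your methodological choices.

```python
def rest_duration_ms(n_rest_frames: int, fps: float) -> float:
    """Return the time in milliseconds spanned by n_rest_frames at the given frame rate."""
    return n_rest_frames * 1000.0 / fps


def suggests_two_units(n_rest_frames: int, fps: float, threshold_ms: float = 300.0) -> bool:
    """Return True if the rest lasts at least threshold_ms (the 300 ms rule by default)."""
    return rest_duration_ms(n_rest_frames, fps) >= threshold_ms


# Example: a video recorded at 25 frames per second (40 ms per frame).
print(suggests_two_units(8, fps=25))  # 8 frames = 320 ms
print(suggests_two_units(7, fps=25))  # 7 frames = 280 ms
```

At 25 frames per second, a rest of 8 frames reaches the threshold and a rest of 7 frames does not. The rule remains a fallback for cases where perception alone is unclear, so a result from this helper should not override a clearly perceived break.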


Gesture Form 

 

Q: If the speaker is holding something in his hands, such as a pointer (e.g., in public speeches), how do I annotate the hand shape?

A: This depends on your individual objectives and preferences. You may want to add an entry to the Controlled Vocabulary, for example “Hld” for “holding”. Alternatively, if handshape is not a key research question, you may leave it uncoded or simply note it in a “Comment” tier. M3D is a flexible annotation system, so you can make it fit your methodological choices.


Q: How do I annotate the Trajectory Shape tier when the hands perform multiple loops?

A: If the hand performs a chain of multiple loops, you can annotate all the looping movements as one long “loop” annotation in the Trajectory Shape tier.


Q: How would I annotate a repetitive gestural up and down movement in the Trajectory Direction?

A: In the Trajectory Direction tier, we annotate the individual upward and downward movements as separate annotations. 


Q: What if the hands are moving but only because the speaker is moving his whole body (e.g., turning his torso), would you annotate the hands as moving?

A: We don’t want to take body movements, and torso turns in particular, into account. Sometimes it can look like someone is moving the hands when only the torso is actually moving to one side. When annotating hand gestures, we therefore focus only on movement of the hands themselves. If you want to annotate torso movements or other gestures of the body, you can do so in the other tiers that are specifically created for that purpose.


Q: How do you annotate the form dimension when both hands are doing something different?

A: We want to look at the more salient hand, which we annotate on the Manual Articulator I tier. You can annotate the movements of the other, less salient hand on the Manual Articulator II tier. For more information, check out the corresponding sections in the M3D labeling manual.


Q: While annotating handshape, how do I annotate when the hands are transitioning from one hand shape to another? 

A: As ELAN only allows us to annotate discrete time intervals, annotating transition movements is a bit tricky. One suggestion would be to try to find a threshold for when you want to capture the change of movement. 


Gesture Phases


Q: Would you annotate a stroke that occurs during a camera angle change as an incomplete stroke, as we cannot see the entire stroke?

A: No. The incomplete stroke label is for strokes that the speaker abandons, not for strokes that are partly hidden by a camera angle change.


Q: How do you annotate the phasing when both hands are doing something different?

A: In our experience, the hands rarely produce gestures independently of each other. Rather, they usually work together to communicate a single gestural meaning. On occasion, a speaker may be gesturing with one hand, and during the recovery of that hand, the other hand may be preparing to execute a stroke. In such cases, there are multiple options. You may decide to annotate the most important phase based on your research objectives (e.g., if you are interested in recoveries, prioritize their annotation). Alternatively, you may use a comment tier. A final, least preferred option would be to create a gesture phasing tier for each hand. Again, M3D is a flexible annotation system, so you can make it fit your methodological choices.


Q: If the hand moves up and down, should that be considered a single stroke, or a preparation followed by a stroke?

A: If one of the movement phases seems to be slower or less intense than the other, you would want to separate them as different gesture phases. Generally, we want to keep the duration of the stroke phase to a minimum, reflecting only the most meaningful part of the gesture. However, remember that labelers should use their perception: if the up/down movement really seems to act as a whole, then annotate the whole thing as a stroke.


Q: If there are multiple repetitive (e.g., up/down) movements one after another, should I annotate each up/down movement as a preparation/stroke, resulting in multiple strokes, or the whole sequence as a single multidirectional stroke? 

A: We want to check the speed or intensity of the movement. When one movement (e.g., the upward movement) is slower or less intense than the other, it is probably a preparation separating multiple strokes. If you cannot clearly see a kinematic difference between the movement phases, you can annotate it as a multidirectional stroke. Remember, this can also be validated at the second pass with audio.


Q: Sometimes speakers make lists on their fingers (e.g., pointing to each finger when talking about one item in a list). Should such movements be annotated as a single, long stroke, or should they be broken up into multiple strokes (one for each item in the list) with a preparation in between? 

A: We annotate separate strokes, because each “pointing” to a different finger has a different meaning and represents a different item in the list or enumeration. Remember that a gesture should not cover a paragraph of speech and that we want to keep the stroke phase as short as possible. The upward movement of going from one finger to the next would be annotated as a preparation between the strokes.


Q: Should the boundaries of the annotations in the gesture phasing tier align with those in the previously annotated form tiers?

A: The boundaries of phasing annotations should generally align with a change in some aspect of form (for example, a change in direction or handshape). However, it is not necessary that they align; the form tier annotations merely help justify the boundaries in the phasing annotations.


Q: What’s the difference between rest and a hold?

A: Rest is when the hands are in a (full or partial) rest position, and are no longer gesturing, at a sort of “home base”. A hold is perceived as if the speaker is putting their gesturing on pause. The hands have not returned to “home base”, indicating that they may continue gesturing shortly. Holds usually occur in the middle of G-Units. 


Q: How many video frames of the hands being at rest should be included within the recovery annotation? 

A: We typically end the annotation at the first frame where the hands have fully reached their resting position. This means that the hands are no longer moving into the rest position. 


Other 


Q: When should I stop annotating the Gesture Unit or Phasing when there is a recoil effect?

A: A recoil effect refers to a biomechanical bounce of the hands after carrying out a physically strong movement. You should think about setting a rule for how to deal with these cases in the future. For example, you may want to include them as part of the stroke annotation, or alternatively you may create an entry in the Gesture Phasing Controlled Vocabulary for them. Again, M3D is a flexible annotation system, so you can make it fit your methodological choices.


Q: How do I deal with different video angles (e.g., changes in angle, far shots, etc.)?

A: If you are annotating a video with frequent changes of camera angle, you should only annotate the sequences where the hands are fully visible. A tip: create a new tier and name it, for example, "Data Visibility". At the beginning of the annotation process, first annotate all the parts of the video where the hands are visible, label them as such, and separate them from the parts where the hands are not visible. When you later annotate the gestures, you can simply skip the parts where the hands are not visible. Also, remember that ELAN has a “zoom” function, allowing you to zoom in on your video in the case of far shots.
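
If you work with exported annotation times outside ELAN, the "Data Visibility" idea can be sketched as a simple interval check. This is an illustration under stated assumptions, not part of M3D itself: the function names are hypothetical, intervals are assumed to be (start, end) pairs in milliseconds, and the visible intervals are assumed not to overlap or touch each other.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_ms, end_ms)


def is_fully_visible(span: Span, visible: List[Span]) -> bool:
    """Return True if the span lies entirely inside one visible interval."""
    start, end = span
    return any(v_start <= start and end <= v_end for v_start, v_end in visible)


def keep_visible(candidates: List[Span], visible: List[Span]) -> List[Span]:
    """Keep only the candidate spans during which the hands are fully visible."""
    return [span for span in candidates if is_fully_visible(span, visible)]


# Hypothetical values: the hands are visible from 0-5 s and from 8-12 s.
visible_spans = [(0, 5000), (8000, 12000)]
candidate_spans = [(1000, 2000), (4500, 5500), (9000, 9500)]
print(keep_visible(candidate_spans, visible_spans))
```

In this example the second candidate is dropped because it runs past the end of the first visible interval, which mirrors the advice to annotate only sequences where the hands are fully visible.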


Q: Workflow: Should annotations progress G-Unit by G-Unit (i.e., fully annotating all dimensions of a single G-Unit before moving to the next), or should they proceed dimension by dimension (annotate all G-Units, then the form of all G-Units, etc.)?

A: This is a matter of individual preference. Labelers may progress through their corpus G-Unit by G-Unit or dimension by dimension. However, the order in which the individual tiers are annotated should be respected to avoid bias or circular annotation:


Gesture Units > Gesture Form > Gesture Phases > Rhythmic Properties > Gesture Referentiality > Gesture Pragmatics


4 tips for your annotation process: