Corresponding author: Clarissa de Vries (clarissa.devries@kuleuven.be)
Academic editor: Olga Iriskhanova
© 2021 Clarissa de Vries, Bert Oben, Geert Brône.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits to copy and distribute the article for non-commercial purposes, provided that the article is not altered or modified and the original author and source are credited.
Citation:
de Vries C, Oben B, Brône G (2021) Exploring the role of the body in communicating ironic stance. Languages and Modalities 1: 65-80. https://doi.org/10.3897/lamo.1.68876
Performing and understanding conversational irony requires a complex management of multiple viewpoints. To communicate and negotiate these intricate viewpoint shifts, speakers (and addressees) often use nonverbal means (e.g. gaze shifts, shrugs, shifts in body orientation, hand gestures, etc.) alongside verbal viewpoint strategies. In the present paper we zoom in on the perspective of the speaker and try to describe and quantify bodily behavior in ironic utterances compared to non-ironic ones. To this end, we use data from a video-corpus of three-party interactions with participants wearing mobile eye-tracking devices that allow for precise eye gaze data. Our results show that speakers display more of the multimodal resources under scrutiny in ironic cases compared to non-ironic cases. More specifically, the involvement of bodily resources is mainly manifested in the use of laughter, head movements and body repositionings. We further show how those resources cluster into certain multimodal packages, and how the exact timing of the bodily behavior is relevant (i.e. the gaze behavior at the end of an ironic segment differs most notably from the end of a non-ironic one). In addition to a quantitative analysis of the resources used in ironic talk in interaction, we also illustrate our findings with qualitative descriptions of relevant examples.
irony, multimodality, eye-tracking, pretense
It is a broadly accepted view that our language system, and more generally our cognitive system, allows us to organize, communicate and access information from different viewpoints, which can be temporal, spatial, epistemic or other. The strongly ‘viewpointed’ nature of language (
Research in a variety of disciplines, including cognitive science, linguistics, literary analysis and philosophy, has focused on the ways in which viewpoint permeates all products of human cognition, from mundane talk-in-interaction over intricate narratives to physical works of art (
The above-mentioned studies all deal with anecdotes and longer narrative sequences in talk-in-interaction. In his seminal work Using Language,
(1) Ken: and I’m cheap, - - -
Margaret: I’ve always felt that about you,
Ken: oh shut up,
(- - laughs) fifteen bob a lesson at home, -
In this specific sequence, a couple (Ken and Margaret) is engaged in a casual conversation on the husband’s work as a private teacher. When he makes a statement on the fee he charges as a tutor (and I’m cheap), his wife reacts with what at the surface level may seem like a confirmation (I’ve always felt that about you), using the anaphoric pronoun that to explicitly link her utterance to the preceding one. However, it is immediately clear that she construes a different meaning of the adjective cheap for ironic-playful purposes. When embedded in the phrasal construction to feel something about someone, this meaning shifts to an extended reading ‘of low moral value’ rather than the literal ‘inexpensive’. Underlying such an apparently simple case of wordplay, in fact, is an intricate play on viewpoints. In formulating her teasing reply, Margaret does not intend to seriously categorize her husband in any negative way. Rather, she sets up a pretense reading in which she reacts as if Ken had used cheap self-disparagingly in the extended sense, and she responds affirmatively. In other words, the ironic effect results from a tension between two layers of action and corresponding viewpoints: at the level of the actual communicative interaction, Margaret pretends that she, at a second level, seriously claims that Ken is metaphorically cheap. Thus, in the brief improvised scene, Margaret adopts the role of an implied participant, with different views from the perspective of which the scene is (re)viewed.
Many studies that zoom in on humor and irony in (face-to-face) interaction adopt some version of a pretense-based view on the phenomenon, like the one presented by Clark (and originally developed in
When reviewing the literature on irony, sarcasm and teasing as staged communicative acts, it is apparent that there is an imbalance in the way in which the phenomena have been approached. A significant body of studies in (cognitive) linguistics has focused on the way in which speakers make use of contextually available information to construe a locally relevant ironic meaning (see e.g.
To the best of our knowledge, only a few studies zoom in on the way in which participants recruit various resources to participate in this process. Arguably the most studied aspect relating to the marking of humorous intent (speaker perspective) or interpretation (addressee perspective) is laughter. Studies such as Glenn (2003),
The present paper explicitly builds on some of the findings in the above-mentioned studies and aims to refine the existing evidence in a number of ways, including the granularity of analysis (zooming in on timing) and the interaction between resources. A temporal account is needed to gain better insight into the interplay between the multimodal marking of irony on the one hand and the resources recruited for the micro-timing of turn-taking in interaction on the other. Looking at the interaction between resources may be interesting with respect to the pragmatics of irony, with different resources potentially providing contradictory information or only very few elements ‘hinting at’ the intended ironic meaning. At the same time, this study may be viewed as a first step in uncovering ‘multimodal Gestalts’ (Mondada, 2014) for realizing the specific interactional project of irony.
Based on the literature overview sketched above, and on the gaps therein, we now describe our research questions. Existing research on which multimodal resources interlocutors recruit to participate in ironic interaction is scarce. Therefore our first aim is to replicate the findings in those studies, leading to the first research question:
Q1: Do speakers perform more gaze shifts (including gaze aversions), laughter and body movement (including gestures, head movements and shrugs) in ironic utterances, compared to non-ironic ones?
If employing multimodal resources has the potential to signal ironic intent, it might be fruitful for speakers to call in those resources at specific times during their conversational turn. At the level of eye gaze behavior, not only overall fixation counts or fixation durations might be relevant, but also more specific gaze patterns. In the present study we expect speakers to visually check whether or not their ironic intentions have been understood by their conversational partners. This visual grounding of ironic intention will mainly occur at the end of the utterances, i.e. after a speaker can expect a listener to have perceived that ironic intent. As such, we zoom in on the end of the utterance, resulting in Q2:
Q2: Do speakers perform more gaze shifts (towards, between and away from their conversational partners) at the end of their ironic utterances?
As opposed to verbal behavior, which is linear by nature, nonverbal behavior can be expressed in parallel at multiple levels. If the resources mentioned in Q1 appear to correlate with ironic utterances, a combination of those resources is also more likely to occur in ironic cases. This would be especially true for those cases where the negotiation of the ironic communicative act is more difficult or challenging. Since, to the best of our knowledge, this multimodal clustering has not received substantial scientific attention, we will explore the following research question:
Q3: Within ironic utterances, what clusters of gaze shifts, laughter and body movements arise?
Given the lack of studies zooming in on these multimodal packages of resources that could be linked to irony, we will not predict specific combinations of resources to be more frequent than others, but we will conduct an exploratory analysis of all possible combinations.
To make explicit how we tried to answer our research questions, we will first present the data collection and annotation procedure for each of the resources that we included in the study. The main findings of the study are presented in section 3, zooming in on the comparison of the ironic and non-ironic utterances in our dataset, as well as exploring the combinations of resources in ironic utterances and the temporal aspect of gaze behavior. In a final step, we discuss the results in light of recent findings in the literature.
The video data used in this study were selected from the Insight Interaction Corpus (
After giving written consent to take part in the study, participants were invited to the recording room and seated in a triadic set up, enabling all participants to see each other with the same amount of effort. After participants were briefed on the recording session, and the eye-tracking glasses were calibrated, the experimenters left the recording room to allow for a conversational context as unobtrusive as possible. The recorded interactions consisted of two parts: the first part was a free conversation between all participants. For this interaction, participants were instructed to continue talking to each other as they did before the start of the recording (when they were waiting outside). In other words, they could talk freely about any topic they preferred. The second part of the interaction consisted of a brainstorm task. For this task, participants were asked to brainstorm about their ideal student bar and their ideal student house. Even though participants were instructed to talk about a given topic, this conversation too was highly interactive and not scripted. Participants were free to decide how rigorously they stuck to the brainstorm task. In practice, this often resulted in a brainstorm combined with free conversation between participants (for instance about student bars that they went to).
For the purposes of the current study, we selected all brainstorms and spontaneous conversations from the triadic interactions. In total this amounted to 24 participants (3 male and 21 female) divided over 8 triads. Each triad took part in the free conversation and the brainstorm, which led to a sample of 16 videos (of 256 minutes of recordings in total). For the spontaneous interactions, a conversation lasted 21 minutes on average; for the brainstorms 18 minutes.
Recording equipment
All interactions were video-recorded using one external camera (Sony HDR-FX1000E, 25 frames per second, 720 × 576 pixels). All participants were also wearing Tobii Pro Glasses 2 (sample rate 50 Hz). These eye-tracking glasses are equipped with a scene camera and infra-red cameras. The infra-red cameras were used to capture the gaze focus of the participants, which was then mapped onto the scene cameras. All four camera perspectives were synchronized in one Quadvid to facilitate data analysis. A screenshot of the Quadvid can be found in Figure
Annotation
As
The video data were annotated for speech, irony, stance, gaze, laughter, head movement, shoulder shrugs, hand gestures and body repositionings, using ELAN (
The analysis of gaze behavior is inspired by previous studies (
Previous research (e.g.
The use of head movements and shoulder shrugs in stance-taking is well documented (e.g.
Hand gestures have not yet been systematically considered in the study of irony (with the exception of
Finally, during the annotation of multimodal behavior, it appeared that participants displayed many body repositionings during ironic utterances, such as crossing or uncrossing the legs, folding the arms, etc. We therefore systematically annotated the presence of body repositionings during both ironic and non-ironic utterances.
As this study focuses on the use of nonlinguistic resources in irony, we consciously did not take into account classic paralinguistic features such as prosody.
For some of the variables described above (i.e. laughter, gaze behavior, hand gestures and body repositionings), the annotation process does not involve training or interpretation and is based purely on unambiguous visual perception. These annotations were therefore not subjected to a formal inter-coder agreement test. However, for those parameters that leave more room for interpretation (irony, head movement type), it is important to check the consistency and reliability of annotations. We therefore assessed inter-rater reliability for these variables. In what follows, we describe the annotation process for all separate variables in more detail.
Irony
Previous research identified ironic utterances on many different levels, such as the sentence level (
Each of these forms of irony “minimally reflects the idea of a speaker providing some contrast between expectation and reality” (
In order to facilitate the comparison of ironic with non-ironic stance acts, we selected ironic segments that fall on an evaluative scale. For instance, one participant spoke about a time that she went to a camp, and slept in a house in the forest that had no lock. One of her co-participants reacts and utters (2):
(2) Ja da’s nie eng (lacht)
Yeah that’s not scary (laughs)
‘k zou echt heel gerust slapen die nacht, amai
I would really sleep very comfortably that night, oh boy
These two segments can be interpreted as ironic because the speaker provides a contrast between expectation and reality (i.e. the speaker pretends that the situation is not scary, whereas in reality it is). The segment is also a stance act, because the speaker indicates her affective position with respect to the missing lock in the house. Lastly, this affective stance can be put on a scale, on which the stance can be compared to other stances (such as that it is really okay to sleep in a house without a lock).
In another conversation, one participant (Amber) is telling two others about a fantasy she has in which she wants to push a cyclist off his bike, while he is cycling. One of the others (Lena) reacts and utters (3):
(3) (lacht) sadistje
(laughs) little sadist
Although this example is not a straightforward reversal of evaluation, it can still be considered ironic, as it reflects a contrast between expectation and reality. Lena’s intention was not to call her friend a person who takes pleasure in inflicting pain, punishment, or humiliation on others; rather, her utterance is a teasing comment on Amber’s narration. The segment is also a scalar evaluation of Amber, which can be compared to other evaluations (such as that this is a completely normal thought).
Non-ironic stance
The ironic segments in our dataset are compared to non-ironic evaluative segments. We placed evaluative segments in our dataset on a scale, ranging from a positive to a negative evaluation. Specifically, we annotated explicit scalar evaluative segments such as (4), in which the participants are talking about the house in the forest without a lock, a bit further on in the same conversation as utterance (2). The same co-participant reacts to this story and utters (4):
(4) Da’s pas eng
Now that is scary
This segment is an explicit evaluation (i.e. this situation is scary) and as such it can be placed on a scale and compared with other evaluations (like this situation is not that scary). Similarly, in (5) a participant reacts to a story of his friend, who explains that she usually discusses the rate of a taxi ride in advance with the driver.
(5) Ja da’s wel slim
Yeah that’s smart
The speaker evaluates the segment of his friend, and his evaluation can be contrasted with other evaluations (like that’s stupid).
Using the method described above, we annotated 123 scalar ironic cases in our data set. Three videos were excluded because they did not contain any case of scalar irony. For every ironic segment by a speaker in our corpus, we selected a non-ironic scalar evaluative segment by the same speaker, resulting in a total of 246 cases. An inter-rater-reliability test was conducted to ensure a proper selection of ironic and non-ironic cases. A second trained coder annotated roughly half (118 cases) of the selected data, and annotated whether a segment was “ironic” or “non-ironic”. Agreement between the two raters was 91.5% (Cohen’s Kappa = 0.831), providing us with confidence that the ironic cases were annotated correctly.
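The agreement measures reported above can be illustrated with a minimal sketch of how percentage agreement and Cohen's Kappa are derived from a confusion table of two coders. The per-cell counts below are hypothetical: the paper reports only the number of double-coded cases (118) and the resulting statistics, so the table was chosen purely for illustration and is not the authors' data. The function name `cohens_kappa` is likewise an assumption.

```python
def cohens_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square confusion table.

    Rows hold the labels of coder 1, columns the labels of coder 2.
    """
    total = sum(sum(row) for row in table)
    size = len(table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total
    # Chance agreement: per category, product of both coders' marginal proportions.
    expected = sum(
        (sum(table[i]) / total) * (sum(row[i] for row in table) / total)
        for i in range(size)
    )
    return observed, (observed - expected) / (1 - expected)


# HYPOTHETICAL counts (not the authors' data): 118 double-coded cases,
# categories ordered as "ironic", "non-ironic".
hypothetical_table = [[54, 5],
                      [5, 54]]
agreement, kappa = cohens_kappa(hypothetical_table)
print(f"agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Because both categories are about equally frequent in this design (every ironic segment is paired with a non-ironic one), chance agreement is close to 0.5, which is why Kappa lies noticeably below the raw agreement percentage.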
Gaze
Gaze of the interlocutors was annotated for the ironic and evaluative segments. To prevent influence from the speech in annotating gaze, this was done with the sound turned off. The areas of interest were defined as 1) the face of an interlocutor; 2) the body of an interlocutor; 3) background. Whenever a participant shifted their gaze to a new area of interest, a new gaze fixation was annotated. We set the minimum fixation duration at 120 msec, in accordance with earlier research (
For the purposes of the current study, we measured gaze behavior of the speaker. Gaze aversions are defined as a gaze away from one of the listeners and toward the background. Speaker-to-listener gaze was defined as all fixations from the speaker on the face of either of the listeners. Mutual gaze was defined as the overlap in which a speaker looks at one of the listeners while that same listener looks at the speaker. Lastly, a gaze shift was defined as a shift to a new area of interest (either a participant or the background).
Gaze behavior during spontaneous social interaction is a temporally sensitive phenomenon. To explore the temporality of gaze behavior accompanying ironic versus non-ironic segments in more detail, we also annotated gaze behavior by the speaker in the last 1000 msec of a segment. During this timeframe, we measured to what extent there were gaze shifts from or towards the background (henceforth BG-shifts), between the two co-participants (henceforth CO-shifts) or no gaze shift at all (no-shifts). We also annotated the gaze state of the speaker at the end of the segment (i.e. gaze at a co-participant, or gaze at the background).
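The end-of-segment coding described above can be sketched as a small procedure. This is an illustrative reconstruction, not the annotation workflow actually used (annotation was carried out manually in ELAN). The `Fixation` record, the function name `classify_end_shift` and the timings are hypothetical. The sketch also assumes that, when several shifts fall within the final 1000 msec, the last one determines the label, a tie-breaking rule that the text does not specify.

```python
from dataclasses import dataclass

MIN_FIXATION_MS = 120   # minimum fixation duration stated in the text
END_WINDOW_MS = 1000    # final stretch of the segment that is inspected
BACKGROUND = "bg"


@dataclass
class Fixation:
    start_ms: int
    end_ms: int
    target: str  # name of a co-participant, or "bg" for the background


def classify_end_shift(fixations, segment_end_ms):
    """Label the speaker's gaze in the final window as BG-shift, CO-shift or no-shift."""
    # Drop fixations shorter than the minimum duration and keep temporal order.
    valid = sorted(
        (f for f in fixations if f.end_ms - f.start_ms >= MIN_FIXATION_MS),
        key=lambda f: f.start_ms,
    )
    window_start = segment_end_ms - END_WINDOW_MS
    label = "no-shift"
    for prev, cur in zip(valid, valid[1:]):
        if prev.target == cur.target:
            continue  # same target, so no shift took place
        if not (window_start <= cur.start_ms <= segment_end_ms):
            continue  # the shift falls outside the final window
        # ASSUMPTION: with several shifts in the window, the last one decides.
        if BACKGROUND in (prev.target, cur.target):
            label = "BG-shift"   # shift from or towards the background
        else:
            label = "CO-shift"   # shift between the two co-participants
    return label


# HYPOTHETICAL timings (msec) for a speaker whose segment ends at 2000 msec.
speaker_gaze = [
    Fixation(0, 900, "Ella"),
    Fixation(900, 1300, "bg"),
    Fixation(1300, 1600, "Seb"),
    Fixation(1600, 2000, "Ella"),
]
print(classify_end_shift(speaker_gaze, segment_end_ms=2000))
```

In this toy case the shift from Ella to the background happens before the final window and is ignored, while the later shifts (background to Seb, then Seb to Ella) fall inside it; under the stated tie-breaking assumption the segment is labelled as a CO-shift.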
Laughter
We annotated the presence of laughter by the speaker during the ironic and evaluative segments or immediately after them, that is, at most 500 msec after the end of the segment.
Head movements
The presence of head movements during or immediately after the ironic and non-ironic evaluative segments was annotated. Head movements were divided into three categories: tilts (head movement where the ear moves toward the shoulder, left, right or both sequentially); nods (up- and downward movement of the nose) and shakes (left- and rightward movement of the nose). Head movements that did not belong to any of these categories (for instance when the head moves backward without any lateral movement) were categorized as “other”. The presence of head movements was coded by one annotator. Head movement categories were then independently coded by two annotators, and cases of disagreement were discussed until agreement was reached for all items.
Shoulder shrugs
The presence of shoulder shrugs was annotated for all segments. A shoulder shrug could consist of either the left, the right, or both shoulders moving upwards.
Body repositioning
Body repositionings were defined as movements by a participant in which the major bodily articulators perform a body adjustment. Cases in which participants (un)crossed their arms or legs or changed their seating position on the chair (e.g. shifting from ‘hanging’ to sitting upright, or swaying from a torso orientation leftwards to rightwards), were annotated as a binary variable (that is, there either is or is no body repositioning during a segment).
Hand gestures
Hand gestures were identified as communicative movements by the hand(s). Only hand and arm movements that were not self-adaptors were considered. In the current study, gestures were not further categorized into formal characteristics or functional gesture types.
In this paper we investigated which semiotic resources are recruited by speakers in the expression of irony in spontaneous interaction, as well as how those resources interact. In the first part of this section, we describe the distribution of these semiotic resources in ironic and non-ironic segments. For this analysis, the statistical analysis software R (R Core Team, 2021) was used. In the second part of this section, we take a more qualitative approach and explore how these resources interact, and how their variation in use can be explained.
In the current corpus of 16 conversations, we found 123 scalar ironic segments, enough for a small quantitative study. Three of the interactions did not contain any ironic segment, and were thus excluded from this analysis. We then annotated laughter, gaze behavior, head movements, shoulder movements, hand gestures and body repositionings for both ironic and non-ironic segments. To provide a first insight into the distribution of bodily resources, we added up those resources that can be reduced to a binary score (laughter, CO-shift at end of segment, head movement, shoulder shrug, hand gesture, body repositioning). In Table
In what follows, we will describe the contributions of all separate resources to the expression of irony in more detail.
Nr of resources involved | Frequency (ironic) | Frequency (non-ironic) |
---|---|---|
0 | 13 | 51 |
1 | 33 | 45 |
2 | 52 | 20 |
3+ | 26 | 6 |
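The tally in the table above rests on a simple additive score: each segment receives one point per binary resource that is present, and scores of three or more are pooled. A minimal sketch of that computation follows. The field names and the toy segments are hypothetical and serve only to show the procedure; they are not taken from the annotated corpus.

```python
from collections import Counter

# The six resources that the text reduces to a binary score.
BINARY_RESOURCES = (
    "laughter",
    "co_shift_end",
    "head_movement",
    "shoulder_shrug",
    "hand_gesture",
    "body_repositioning",
)


def resource_score(segment):
    """Number of binary resources present in one annotated segment (0 to 6)."""
    return sum(1 for name in BINARY_RESOURCES if segment.get(name, False))


def bin_scores(segments):
    """Tally segments per score, pooling scores of three or more into '3+'."""
    counts = Counter()
    for segment in segments:
        score = resource_score(segment)
        counts["3+" if score >= 3 else str(score)] += 1
    return counts


# HYPOTHETICAL segments (not corpus data); absent keys count as "not present".
toy_segments = [
    {"laughter": True, "head_movement": True},
    {},
    {"laughter": True, "hand_gesture": True,
     "co_shift_end": True, "body_repositioning": True},
]
print(bin_scores(toy_segments))
```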
Laughter
An obvious marker of irony can be found in laughter. We investigated laughter by the speaker accompanying ironic and non-ironic segments. There was more laughter in ironic compared to non-ironic segments. Out of all 123 ironic segments, 62 were accompanied by laughter, compared to 8 out of 123 for non-ironic segments. A chi-square test showed that this difference was significant (χ2 (1) = 56.089, p < 0.0001), and the effect was moderate (Cramer’s V = 0.477).
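For readers who wish to retrace this test, the following sketch computes a chi-square statistic for a 2 × 2 table from the laughter counts given above. It assumes that Yates' continuity correction was applied and that Cramer's V was derived from the corrected statistic; the text does not state either, and the analyses themselves were run in R rather than Python. The function name is hypothetical.

```python
import math


def chi_square_2x2_yates(a, b, c, d):
    """Chi-square test with Yates' continuity correction for the table [[a, b], [c, d]].

    Returns (chi2, p_value, cramers_v).
    """
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For one degree of freedom, the upper tail of the chi-square distribution
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # ASSUMPTION: V is computed from the corrected statistic.
    cramers_v = math.sqrt(chi2 / n)
    return chi2, p_value, cramers_v


# Counts from the text: laughter in 62 of 123 ironic and 8 of 123 non-ironic segments.
chi2, p, v = chi_square_2x2_yates(62, 123 - 62, 8, 123 - 8)
print(f"chi2 = {chi2:.3f}, p = {p:.2e}, V = {v:.3f}")
```

Under these assumptions the sketch yields χ2 ≈ 56.089 and V ≈ 0.477, in line with the values reported for laughter.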
Gaze
With respect to gaze behavior across the segment, we were interested in four variables: gaze aversions, speaker-to-listener gaze, mutual gaze, and gaze shifts. There does not seem to be a difference in the counts of mutual gaze between ironic and non-ironic segments, which was shown by a Wilcoxon test for independent samples (Ws = 4297, p = 0.330, Hedges’ g = 0.006). Similarly, we did not find a significant difference in the amount of speaker-to-listener gazes (Ws = 7729.5, p = 0.288, gs = 0.146). We also did not observe a significant difference in the amount of gaze aversions by the speaker (Ws = 7787, p = 0.122, gs = 0.224) or in the amount of gaze shifts (Ws = 8029.5, p = 0.101, gs = 0.245).
Regarding gaze duration, again no differences arise between the two conditions with respect to the duration of mutual gaze (Ws = 4111.5, p = 0.175, gs = -0.095), speaker-to-listener gaze duration (Ws = 7616, p = 0.436, gs = 0.034) or the duration of gaze aversions (Ws = 7669, p = 0.095, gs = 0.095).
As mentioned above, we also investigated gaze behavior during the last 1000 msec of the segment in more detail. Within this timeframe, we analyzed the presence of gaze shifts and the end state of the gaze by the speaker. We found that a different gaze pattern emerges at the end of ironic segments compared to non-ironic segments. At the end of ironic segments, participants more often show gaze shifts from and to the background (30 out of 123), as well as more gaze shifts between listeners (38 out of 123), and less often continuous gaze towards a single area of interest (“no-shift”) (46 out of 123), compared to non-ironic segments (22, 27 and 66 out of 123 respectively). This difference was significant in a Chi-square test (χ2 (2)= 10.893, p = 0.004, Cramer’s V = 0.171). As for the gaze state at the end of the segment, no different pattern emerges for ironic compared to non-ironic segments (χ2 (1) = 0.304, p = 0.582, Cramer’s V = 0.046). In both conditions, participants look at their interlocutors more often than they look away.
Head movements
The presence of head movements was also annotated for ironic and non-ironic segments. Here we found that ironic segments are more often accompanied by head movements (64 out of 123) compared to non-ironic segments (40 out of 123) (χ2 (1) = 8.812, p = 0.003), which is a moderate effect (V = 0.200). We then explored head movements in more detail, by looking at differences in head movement types. The frequencies of head movement types per condition are listed in Table
Frequencies of head movement types during ironic and non-ironic segments.
Head movement type | Ironic | Non-ironic |
---|---|---|
Nod | 24 | 23 |
Shake | 17 | 14 |
Tilt | 24 | 10 |
Other | 8 | 2 |
From this more granular analysis, it appears that a difference in the amount of head movements can mostly be attributed to the increase of head tilts in ironic segments compared to non-ironic segments.
Shoulder shrugs
The annotation of shoulder shrugs showed that there were very few cases in this dataset. There was no difference in the number of shoulder shrugs between ironic segments (10 out of 123) and non-ironic segments (9 out of 123) (χ2 (1) = 0, p = 1, V = 0.014).
Body repositioning
Sometimes participants re-adjusted themselves during a segment. When counting this systematically, it appeared that there were more body repositionings in ironic segments (21 out of 123) compared to non-ironic segments (6 out of 123) (χ2 (1) = 8.031, p = 0.005), a small effect (V = 0.194).
Hand gestures
We investigated the overall presence of gesture in ironic and non-ironic segments, disregarding different gesture types or form-based characteristics. The results show that 34 out of 123 of ironic segments are accompanied by gesture, compared to 23 out of 123 for non-ironic segments. This difference was not significant in a Chi-square test (χ2 (1) = 2.284, p = 0.131, V = 0.106).
All results are summarized in Tables
Descriptive statistics for gaze behavior during ironic and non-ironic segments.
Variable | Condition | Median duration [IQR] | Median count [IQR] |
---|---|---|---|
Mutual gaze | Ironic | 619.0 [50–1002] | 1 [1–1] |
Mutual gaze | Non-ironic | 609.5 [125.2–1116] | 1 [1–1.25] |
Speaker-to-listener gaze | Ironic | 853.5 [372.5–1344.2] | 1 [1–2] |
Speaker-to-listener gaze | Non-ironic | 767.0 [141.2–1425.5] | 1 [1–2] |
Gaze aversion | Ironic | 0 [0–274.2] | 0 [0–1] |
Gaze aversion | Non-ironic | 0 [0–0] | 0 [0–0] |
Gaze shift | Ironic | – | 1 [0–6] |
Gaze shift | Non-ironic | – | 1 [0–6] |
Total counts of resources involved in the expression of ironic and non-ironic segments, and proportion of occurrence within the condition.
Variable | Condition | Raw frequency | Proportion |
---|---|---|---|
Laughter * | Ironic | 62 | 0.504 |
Laughter * | Non-ironic | 8 | 0.065 |
Hand gesture | Ironic | 34 | 0.276 |
Hand gesture | Non-ironic | 23 | 0.187 |
Head movements * | Ironic | 64 | 0.520 |
Head movements * | Non-ironic | 40 | 0.325 |
Body repositioning * | Ironic | 21 | 0.171 |
Body repositioning * | Non-ironic | 6 | 0.049 |
Shoulder shrug | Ironic | 9 | 0.073 |
Shoulder shrug | Non-ironic | 8 | 0.065 |
Gaze shift end of segment * | Ironic | 68 | 0.553 |
Gaze shift end of segment * | Non-ironic | 49 | 0.398 |
Gaze at co-participant end of segment | Ironic | 85 | 0.691 |
Gaze at co-participant end of segment | Non-ironic | 81 | 0.659 |
As mentioned in the introduction, the current paper aims to refine the existing evidence for embodied communication of irony by investigating how multiple resources are employed in spontaneous interaction. In this section, we answer two questions related to this overarching goal. Firstly, we explore which resources co-occur systematically in the expression of ironic stance, in the form of “multimodal packages”. Secondly, we explore what factors might drive variation in the markedness (i.e. the amount of resources involved in the expression) of an ironic segment.
Co-occurrence of resources
To explore the co-occurrence of different markers of irony quantitatively, we calculated Kendall’s Tau correlations between all of the resources under scrutiny in the current paper: gaze behavior (mutual gaze, speaker-to-listener gaze, gaze aversions, gaze shifts, CO-shifts at the end of a segment), laughter, head movements, shoulder shrugs, body repositionings, and hand gestures. We did this only for the group of ironic segments. Below we discuss those combinations of resources for which a significant correlation was found.
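As an illustration of this step, the sketch below computes Kendall's Tau for a pair of annotated variables. It assumes the tie-corrected variant (tau-b), which is the natural choice when one variable is binary and ties are therefore frequent; the text does not specify which variant was used. The function name and the toy values are hypothetical and are not corpus data.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long numeric sequences (ties allowed)."""
    concordant = discordant = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1  # pair tied on the first variable
            if dy == 0:
                tied_y += 1  # pair tied on the second variable
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    denominator = math.sqrt((total_pairs - tied_x) * (total_pairs - tied_y))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator


# HYPOTHETICAL annotations for eight ironic segments (not corpus data):
# presence of laughter (0/1) and number of gaze shifts per segment.
laughter = [1, 1, 0, 0, 1, 0, 1, 0]
gaze_shifts = [2, 3, 0, 1, 2, 0, 1, 1]
print(round(kendall_tau_b(laughter, gaze_shifts), 3))
```

A positive value indicates that segments with laughter tend to contain more gaze shifts, which is the kind of pattern explored in the remainder of this section.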
As a first result, this analysis shows a correlation between the presence of laughter, and different measures of gaze behavior. That is, in ironic segments where there is laughter, there are also more gaze shifts (Τ = 0.218, p = 0.009), more instances of mutual gaze (Τ = 0.177, p = 0.040) and speaker-to-listener gaze (Τ = 0.219, p = 0.010), as well as gaze aversions (Τ = 0.243, p = 0.007). Note that these different measures of gaze behavior are also correlated. Interestingly, there was no correlation between the presence of CO-shifts at the end of a segment, and the presence of laughter.
Let us now take a look at some examples from the dataset to examine the relation between laughter and gaze in more detail. In the fragment below, three participants (Anna, Ella and Seb) are talking about the father of Anna, one of the participants. Anna explains that her father is an autodidact and invests much time in self-study. He speaks better French than Anna, and he can translate Latin texts even though he never studied Latin in school. In lines 1 and 2 of the fragment below, Ella comments on this, showing her non-ironic stance. In line 3, Anna adds to this by saying makes you cry, right. As this does not really make Anna cry, this segment can be interpreted as ironic. Anna takes on a pretense role in which she is so upset about her father’s knowledge that it makes her cry. Her laughter directly following the segment marks the non-seriousness of the segment.
In order to represent the distribution of visual attention of all participants, we use a score-like representation for their gaze behavior in co-occurrence with the transcript lines. The symbols in the score represent the gaze target at each point in time (e.g. in line 1, the current speaker Ella shows sustained gaze to the background (bg), whereas Anna shifts her gaze from the background to Ella towards the end of the short segment. The third participant, Seb, gazes at Anna for the entire duration of the segment). By representing the gaze direction of all three participants in this way, we get a detailed picture of the distribution of visual attention at each point in time, related to the segments being produced. When looking at the gaze score for the ironic segment in fragment 6, it is clear that Anna shifts her gaze between the background and her co-participants. In this example, both the use of gaze shifts and laughter can be argued to form interactive devices, inviting her audience to align with her and join her in this pretense mode.
(6) (zucht) maf
1 Ella (sighs) nuts
Anna bg------Ella
Ella bg-----------
Seb Anna-------
Echt [maf;]
2 Ella really [nuts;]
[is we]nen he (lacht)
→ 3 Anna [makes you cr]y right, (laughs)
Anna Ella------bg-----Seb---Ella-------------------
Ella bg-------------------------------------Anna----
Seb Anna--------------------------------------------
In another example, shown in fragment 7 below, three participants are brainstorming about their ideal student bar. The participants (Leah, Karen and Mia) suggest that it would be nice to serve some food first, like nachos. Leah jokingly says “for free”, and laughs. This marks the beginning of the transcript, in line 1. Line 1 can be interpreted as ironic since, in reality, student bars would most likely not serve food for free. By laughing, Leah marks this segment as playful and elicits a response from her addressees. In line 2, Mia returns to a serious statement, and starts with another suggestion. She uses a Palm Up Open Hand (PUOH) gesture. While she gestures, Leah shifts her gaze towards Mia and takes the turn. In line 3, Leah expands on her initial statement, also using gesture (a Palm Down Open Hand gesture, PDOH) and still looking at Mia. Mia then shifts her gaze toward Leah. Again, Leah laughs and this time Mia responds by laughing (in line 5), aligning her gesture with Leah’s PDOH gesture and repeating Leah’s line “everything for free”. In this example, both the use of laughter and mutual gaze contribute to the joint act of irony, by eliciting a reaction from the addressees and involving them in the staged communicative act.
(7) gratis;
1 Leah for free ;
((Leah laughs))
Leah Karen----------
Karen Leah-----------
Mia bg--------------
+en wat mis[schien ook wel fijn is ;+
→ 2 Mia and what might also be nice ;
+Mia gestures PUOH------------------+
Leah Karen----Mia----------------------------
Karen Leah----ges Mia-------Leah------------
Mia bg-------------------Leah----------------
*[[alles] gratis ; *
3 Leah [[everything] for free ;
*Leah gestures PDOH-----*
4 Karen [a:h -]
((Leah laughs))
<<lachend> ja (.) alles *$gra*$tis ja .>
5 Mia <<laughing> yeah (.) everything for free yes.>
*Mia gestures PDOH*
$Mia shakes head$
Leah Mia-----------------------bg------Karen-
Karen Leah-------------------Mia----Leah------
Mia Leah------------------bg----------------
Turning to other systematic co-occurrences of resources in the expression of irony, we found that the frequency of mutual gaze was moderately correlated with the presence of hand gestures during ironic segments (τ = 0.201, p = 0.020). An example of this can be found in fragment 7 above. In this example, both Leah and Mia accompany their ironic segments with gesture. Specifically, in line 3, where Leah repeats her initial ironic segment, she and Mia display mutual gaze. After this, Mia repeats Leah’s gesture as well as her verbal expression, joining her in the pretense.
A small but significant negative correlation was found between the presence of gesture and shoulder shrugs (τ = -0.184, p = 0.042). Note that this result should be interpreted with caution, as there are only a few attestations of shoulder shrugs in this dataset. A negative correlation was also found between head movements and body repositionings: segments with body repositionings show fewer head movements (τ = -0.249, p = 0.006).
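For readers less familiar with rank correlations, the sketch below shows how a Kendall coefficient can be computed for the kind of variables reported here (counts and presence/absence codes, which contain many ties). It assumes the tie-corrected variant, tau-b; the paper does not state which variant was used. The function name `kendall_tau_b` and the four-segment toy data are invented for illustration and are not taken from the corpus.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            ties_x += 1  # pair tied on x (possibly on y as well)
        if dy == 0:
            ties_y += 1  # pair tied on y (possibly on x as well)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Invented toy data: one value per segment (NOT corpus data).
mutual_gaze_count = [0, 0, 1, 1]
gesture_present = [0, 1, 1, 1]
print(round(kendall_tau_b(mutual_gaze_count, gesture_present), 3))
```

A positive value indicates that segments with more mutual gaze tend to be the segments with gesture; a negative value, as reported for gesture and shoulder shrugs, indicates the opposite tendency.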
Lastly, as mentioned above, several measures of gaze behavior were correlated. The frequency of gaze shifts correlated with the presence of all other gaze variables, including CO-shifts at the end of the segments. In the same line of reasoning, the number of mutual gazes was strongly correlated with the number of speaker-to-listener gazes. Finally, the number of gaze aversions was also correlated with the number of speaker-to-listener gazes.
Summarizing these findings, we see a co-occurrence of laughter and various measures of gaze behavior throughout the segment. The presence of gesture is positively correlated with the presence of mutual gaze, and negatively correlated with the presence of shoulder shrugs. Head movements and body repositionings are also negatively correlated. Finally, various measures of gaze behavior are correlated with one another.
Variation in the multimodal expression of irony
Although some resources, such as the use of gaze shifts and laughter, co-occur more systematically in the expression of irony, the current dataset also shows considerable variation in the multimodal “marking” of irony. In this section, we explore the question of why some ironic expressions are marked more than others.
A first obvious factor that might play a role here is the duration of the segment: the longer a segment, the more opportunities a speaker has to use multiple resources. And indeed, the number of resources involved in the expression of irony is correlated with the duration of ironic segments (r (121) = 0.279, 95% CI [0.107;0.434], p = 0.002). Longer segments are accompanied by more resources. Interestingly, this correlation between the duration of a segment and the number of resources does not arise for non-ironic segments (r (120) = -0.028, 95% CI [-0.204;0.151], p = 0.763). Furthermore, there is no difference in duration between ironic and non-ironic segments. As a consequence, duration alone cannot explain the number of resources used. In other words, the number of resources we observe cannot be attributed solely to the chance-level expectation that longer segments are, by default, more likely to contain more of the resources under scrutiny.
Another factor that might be at play is the difficulty for conversational partners to understand an ironic segment as non-serious. One possibility is that ironic segments that are difficult to understand, for instance due to a lack of context, are marked using multiple bodily resources in order to facilitate comprehension for the listeners. However, a qualitative analysis shows that this does not seem to be the case. On the contrary, ironic segments that are part of an ironic sequence of some sort (i.e. that are immediately preceded or followed by another ironic segment by the same or another speaker) frequently occur with three or more bodily resources. This conflicts with the expectation that segments that are more straightforwardly recognizable as being ironic (i.e. because they are part of a sequence) require less multimodal marking. Of all ironic segments marked by three or more resources, 19 out of 26 (73%) were preceded or followed by another ironic remark within two turns, compared to 6 out of 13 (46%) of the segments that were not marked by any resources.
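The confidence intervals reported above can be checked with the standard Fisher z-transformation. The sketch below assumes that the value in parentheses after r is the degrees of freedom (n − 2), so that n = 123 for the ironic and n = 122 for the non-ironic segments; this reading of the notation is an assumption, as is the helper name `pearson_ci`.

```python
import math
from statistics import NormalDist
from typing import Tuple


def pearson_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for a Pearson r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must be larger than 3")
    z = math.atanh(r)                  # Fisher z of the observed correlation
    se = 1.0 / math.sqrt(n - 3)        # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Reported coefficients; n is inferred as df + 2 (an assumption).
ironic = pearson_ci(0.279, n=121 + 2)
non_ironic = pearson_ci(-0.028, n=120 + 2)
print("ironic:     [%.3f; %.3f]" % ironic)
print("non-ironic: [%.3f; %.3f]" % non_ironic)
```

Up to rounding of the reported r values, both intervals should match the ones given in the text: the interval for ironic segments excludes zero, whereas the interval for non-ironic segments straddles it.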
The following example in fragment 8 below provides an illustration of this embeddedness. Three participants (Paul, Gabriëlle, and Mara) are speaking about Mara’s sister, who is a very good singer. Mara explains that her sister sings at home, but never performs elsewhere. Paul then comments that she has a hidden talent. This marks the beginning of the fragment. Gabriëlle mumbles that she herself has no hidden talent, which is the first ironic segment (line 1). Mara then responds to her in line 3, saying that “it all radiates from her”. This is obviously non-serious teasing: Mara and Gabriëlle are friends, and this sarcastic remark does not appear to be meant as harmful. Instead, both Gabriëlle and Mara seem to adopt a pretense reading about the state of Gabriëlle’s talents. All of the participants mark their mutual understanding of this pretense by responding to the segment with laughter. Segment 3 is marked in a number of ways. Mara uses a metaphoric iconic gesture that represents the radiation of Gabriëlle’s talent (see also Figure
(8) <<mompelt> Ik heb geen verborgen talent. >
→ 1 G <<mumbles> I have no hidden talent . >
P M--------------------G----------------
M bg----------------G-------------------
G M-------------------------------------
((G laughs))
((P laughs))
((M laughs))
2 P (inau[dible)
[bij u straalt het er gewoon *allemaal [af]*.
→ 3 M In your case it just all radiates from you.
*gestures------*
[G laughs]
P G--------------------------G----------------
M G---------------------------------P---------
G bg----------------------M-------------------
((M laughs))
((P laughs))
Another example of a highly marked ironic segment is found in fragment 7, the brainstorm about a student bar. When Mia says “yeah everything for free”, she accompanies this segment with a hand gesture, a head shake and laughter. However, these markers are not necessary for the interpretation of the ironic segment. The lexical content alone makes it clear that this segment is not meant as a serious statement: a student bar would most likely not serve food for free. Similar to the “hidden talent” example, this segment is also embedded in a short ironic sequence in which all participants contribute to the joint pretense that the food (or even everything) is for free, adding to the obviousness of the irony in this segment.
Summarizing this exploration of the use of multiple resources in the expression of irony, we find that the use of laughter systematically co-occurs with the use of marked gaze behavior and that the use of head movements is negatively associated with the presence of body repositionings. Furthermore, variation in the amount of resources involved in the expression of irony can be partially explained by the duration of the segment. We also hypothesize that the embeddedness in an ironic sequence plays a role. We now move on to the discussion of our findings in light of the existing literature, and propose some ways to move forward.
Research in a variety of disciplines, including cognitive science, linguistics, literary analysis and philosophy, has investigated how the notion of viewpoint permeates all products of human cognition, from mundane talk-in-interaction over intricate narratives to physical works of art (
We took a broad approach and investigated the role of laughter, gaze behavior, head movements, shoulder shrugs, body repositionings and hand gestures. These variables have all been found to play a relevant role in the expression of irony, and stance more broadly (
The finding that speakers use more laughter accompanying ironic compared to non-ironic segments, is in line with previous research on this topic, which demonstrates that laughter can signal ironic intent of a speaker, or mark segments as playful (
Zooming in on the end of the segment, speakers display more gaze shifts both between their addressees and to and from the background. These findings are in line with a view of irony as a highly intersubjective process, during which it is necessary for the speakers to check whether their interlocutors have parsed their segments as intended. We did not, however, observe an increase in mutual gaze between speakers and their addressees at the level of the entire ironic segment, in contrast to what others have observed (Brône & Oben, 2021, observed a higher amount of gaze between addressees in ironic vs. non-ironic segments). Neither did we, as yet others have observed (
Head tilts, too, have been noted to function as irony markers (
Lastly, more body repositionings were found in ironic compared to non-ironic segments. Follow-up research will have to point out whether this effect will hold with a larger sample size, and investigate the precise nature of this effect. Body repositioning in our annotation should in no case be confused with body partitioning that e.g. Debras (2015),
Relating to the interaction between these resources, we found that the presence of laughter was associated with more marked gaze behavior: more gaze shifts, moments of mutual gaze, speaker-to-listener gaze, and gaze aversions. In a small qualitative analysis, we showed that the joint use of gaze and laughter can be used to invite addressees to join an ironic pretense in interaction. Interestingly, there was no correlation between the presence of laughter and gaze shifts at the end of the segment, speaking against the idea that their co-occurrence is based only on alignment-seeking with co-participants. Future research on the precise timing of both of these resources could shed more light on the nature of their interaction.
We also found that the presence of gesture was associated with more mutual gaze in ironic segments. An interesting follow-up would be to investigate what kind of gestures are associated with mutual gaze (for instance the use of pragmatic and interactive gestures,
There was a negative association between the presence of body repositionings and head movements. One possible explanation is that body repositionings require the use of the whole body including the head, which is then not available to be used as a communicative marker. However, as discussed, more research is needed to determine the nature of body repositionings in ironic interaction in general, before we are able to draw any conclusions regarding their interaction with other communicative devices. The same holds for the negative association that was found between the presence of shoulder shrugs and gestures.
Finally, several measures of gaze behavior were correlated. The frequency of gaze shifts correlated with the presence of all other gaze variables. This is a straightforward finding, as the presence of gaze shifts automatically implies the presence of more gaze fixations, be it gaze aversions, mutual gaze, or speaker-to-listener gaze. In the same line of reasoning, the number of mutual gazes was strongly correlated with the number of speaker-to-listener gazes. The number of gaze aversions was also correlated with the number of speaker-to-listener gazes. This finding can be attributed to the fact that for a fixation to be counted as a gaze aversion, there must always be a speaker-to-listener gaze prior to this fixation. Lastly, the number of gaze shifts was correlated with CO-shifts at the end of the segment. In ironic segments that display more gaze shifts throughout the segment, it seems logical that there will also be more gaze shifts at the end of the segment.
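The structural dependency between these gaze measures can be illustrated with a small counting sketch. The operational definitions in the code are simplified assumptions made for illustration (a shift is any change of target; an aversion is a fixation away from the listeners that directly follows a fixation on a listener) and may differ in detail from the annotation scheme used in this study. The function name `gaze_measures` is hypothetical.

```python
from typing import Dict, Sequence, Set


def gaze_measures(fixations: Sequence[str], listeners: Set[str]) -> Dict[str, int]:
    """Count simple gaze measures over a speaker's sequence of gaze targets."""
    pairs = list(zip(fixations, fixations[1:]))  # consecutive fixations
    return {
        # every change of target counts as one gaze shift
        "gaze_shifts": sum(1 for prev, cur in pairs if prev != cur),
        # fixations on one of the listeners
        "speaker_to_listener": sum(1 for t in fixations if t in listeners),
        # assumed definition: looking away right after looking at a listener
        "gaze_aversions": sum(
            1 for prev, cur in pairs if prev in listeners and cur not in listeners
        ),
    }


# Anna's gaze targets during line 3 of fragment 6: Ella, bg, Seb, Ella.
print(gaze_measures(["Ella", "bg", "Seb", "Ella"], {"Ella", "Seb"}))
```

Under these definitions, every aversion presupposes a preceding speaker-to-listener fixation, and every additional shift adds a fixation of some kind, so positive correlations between the counts follow from how the measures are defined.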
In investigating the variation between the amount of resources involved, we explored whether this variation could be due to potential processing difficulties. Previous research on comprehension of irony showed that visual cues by a speaker, including facial expressions and bodily movements, enable interlocutors to recognize the ironic intent, even more so than the much-discussed ironic tone of voice as an acoustic cue (
Another factor that impacted the degree of resources involved was the duration of the segment: longer ironic segments were accompanied by more resources than shorter ironic segments. However, it is unlikely that the duration of the segment was the only factor at play, as the duration was not correlated with the amount of resources involved in the expression of non-ironic stance. Therefore, follow-up studies investigating the embeddedness of ironic segments in so-called “ironic sequences”, in which all participants in a conversation join in this staged communicative act, are necessary to disentangle the role of the body in communicating and negotiating irony.
One factor that was left unexplored with respect to variation in the resources involved, is individual differences between participants. Although the use of quantitative models that can take into account such individual differences, like linear mixed effects models (cf.
Another limitation with regard to variation in bodily marking of irony is that segments that were categorized as “non-marked” in our study might have had some form of bodily involvement that was not part of our analysis. The current study did not take into account the role of prosody or the role of facial expressions. Within the limited scope of this study, we decided to zoom in on visual bodily behavior rather than paralinguistic features, which have already received a significant amount of scholarly attention. The reason for excluding facial expressions as a feature of visible bodily behavior is of a more practical nature: the participants in the recorded interactions are all wearing eye-tracking glasses, which cover essential features for the analysis of facial expressions (such as eyebrows).
Finally, the current study focused on the behavior of the speaker within single segments. A qualitative exploration in the current study did bring forward the hypothesis that the embeddedness of ironic segments in an interactional context may play a crucial role in explaining the involvement of various bodily resources such as laughter and gaze behavior. This study can thus be interpreted as an invitation for a more systematic study of the role of interactional context in the multimodal expression of irony, taking into account behavior by all participants over longer stretches of ironic interaction.
In conclusion, this study investigated the multimodal expression of irony in interaction. Ironic utterances can be viewed as a form of staged communicative acts (