PhD Position F/M Automatic Detection of Meaning Misalignment in Dialogue for the Evaluation and Improvement of Conversational AI
Contract type: Fixed-term contract (CDD)
Required degree level: Master's degree (Bac+5) or equivalent
Position: PhD student
Desired level of experience: Recent graduate
Assigned mission
Context
Human conversation is remarkably efficient, yet misunderstandings are frequent, and many of them likely remain undetected (Schober et al., 2018). Among the many possible sources of communicative friction are misunderstandings and disagreements about word meaning. Even when speakers share a common language and know the same words, they may associate different definitions or connotations with a given word, disagree about how or whether a word applies to a specific context or situation, or differ on how it should be interpreted. Such differences in lexical and conceptual mental representations constitute a subtle but pervasive source of miscommunication.
Word meaning-related communication problems are particularly conducive to meta-linguistic discussions, in which speakers momentarily step outside the main flow of conversation to question, clarify, or discuss how a particular word should be understood. Unlike other types of misunderstandings, which concern a single instance and may disappear once resolved, these episodes revolve around lexical items that are likely to be reused throughout the conversation. These cases, where difficulties become visible, reveal how interlocutors collaboratively establish locally shared conventions for interpreting words in the rest of the interaction.
Conversational repair and clarification requests have long been studied in linguistics and pragmatics (e.g., Bazzanella and Damiano, 1999; Brennan and Clark, 1996; Purver et al., 2001), and more recently in conversational AI, where they are central to more human-like, natural and efficient interaction and affect user satisfaction in interactions with agents (Axelsson et al., 2022; Aliannejadi et al., 2019; Zou et al., 2023). Much of this work, however, targets general semantic ambiguities, often in the context of agent-assisted search queries and task-oriented settings (Keyvan and Huang, 2022; Tang et al., 2025; Gan et al., 2026), or referential ambiguity problems (Madge et al., 2025). The lexical-semantic dimension of this phenomenon (i.e., how word meaning itself needs to be resolved for a conversation to continue) has received comparatively little attention (Garí Soler et al., 2026a), despite being crucial for evaluating a model's ability to establish and maintain meaning coordination in conversation.
Designing conversational agents that interact with humans in a natural way requires enabling these systems to recognize and adapt to users' lexical choices, detect subtle signs of misunderstanding, and engage in targeted clarification when needed. Current large language models exhibit strong general linguistic abilities, but recent studies show that they still struggle to reliably detect ambiguity or meaning uncertainty in context (Liu et al., 2023; Zhang et al., 2024). Moreover, many models are optimized to minimize conversational friction by always providing lengthy, comprehensive and fluent answers, whereas a brief metalinguistic clarification can sometimes lead to more efficient and satisfying interactions. Studying naturally occurring misunderstandings in human conversation provides a principled way to identify the mechanisms that make such adaptation possible, and offers guidance for evaluating and improving conversational systems with respect to their handling of meaning-related difficulties.
Recent work (Garí Soler et al., 2026a) has led to the creation of a corpus of conversational episodes in which speakers explicitly pause the interaction to question, clarify, or dispute the meaning of a word or expression. These exchanges, referred to as Word Meaning Negotiations (WMNs) (Myrendal, 2015), have been annotated according to their internal structure, distinguishing a trigger (a word usage that gives rise to a difficulty), an indicator (an utterance signaling the problem), and the negotiation that follows. This resource provides rare, structured evidence of naturally occurring moments where lexico-semantic misalignment becomes observable in real interaction. Although the corpus remains limited in size, initial experiments demonstrate the potential of language models for automatically identifying such episodes (Garí Soler et al., 2025a).
These annotated interactions will be studied alongside other sources of evidence for meaning-related difficulty in discourse. Building on these resources, this PhD proposes to model and analyze lexico-semantic misalignment and alignment in conversation, with the broader goal of informing the design and evaluation of conversational AI systems that are adaptive and capable of engaging in the kind of collaborative clarification that characterizes human communication.
Main activities
Scientific Objectives
This thesis is guided by the following overarching question: How can observable episodes of meaning-related misunderstanding in human conversation inform the modeling of lexico-semantic misalignment and its resolution, and how can this knowledge be transferred to conversational AI systems?
The first research direction concerns data collection. The goal is to obtain larger and more diverse datasets of conversational episodes in which speakers explicitly signal a difficulty related to word meaning. Building on existing resources and approaches to WMN indicator detection, the thesis will seek reliable (semi-)automatic ways of identifying such episodes across different conversational settings, registers and modalities. This will result in new annotated resources that will be used in subsequent experiments and shared with the scientific community.
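One plausible ingredient for such (semi-)automatic identification is pool-based active learning with uncertainty sampling: a classifier trained on a small seed of annotated utterances repeatedly queries the examples it is least certain about. The sketch below is only an illustration; the feature vectors and labels are synthetic stand-ins for utterance embeddings and WMN-indicator annotations.

```python
# Pool-based active learning (uncertainty sampling) for bootstrapping a
# WMN-indicator classifier. All data here is synthetic: X stands in for
# utterance embeddings, y for indicator / non-indicator annotations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Seed set with both classes present; everything else goes in the pool
labeled = list(np.where(y == 1)[0][:5]) + list(np.where(y == 0)[0][:5])
pool = [i for i in range(200) if i not in labeled]

for _ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    # Query the 5 pool utterances whose predicted probability is closest
    # to 0.5, i.e., where the classifier is least confident
    probs = clf.predict_proba(X[pool])[:, 1]
    query = [pool[i] for i in np.argsort(np.abs(probs - 0.5))[:5]]
    labeled.extend(query)  # in practice, an annotator would label these
    pool = [i for i in pool if i not in query]

print(f"{len(labeled)} labeled examples after 5 rounds")
```

In a real setting the queried labels would come from human annotators rather than from the synthetic label array, and the loop would stop once detection quality plateaus.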
The second direction will focus on data analysis. This will involve an exploration of the kinds of word usages that tend to create miscommunication (Garí Soler et al., 2025b). Relying on the collected WMNs as well as other relevant sources, such as suggested reformulations in instructional data (Anthonio et al., 2020), annotation disagreements revealing ambiguity (McCarthy et al., 2016), or scare-quoted usages (Garí Soler et al., 2026b), the thesis will seek to characterize lexical and contextual configurations that frequently lead to misunderstandings, with the goal of developing computational tools capable of identifying these usages, both to anticipate them in production and to trigger clarification when needed. This line of work will also investigate how speakers negotiate word meaning and how alignment emerges in conversation. Beyond the negotiation sequence itself, it will examine how the rest of the interaction reflects successful or unsuccessful alignment (Garí Soler et al., 2023).
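As a purely illustrative sketch of such a detection tool, assuming contextual embeddings of a word's usages (e.g., from a model like BERT) are already available as vectors, one could flag a usage whose embedding is an outlier with respect to previously seen usages of the same word:

```python
# Sketch: flag a potentially problematic word usage as an outlier among
# contextual embeddings of the same word. The vectors below are synthetic
# stand-ins; in practice they would come from a contextual language model.
import numpy as np

def is_atypical(usage_vecs, new_vec, k=3.0):
    """Flag new_vec if its distance to the centroid of previous usages
    exceeds the mean usage-to-centroid distance by k standard deviations."""
    centroid = usage_vecs.mean(axis=0)
    dists = np.linalg.norm(usage_vecs - centroid, axis=1)
    new_dist = np.linalg.norm(new_vec - centroid)
    return new_dist > dists.mean() + k * dists.std()

rng = np.random.default_rng(42)
# 50 embeddings of conventional usages of a word, tightly clustered
usages = rng.normal(loc=0.0, scale=0.1, size=(50, 32))
typical = usages.mean(axis=0)   # a maximally conventional usage
atypical = typical + 2.0        # a usage far outside the cluster

print(is_atypical(usages, typical), is_atypical(usages, atypical))
# → False True
```

Distance-to-centroid thresholding is only the simplest option; density-based or probabilistic anomaly detectors could expose the same interface while capturing more subtle deviations such as creative or erroneous usages.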
The third direction concerns application to conversational AI. The insights and tools derived from data collection and analysis will be applied to the direct evaluation and improvement of conversational AI systems. The work will first assess the ability of current language models and dialogue systems to handle and detect problematic word usages, to respond to or produce clarification requests targeting specific lexical items, and to coordinate with a speaker after a lexical pact has been established. These experiments will reveal the weaker aspects of current systems and guide the development of models and tools that better emulate natural human communicative behavior in situations involving unclear meaning.
Methodology
The thesis will combine corpus creation and annotation with computational modeling and analysis, using neural language models and machine learning techniques applied to conversational data, as well as adapting LLMs through techniques such as fine-tuning.
Starting from existing annotated resources and proposed approaches for identifying WMN indicators, the thesis will explore active learning strategies (Cohn et al., 1994), a richer use of conversational context, and experiments across different conversational settings and registers. Manual and semi-automatic annotation of triggers and negotiation phases will be explored in parallel to enrich the available material. Work on the study and detection of problematic word usages will rely on language models, contextual word representations and anomaly detection methods, which have been shown to detect or reflect relevant linguistic phenomena such as polysemy degree, reading times or metaphoric uses (Garí Soler and Apidianaki, 2021; Goodkind and Bicknell, 2018; Bejan et al., 2023). Computational tools based on these methodologies will be developed to detect different types of problems, such as lexical ambiguity, creativity, complexity or mistakes. The study of negotiation sequences and of the success or evolution of lexico-semantic alignment throughout the conversation will be supported by existing or improved alignment measures (Garí Soler et al., 2023).
An evaluation protocol will be designed to test how current large language models behave when faced with problematic word usages and clarification situations, using controlled prompting and conversational scenarios. In line with the thesis objective of achieving human-like communication, natural human data will be prioritized over the increasingly pervasive synthetically generated datasets (Ou et al., 2024), as will open-source models that are transparent about their training data, to ensure a fair evaluation. The work will also explore how trigger detection and lexico-semantic alignment modeling can be used to augment dialogue systems.
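As a toy example of a dialogue-level alignment signal (deliberately much cruder than the alignment measures cited above), one can track how much of the two speakers' vocabulary is shared over an interaction:

```python
# Toy lexical alignment signal: the share of word types used by both
# speakers. Crude whitespace tokenization; real measures would be richer.
def turn_vocab(turns, speaker):
    """All word types used by one speaker (lowercased)."""
    words = set()
    for spk, text in turns:
        if spk == speaker:
            words.update(text.lower().split())
    return words

def lexical_overlap(turns):
    """Jaccard overlap between the two speakers' vocabularies."""
    a, b = turn_vocab(turns, "A"), turn_vocab(turns, "B")
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

dialogue = [
    ("A", "i think the plan is ambitious"),
    ("B", "ambitious how exactly"),
    ("A", "ambitious as in hard to fund"),
    ("B", "ok so the plan is hard to fund"),
]
print(round(lexical_overlap(dialogue), 3))  # → 0.467
```

On the example dialogue the overlap is 7/15, reflecting in part B's reuse of A's content words such as plan, ambitious and fund; realistic measures would weight reuse by informativeness, operate on contextual representations, and track the measure's evolution turn by turn.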
Augmenting dialogue systems in this way will rely on various post-training procedures, starting with supervised fine-tuning and different forms of preference alignment, which can steer existing models toward specific or more personalized behaviors (Cao et al., 2024).
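As an illustration of the kind of preference-alignment objective involved, the sketch below computes a DPO-style loss on hypothetical log-probabilities; this is a generic textbook formulation, not the specific procedure of Cao et al. (2024). Here the "chosen" response could be a brief metalinguistic clarification and the "rejected" one a verbose answer that ignores the ambiguity.

```python
# DPO-style preference loss: given log-probabilities of a chosen and a
# rejected response under the policy and a frozen reference model, the
# loss shrinks as the policy widens the margin on the chosen response.
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy already prefers the clarification more than the reference does:
low = dpo_loss(logp_chosen=-12.0, logp_rejected=-20.0,
               ref_chosen=-15.0, ref_rejected=-18.0)
# Policy prefers the verbose answer instead: the loss is higher
high = dpo_loss(logp_chosen=-20.0, logp_rejected=-12.0,
                ref_chosen=-15.0, ref_rejected=-18.0)
print(low < high)  # → True
```

Minimizing this loss over a dataset of preference pairs pushes the policy to raise the likelihood margin of chosen responses relative to the frozen reference model; beta controls how far the policy may drift from that reference.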
Supervising team
Thesis director: Chloé Clavel is a Senior Researcher at Inria Paris. Her research contributes to the development of methods in artificial intelligence (computational models of socio-emotional behaviors combining symbolic and machine learning approaches) and affective computing (analysis and synthesis of socio-emotional signals). She currently works on interactions between humans and virtual agents, from the analysis of the user's socio-affective behavior (verbal and non-verbal) to socio-affective interaction strategies. She has participated in several European and national collaborative projects (e.g., H2020 ITN ANIMATAS, ARIA-VALUSPA EU-TIC) and is a member of the main program committees of various international conferences (e.g., IJCAI, AAMAS, ACII, ICMI). She will supervise the doctoral student for 40% of her time.
Co-supervisor: Aina Garí Soler is a Senior AI Fellow at Dauphine-PSL, PR[AI]RIE - PSAI. Her research lies at the intersection of computational linguistics and artificial intelligence, with a particular focus on how language models represent word meaning in context. Her postdoctoral research explored precisely these questions of lexico-semantic alignment, word meaning negotiation and problematic word usages, providing an important foundation for this thesis. She will provide 60% of the supervision.
References
Aliannejadi, M., Zamani, H., Crestani, F., & Croft, W. B. (2019). Asking clarifying questions in open-domain information-seeking conversations. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 475-484).
Anthonio, T., Bhat, I., & Roth, M. (2020). wikiHowToImprove: A resource and analyses on edits in instructional texts. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 5721-5729). Marseille, France: European Language Resources Association.
Axelsson, A., Buschmeier, H., & Skantze, G. (2022). Modeling feedback in interaction with conversational agents—a review. Frontiers in Computer Science, 4, 744574.
Bazzanella, C., & Damiano, R. (1999). The interactional handling of misunderstanding in everyday conversations. Journal of Pragmatics, 31, 817-836.
Bejan, M., Manolache, A., & Popescu, M. (2023). AD-NLP: A benchmark for anomaly detection in natural language processing. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 10766-10778). Singapore: Association for Computational Linguistics.
Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1482-1493.
Cao, Y., Zhang, T., Cao, B., Yin, Z., Lin, L., Ma, F., & Chen, J. (2024). Personalized steering of large language models: Versatile steering vectors through bi-directional preference optimization. Advances in Neural Information Processing Systems, 37, 49519-49551.
Cohn, D., Atlas, L., & Ladner, R. (1994). Improving generalization with active learning. Machine Learning, 15, 201-221.
Gan, Y., Li, C., Xie, J., Wen, L., Purver, M., & Poesio, M. (2026). ClarQ4LLM: A benchmark for models clarifying and requesting information in task-oriented dialog. IEEE Transactions on Audio, Speech and Language Processing.
Garí Soler, A., & Apidianaki, M. (2021). Let's play mono-poly: BERT can reveal words' polysemy level and partitionability into senses. Transactions of the Association for Computational Linguistics, 9, 825-844.
Garí Soler, A., Labeau, M., & Clavel, C. (2025a). Toward the automatic detection of word meaning negotiation indicators in conversation. In Findings of the Association for Computational Linguistics: EMNLP 2025.
Garí Soler, A., Labeau, M., & Clavel, C. (2025b). Potentially problematic word usages and how to detect them: A survey. In Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025) (pp. 441-463).
Garí Soler, A., Myrendal, J., Clavel, C., & Larsson, S. (2026a). The NeWMe corpus: A gold standard corpus for the study of word meaning negotiation. To appear in Language Resources and Evaluation.
Garí Soler, A., Zevallos Huaco, J. C., Labeau, M., & Clavel, C. (2026b). Scare quotes as markers of “questionable” word usages and misalignment in conversation: An annotation study. Accepted at the Fifteenth Language Resources and Evaluation Conference (LREC 2026), Palma de Mallorca, Spain, May 13-15.
Goodkind, A., & Bicknell, K. (2018). Predictive power of word surprisal for reading times is a linear function of language model quality. In Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018) (pp. 10-18). Salt Lake City, Utah: Association for Computational Linguistics.
Keyvan, K., & Huang, J. X. (2022). How to approach ambiguous queries in conversational search: A survey of techniques, approaches, tools, and challenges. ACM Computing Surveys, 55, 1-40.
Liu, A., Wu, Z., Michael, J., Suhr, A., West, P., Koller, A., ... & Choi, Y. (2023). We're afraid language models aren't modeling ambiguity. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 790-807).
Madge, C., Purver, M., & Poesio, M. (2025). Referential ambiguity and clarification requests: Comparing human and LLM behaviour. In Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference (pp. 1-11).
McCarthy, D., Apidianaki, M., & Erk, K. (2016). Word sense clustering and clusterability. Computational Linguistics, 42, 245-275.
Ou, J., Lu, J., Liu, C., Tang, Y., Zhang, F., Zhang, D., & Gai, K. (2024). DialogBench: Evaluating LLMs as human-like dialogue systems. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (pp. 6137-6170).
Purver, M., Ginzburg, J., & Healey, P. (2001). On the means for clarification in dialogue. In Proceedings of the Second SIGdial Workshop on Discourse and Dialogue.
Schober, M. F., Suessbrick, A. L., & Conrad, F. G. (2018). When do misunderstandings matter? Evidence from survey interviews about smoking. Topics in Cognitive Science, 10, 452-484.
Tang, A., Soulier, L., & Guigue, V. (2025). Clarifying ambiguities: On the role of ambiguity types in prompting methods for clarification generation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 20-30).
Zhang, T., Qin, P., Deng, Y., Huang, C., Lei, W., Liu, J., ... & Chua, T. S. (2024). CLAMBER: A benchmark of identifying and clarifying ambiguous information needs in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 10746-10766).
Zou, J., Aliannejadi, M., Kanoulas, E., Pera, M. S., & Liu, Y. (2023). Users meet clarifying questions: Toward a better understanding of user interactions for search clarification. ACM Transactions on Information Systems, 41, 1-25.
Benefits
1. Subsidized meals
2. Partial reimbursement of public transport costs
3. Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
4. Possibility of teleworking and flexible organization of working hours
5. Professional equipment available (videoconferencing, loan of computer equipment, etc.)
6. Social, cultural and sports events and activities
7. Access to vocational training
8. Social security coverage