NAACL 2024 — Overview of key themes in the space of Large Language Models (LLMs)

Ayush Kumar
8 min read · Aug 4, 2024


Photo by Neeqolah Creative Works on Unsplash

After attending EMNLP 2023 in Singapore, where much of the conversation revolved around Large Language Models (LLMs), I was eager to see what NAACL 2024 would bring: how researchers are exploring this area and finding new opportunities and useful ideas. At NAACL, I saw many different and interesting themes, showing how quickly academia adapts to new challenges and questions.

At EMNLP, many papers discussed the potential and limitations of LLMs in various applications. NAACL, however, focused on specific new areas to explore, showing that there are still many ways to improve LLMs. In this blog, I’ll share the main themes and my thoughts from NAACL.

Reasoning, Reflection and Common-Sense

After a year of excitement about LLMs, a big question is: “Can they reason and fix their mistakes?” At NAACL 2024, many researchers addressed this. One trend is using advanced learning techniques and specialized tools to improve mathematical and logical reasoning. Researchers have different opinions on the best ways to ensure LLMs can check and correct themselves. Some believe in using external tools, while others focus on improving the models themselves. There’s growing interest in using human-like thought processes to help models handle complex reasoning tasks. Results on reflective thinking and self-checking are mixed, suggesting a need for a balanced approach.

A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [Link]

Taxonomy of the fallacies as presented in the work (Image taken from the original research draft)

This study unveils the self-verification potential of LLMs, revealing critical insights into their ability to detect and correct logical errors through an analysis of a dataset containing 232 types of reasoning fallacies. It highlights the current limitations of LLMs in self-assessment and proposes future directions for enhancing these capabilities.
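To make the setup concrete, here is what a minimal self-verification loop can look like in practice. This is my own illustration rather than the paper's code, and `call_llm` is a hypothetical placeholder for whichever chat API you use.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to the chat model of your choice and return its reply."""
    raise NotImplementedError

def answer_then_verify(question: str) -> dict:
    # Step 1: obtain an initial answer together with its reasoning.
    answer = call_llm(
        f"Solve the following problem and show your reasoning step by step:\n{question}"
    )
    # Step 2: ask the same model to audit that reasoning for logical fallacies.
    verdict = call_llm(
        "Review the reasoning below. If it contains a logical fallacy, name the "
        "fallacy and give a corrected final answer; otherwise reply 'VALID'.\n\n"
        f"Question: {question}\n\nReasoning: {answer}"
    )
    return {"initial_answer": answer, "self_check": verdict}
```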

GoT: Effective Graph-of-Thought Reasoning in Language Models [Link]

Pipeline illustration from the work (Image taken from the original research draft)

GoT mimics human thought processes through graph modeling, enhancing LLMs’ ability to tackle complex reasoning tasks. By representing thoughts as interconnected nodes in a graph, this approach improves the model’s capacity for non-linear reasoning and multi-step problem-solving. The study showcases how graph-based reasoning can help LLMs navigate and solve problems that require understanding and manipulating multiple concepts simultaneously.
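To give a flavor of the core idea, the sketch below represents intermediate thoughts as nodes of a directed graph so that later steps can depend on several earlier ones at once. This is only the data-structure intuition behind graph-of-thought reasoning (using networkx for convenience), not the GoT paper's actual pipeline, which learns a graph encoder end to end.

```python
# Illustrative only: intermediate "thoughts" as nodes in a directed graph,
# with edges recording which earlier thoughts each step depends on.
import networkx as nx

g = nx.DiGraph()
g.add_node("q", text="Ann has 3 apples, buys 2 bags of 4, eats 1. How many left?")
g.add_node("t1", text="2 bags of 4 apples = 8 apples")
g.add_node("t2", text="3 + 8 = 11 apples in total")
g.add_node("t3", text="11 - 1 eaten = 10 apples")
# Unlike a linear chain of thought, a node may depend on several predecessors.
g.add_edge("q", "t1")
g.add_edge("q", "t2")
g.add_edge("t1", "t2")
g.add_edge("t2", "t3")

# A downstream reasoner can visit thoughts in dependency order.
for node in nx.topological_sort(g):
    print(node, "->", g.nodes[node]["text"])
```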

When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models [Link]

Key proposal on how to take advantage of “Self-Reflection” (Image taken from the original research draft)

This investigation into reflective thinking reveals mixed impacts on LLM performance, depending on task complexity. The study examines how self-reflection influences model accuracy and suggests that its benefits vary significantly across reasoning tasks: gains are mostly limited to cases where the model's initial responses are unreliable, and they persist more for harder questions. By analyzing the role of reflective thinking, the research highlights the potential of self-assessment and error correction to enhance LLM performance.

Culture, Bias and LLMs

As LLMs become increasingly integral to our digital ecosystem, understanding and mitigating cultural biases becomes paramount. The challenge lies in accurately capturing the subtle intricacies of various cultures while reducing biases that may arise from the training data. Efforts are also focused on understanding how cultural contexts influence model behavior and developing strategies to address these influences. The goal is to make AI more equitable and sensitive to the cultural diversity of its users, ensuring that models serve a broader and more inclusive audience effectively. There is also significant interest in evaluating models’ cultural awareness across different languages, since understanding the capabilities and limitations of LLMs in handling cultural commonsense and sensitivity is crucial.

Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings [Link]

The authors collect proverbs from six languages (top) and their usage within conversational contexts. They evaluate mLLMs with a binary-choice inference task in conversational contexts that contain proverbs (bottom). (Image taken from the original research draft)

The work studies the ability of a wide range of state-of-the-art multilingual LLMs (mLLMs) to reason with proverbs and sayings in a conversational context. The experiments reveal that: (1) mLLMs “know” only a limited set of proverbs, and memorizing proverbs does not mean understanding them within a conversational context; (2) mLLMs struggle to reason with figurative proverbs and sayings, and struggle even more when asked to select the wrong answer (instead of the correct one); and (3) there is a “culture gap” in mLLMs when reasoning about proverbs and sayings translated from other languages.
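For intuition, here is a rough sketch of such a binary-choice inference task in code. The prompt wording and the `call_llm` helper are my own placeholders, not the authors' evaluation harness.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in your own model call here

def proverb_inference(conversation: str, option_a: str, option_b: str) -> str:
    # Present a proverb-bearing conversation and two candidate interpretations,
    # then ask the model to pick one.
    prompt = (
        "Read the conversation and decide which interpretation is correct.\n\n"
        f"Conversation:\n{conversation}\n\n"
        f"A) {option_a}\nB) {option_b}\n\n"
        "Answer with exactly one letter, A or B."
    )
    reply = call_llm(prompt).strip().upper()
    return "A" if reply.startswith("A") else "B"

# Example usage (English here; the dataset spans six languages):
# proverb_inference(
#     "Maya: I keep rewriting the intro over and over.\n"
#     "Sam: Don't let perfect be the enemy of good.",
#     "Sam thinks Maya should ship a reasonable draft instead of endlessly polishing it.",
#     "Sam thinks Maya should discard the draft because it is not perfect.",
# )
```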

Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [Link]

Examples illustrating LLMs’ capabilities and limitations on cultural commonsense. (Image taken from the original research draft)

Findings across several general and cultural commonsense benchmarks reveal that (1) LLMs exhibit a large performance gap across cultures when tested on culture-specific commonsense knowledge; (2) LLMs erroneously associate general commonsense with a few dominant cultures; and (3) the language used to prompt LLMs can significantly affect their cultural commonsense understanding.
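Finding (3) is easy to probe informally: pose the same culture-specific question in different prompt languages and compare the answers. The snippet below is a toy illustration with my own example question and a stubbed `call_llm`, not the paper's benchmark.

```python
def call_llm(prompt: str) -> str:
    return "<model reply>"  # stub: replace with a real model call

# The same culture-specific question, asked in two different prompt languages.
prompts = {
    "English": "In which month is the festival of Holi celebrated?",
    "Hindi": "होली का त्योहार किस महीने में मनाया जाता है?",
}

for language, question in prompts.items():
    print(f"{language}: {call_llm(question)}")
# If the answers disagree (or only one is correct), the prompt language is
# shaping the model's cultural commonsense, which is what the paper reports.
```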

IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context [Link]

IndiBias introduces a benchmarking dataset specifically designed for evaluating social biases in the Indian context, encompassing various societal dimensions such as gender, religion, caste, age, region, physical appearance, and occupation.

Human-AI Interaction

The advancement of Large Language Models (LLMs) has dramatically enhanced the abilities of AI systems. These models can understand and produce text that closely resembles human communication, allowing them to participate in complex conversations and situations, generate various types of data and content, and perform tasks previously thought to be exclusive to humans. Consequently, this progress is significantly changing our interactions with technology and each other, an area known as “Human-AI Interaction” that has been studied for more than a decade.

The conference had a three-hour tutorial reviewing the long history of human-computer interaction and the paradigm shift this space is undergoing, covering types of human-AI interaction, design thinking, and the typical evaluation metrics that capture the success of human-AI interactions.

Attacks, Security, Harms, Safety and Threats in LLMs

As LLMs permeate various facets of technology, the threat landscape expands, highlighting the critical need for robust defenses against adversarial attacks. While some researchers emphasize the importance of robust detection algorithms, others focus on improving the inherent robustness of the models themselves. The discourse also highlights the need for a balanced approach that combines defensive measures with proactive strategies to anticipate and counteract new types of attacks.

From Shortcuts to Triggers: Backdoor Defense with Denoised PoE (Link)

Denoised PoE pipeline diagram (Image taken from the original research draft)

Given that data poisoning can serve as an attack vector, existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers, leaving a universal defense against various backdoor attacks with diverse triggers largely unexplored. In this work, the authors propose Denoised Product of Experts (DPoE), an end-to-end ensemble-based backdoor defense method that mitigates backdoor triggers by learning the backdoor-free residual of a shallow model that captures the backdoor shortcuts.
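For readers unfamiliar with product-of-experts (PoE) training, here is a minimal, generic PoE loss in PyTorch that captures the intuition: a shallow expert is allowed to absorb the backdoor shortcut, and the main model is trained through the combined distribution so it learns the residual. This is a simplified sketch of the general idea, not the paper's DPoE implementation, which additionally handles noisy (flipped) labels.

```python
import torch
import torch.nn.functional as F

def poe_loss(main_logits: torch.Tensor,
             shallow_logits: torch.Tensor,
             labels: torch.Tensor) -> torch.Tensor:
    # A product of experts in probability space is a sum in log space.
    combined = F.log_softmax(main_logits, dim=-1) + F.log_softmax(shallow_logits, dim=-1)
    # Cross-entropy on the re-normalized combined distribution. Because the
    # shallow expert already explains shortcut-tainted examples, the main model
    # receives little gradient from them and learns the shortcut-free residual.
    return F.nll_loss(F.log_softmax(combined, dim=-1), labels)

# Toy usage with random logits for a 3-class task; the shallow expert's logits
# are detached so only the main model is updated through this loss.
main_logits = torch.randn(4, 3, requires_grad=True)
shallow_logits = torch.randn(4, 3).detach()
labels = torch.tensor([0, 2, 1, 0])
loss = poe_loss(main_logits, shallow_logits, labels)
loss.backward()
```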

Very interestingly, this time there were a number of works on Backdoor Attacks as mentioned below:

Illustration of a backdoor with a trigger. Image taken from the work: https://arxiv.org/pdf/2305.14910
  • Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors (Link) — Premise: Existing defense mechanisms often assume that only one type of trigger is adopted by the attacker, while defending against multiple simultaneous and independent trigger types necessitates general defense frameworks and is relatively unexplored.
  • ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger (Link) — The authors present BGMAttack, a backdoor attack framework utilizing various black-box generative models as implicit triggers. Extensive experiments demonstrate that the decoder-only generative model, ChatGPT, outperforms other baseline models. Significantly, BGMAttack achieves state-of-the-art attack effectiveness across four different datasets, producing stealthier poisoned samples characterized by lower sentence perplexity, fewer grammatical errors, higher grammar acceptance, and better semantic maintenance. Moreover, BGMAttack shows resilience against GPT-based detection techniques and maintains robustness against three defense strategies.
  • Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models (Link) — This study shows that an attacker can successfully inject backdoors by issuing a minimal number of malicious instructions (approximately 1000 tokens) and manipulate model behavior via data poisoning, without altering data instances or labels. Such instruction attacks, i.e., poisoning attacks that modify the instruction while leaving data instances intact, achieve an attack success rate exceeding 90%, higher than other attack variants.
  • PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning (Link) — PromptFix is a backdoor mitigation strategy for NLP models, employing adversarial prompt-tuning in few-shot settings. Unlike traditional methods that depend on precise trigger inversion and subsequent model fine-tuning, PromptFix maintains the integrity of model parameters and utilizes two additional sets of soft tokens. These tokens approximate and counteract the trigger, respectively. By leveraging soft tokens and adversarial optimization, PromptFix avoids the need to exhaustively enumerate possible backdoor configurations, enabling an adaptive balance between trigger identification and performance preservation. PromptFix represents the first use of prompt-tuning for backdoor removal and is specifically designed for few-shot tuning. The use of soft tokens eliminates the necessity of fixed trigger injection methods, allowing the approach to automatically adapt to various trigger types without manual specification.

Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs (Link)

Sandwich attack: Image as taken from the work

This work shows that in multilingual LLMs, adversaries can exploit the imbalanced representation of low-resource languages in the datasets used for pretraining and safety training. The authors introduce a new black-box attack vector called the Sandwich Attack: a multi-language mixture attack that manipulates state-of-the-art LLMs into generating harmful and misaligned responses. Their experiments with models such as Bard, Gemini Pro, LLaMA-2-70-B-Chat, GPT-3.5-Turbo, GPT-4, and Claude-3-OPUS show that this attack vector can be used by adversaries to elicit harmful responses from these models.

Multimodal Models

With GPT-4o and Claude already demonstrating such advanced multimodal capabilities, I wasn’t particularly wowed by the section of multimodal works. However, keeping in mind that these works were done over the last year or so (when NAACL submissions happened), a number of them bring up multimodal architectures for application-specific needs such as emotion analysis, visual question answering, and meme analysis (a new domain that is catching on these days). This section may also be a break for anyone interested in reading something non-LLM :D

I hope this blog gave you a cursory look at some of the prominent themes and some of the associated works. You can connect with me on <LinkedIn>.


Ayush Kumar

Machine Learning Scientist | Traveled places in 7 countries | Applied Researcher | IIT Patna Alumnus | Technical Writing