Instruction Tuning: EMNLP 2023 Overview Part 1
What is Instruction Tuning?
Instruction Tuning, in the context of Large Language Models (LLMs), refers to a specialized training method designed to enhance the model’s ability to understand and respond to user instructions or commands more accurately and effectively. This process involves fine-tuning the model on a dataset that comprises various instruction-input pairs and their corresponding desired outputs or actions.
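To make the data format concrete, here is a minimal sketch of how an instruction-input-output triple can be rendered into a single training string. The template is an Alpaca-style layout used purely for illustration; the exact prompt format varies across instruction-tuning pipelines.

```python
# Illustrative only: render one (instruction, input, output) triple as a
# single training string, using a common Alpaca-style template.

def format_example(instruction: str, inp: str, output: str) -> str:
    """Build the prompt portion, then append the desired output."""
    if inp:
        prompt = (f"### Instruction:\n{instruction}\n\n"
                  f"### Input:\n{inp}\n\n"
                  f"### Response:\n")
    else:
        # Some instructions need no separate input field.
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    return prompt + output

sample = format_example(
    "Translate the sentence to French.",
    "Good morning.",
    "Bonjour.",
)
print(sample)
```

During fine-tuning, the model is typically trained to predict the tokens after `### Response:`, so the loss concentrates on producing the desired output given the instruction.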
At the recent EMNLP 2023 conference, a major theme that captured significant attention was the enhancement of instruction-tuned Large Language Models (LLMs) via: 1) Enhancing the Quality and Quantity of Instruction Data, and 2) Strategic Selection and Enhancement of Training Data.
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks
Problem Statement: How to identify the most informative tasks for instruction tuning, so that training on them improves the model's generalization to unseen tasks.
Approach Idea:
- Focuses on the strategic selection of tasks for instruction tuning based on prompt uncertainty.
- Prompt uncertainty is estimated by perturbing a task's instruction and measuring how often the model's predictions change; tasks on which the model is highly prompt-sensitive are treated as informative.
- Instruction tuning proceeds iteratively: in each round, the most prompt-uncertain tasks are selected and added to the training pool before the model is fine-tuned again.
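The selection criterion above can be sketched as follows. This is a simplified illustration of the prompt-uncertainty idea, not the paper's implementation: the perturbation (dropping a random word) and the `model` callable are assumptions for the example.

```python
import random

def prompt_uncertainty(model, instruction, example_input, n_perturb=8, seed=0):
    """Estimate prompt uncertainty for one task: the fraction of perturbed
    prompts whose prediction disagrees with the unperturbed prediction.
    `model` is any callable mapping a prompt string to an answer string."""
    rng = random.Random(seed)
    base = model(f"{instruction}\n{example_input}")
    words = instruction.split()
    disagreements = 0
    for _ in range(n_perturb):
        # Toy perturbation: drop one randomly chosen word from the instruction.
        drop = rng.randrange(len(words))
        perturbed = " ".join(w for i, w in enumerate(words) if i != drop)
        if model(f"{perturbed}\n{example_input}") != base:
            disagreements += 1
    return disagreements / n_perturb

# Toy stand-in model: its answer flips if the word "polarity" is removed.
toy = lambda p: "yes" if "polarity" in p else "no"
u = prompt_uncertainty(toy, "Classify the polarity of the review.", "Great film!")
print(u)  # a fraction in [0, 1]; higher means a more prompt-sensitive task
```

Tasks with high uncertainty scores would then be prioritized for the next round of instruction tuning.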
Key Result: On cross-task generalization benchmarks, selecting training tasks by prompt uncertainty consistently outperforms random task selection and other active-learning baselines at the same training budget.
DYNOSAUR: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation
Problem Statement: This paper addresses the issue of efficiently curating instruction-tuning data for LLMs. The specific problem is the manual and costly effort required in traditional data curation methods and the need for a more dynamic and automated approach.
Approach Idea:
- Dynosaur automates the curation of instruction-tuning data for large language models (LLMs).
- It leverages the metadata from existing datasets to generate relevant data fields and corresponding instructions.
- The process involves using LLMs (like GPT-3.5-turbo) to synthesize instructions based on dataset descriptions, data fields (like title, text, author), and annotations.
- It employs a dual approach: description-aware generation (considering dataset descriptions) and description-unaware generation (focusing solely on data fields and annotations), to generate a variety of instruction-tuning tasks.
- Post-processing includes filtering invalid tasks, organizing instruction data, and adding label spaces for classification tasks.
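The metadata-to-instruction step above can be sketched as building a generation prompt from a dataset's metadata and handing it to an LLM. The function and field names below, and the prompt wording, are illustrative assumptions rather than Dynosaur's exact templates.

```python
# Sketch of a Dynosaur-style generation prompt built from dataset metadata.
# The wording and field names are hypothetical, for illustration only.

def build_generation_prompt(metadata: dict, description_aware: bool = True) -> str:
    """Compose a prompt asking an LLM to propose an instruction-tuning task
    from a dataset's description (optional) and its data fields."""
    fields = ", ".join(metadata["data_fields"])
    lines = []
    if description_aware and metadata.get("description"):
        # Description-aware generation uses the dataset description as context.
        lines.append(f"Dataset description: {metadata['description']}")
    # Description-unaware generation relies only on fields and annotations.
    lines.append(f"Available data fields: {fields}")
    lines.append(
        "Propose a task: write an instruction, choose which fields serve as "
        "the input, and choose which field serves as the output."
    )
    return "\n".join(lines)

meta = {
    "description": "News articles with titles, bodies, and topic labels.",
    "data_fields": ["title", "text", "topic"],
}
print(build_generation_prompt(meta))                            # description-aware
print(build_generation_prompt(meta, description_aware=False))   # description-unaware
```

The LLM's response (e.g., "Instruction: classify the topic of the article; input: text; output: topic") would then be parsed, filtered for validity, and converted into instruction-tuning examples, as described in the post-processing step.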
Key Result:
- Cost Efficiency: Dynosaur significantly reduced the cost of generating instruction-tuning data. For example, generating 800K instruction-tuning samples cost less than $12 USD using GPT-3.5-turbo, compared with around $500 USD for methods like ALPACA and INSTRUCTION GPT-4, which produce a smaller dataset of 52K instances.
- Performance Improvement: Models trained with Dynosaur data showed substantial improvements. For instance, training T5-3B with Dynosaur resulted in a 2.5-22 ROUGE-L score improvement and a 2.8-12.8 METEOR score improvement over other datasets on benchmarks such as SUPER-NI and LONGFORM.