Papers
arxiv:2304.14402

LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

Published on Apr 27, 2023
Authors:

Abstract

Leveraging distilled knowledge from large instruction-tuned LLMs, LaMini-LM achieves competitive performance on NLP benchmarks using a fraction of the resources.

AI-generated summary

Large language models (LLMs) with instruction finetuning demonstrate superior generative capabilities. However, these models are resource intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly-generated instructions. In addition to being sizeable, we design our instructions to cover a broad set of topics to ensure diversity. A thorough investigation of our instruction data demonstrates their diversity, and we generate responses for these instructions using gpt-3.5-turbo. We then exploit the instructions to tune a host of models, dubbed LaMini-LM, of varying sizes, from both the encoder-decoder and the decoder-only families. We evaluate our models both automatically (on 15 different NLP benchmarks) and manually. Results show that our proposed LaMini-LM models are on par with competitive baselines while being nearly 10 times smaller in size.
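The distillation recipe the abstract describes (pair each instruction with a gpt-3.5-turbo response, then fine-tune small students on those pairs) can be sketched as a data-preparation step. This is a minimal illustrative sketch, not the paper's code: the function name and prompt template are assumptions, and the two output shapes reflect the two student families mentioned (decoder-only vs. encoder-decoder).

```python
def format_for_student(instruction: str, response: str, decoder_only: bool = True) -> dict:
    """Turn one distilled (instruction, response) pair into a training example.

    Hypothetical helper: the prompt template below is an assumption, not the
    exact format used by LaMini-LM.
    """
    if decoder_only:
        # Decoder-only students train on a single concatenated sequence.
        return {"text": f"Instruction: {instruction}\nResponse: {response}"}
    # Encoder-decoder students take the instruction as input and the
    # teacher's response as the target sequence.
    return {"input": f"Instruction: {instruction}", "target": response}


# Example pair (instruction from the dataset, response from the teacher model):
example = format_for_student("Name three primary colors.", "Red, blue, and yellow.")
print(example["text"])
```

Each of the 2.58M instruction-response pairs would be formatted this way before standard supervised fine-tuning of the student model.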


Get this paper in your agent:

hf papers read 2304.14402
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 24
Datasets citing this paper: 3
Spaces citing this paper: 178
Collections including this paper: 0