In the world of machine learning, Large Language Models (LLMs) like LLAMA and OPT have quickly established themselves as powerful tools for a wide range of tasks. However, fine-tuning these models is often daunting due to their high computational requirements. This is where LLMTune comes in: a research project from Cornell Tech and Cornell University that lets you fine-tune these colossal models on as little as a single consumer-grade GPU.
LLMTune is a platform that simplifies the task of fine-tuning LLMs. It has been specifically designed to support a wide range of consumer-grade NVIDIA GPUs. With LLMTune, you can fine-tune large models, such as the 65-billion-parameter LLAMA models, on a single A6000 GPU.
This means that even those with relatively modest computational resources can benefit from the capabilities of these large models. LLMTune provides an easily navigable codebase and modular support for multiple LLMs, currently LLAMA and OPT.
How it works
LLMTune applies the LoRA algorithm to fine-tune LLMs that have been compressed with the GPTQ quantization algorithm. Supporting this required implementing a backward pass through the quantized LLM: the compressed base weights stay frozen, and only the small low-rank adapter matrices are trained. Because each quantized copy of the model is so much smaller, this also makes it practical to use data parallelism with large models.
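To make the idea concrete, here is a minimal, illustrative sketch of a LoRA-style layer in PyTorch. This is a toy stand-in, not LLMTune's actual implementation: a frozen `nn.Linear` plays the role of the quantized base weights, while the low-rank `lora_A`/`lora_B` matrices (names chosen here for illustration) are the only trainable parameters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA layer: frozen base weights (standing in for a
    GPTQ-quantized layer) plus trainable low-rank adapters."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen, as if quantized
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # The backward pass flows *through* the frozen base weights,
        # but gradients accumulate only on the small adapter matrices.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

layer = LoRALinear(64, 64)
loss = layer(torch.randn(4, 64)).sum()
loss.backward()
print(layer.base.weight.grad is None)   # True: frozen base gets no gradient
print(layer.lora_A.grad is not None)    # True: adapters receive gradients
```

The key point this illustrates is why a backward pass for the quantized model is needed at all: even though the base weights never update, gradients must still propagate through them to reach the adapters.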
Goals of LLMTune
The primary goal of LLMTune is to provide a user-friendly platform that encourages creativity and experimentation with large language models. It aims to facilitate research on LLM alignment, bias mitigation, efficient inference, and other related areas.
A Quick Demo
To illustrate the power and flexibility of LLMTune, here is a simple example of running an instruction-finetuned LLAMA-65B model on an NVIDIA A6000:
$ llmtune generate --model llama-65b-4bit --weights llama65b-4bit.pt --adapter alpaca-lora-65b-4bit --prompt "Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks."
The result is an intelligently crafted abstract for a hypothetical machine learning paper.
Installation and Usage
LLMTune requires a UNIX environment with Python 3.8 or higher and PyTorch (version 1.13.1+cu116), along with an NVIDIA GPU of the Pascal architecture or newer.
Installation is straightforward with conda or pip, and once installed, LLMTune comes with a command for easy access. Various pre-quantized models are available for download, and you can fine-tune these models to fit your specific needs. Models can be controlled programmatically using Python, allowing for maximum flexibility.
What GPUs are Supported?
Depending on the model size, LLMTune works with a range of NVIDIA GPUs. For instance, a 7b-4bit model would require a minimum of 6GB GPU memory, compatible with RTX 2060, 3050, and 3060. In contrast, a larger 65b-4bit model would necessitate at least 40GB GPU memory, compatible with an A100, 2x3090, 2x4090, A40, or A6000.
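The weight footprint behind these figures is easy to estimate: at 4-bit quantization, each parameter takes half a byte. The helper below is a back-of-the-envelope calculation of my own, not part of LLMTune; the gap between the raw weight size and the quoted minimums is headroom for activations, caches, and quantization metadata.

```python
def weight_footprint_gb(n_params_billion, bits=4):
    """Raw size of quantized weights in GB (1e9 bytes):
    parameters x bits / 8 bits-per-byte."""
    return n_params_billion * 1e9 * bits / 8 / 1e9

print(weight_footprint_gb(7))   # 3.5 GB of weights, hence the ~6GB minimum
print(weight_footprint_gb(65))  # 32.5 GB of weights, hence the ~40GB minimum
```

This also shows why 4-bit quantization is the enabling trick: the same 65B model in 16-bit precision would need roughly 130GB for weights alone, far beyond any single consumer GPU.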
In the expanding universe of machine learning, LLMTune is a game-changer. It makes the fine-tuning of large language models accessible to those who may not have the high computational resources typically required. Whether you are a researcher delving into the intricacies of LLM alignment or a creative wanting to experiment with language models, LLMTune could well be the tool that puts these models within your reach.