FreeWilly: An Exploration of Large and Powerful Instruction Fine-Tuned Models
Are you curious about the latest advancements in Large Language Models (LLMs)? Here at Stability AI and our CarperAI lab, we are introducing FreeWilly1 and its successor, FreeWilly2. Both models display remarkable reasoning ability and are paving the way for significant developments in the field of LLMs.
How Do FreeWilly Models Stand Out in the LLM Landscape?
FreeWilly1 and FreeWilly2 have been meticulously crafted to exhibit strong performance across varied benchmarks. But what makes them so special? FreeWilly1, the inaugural model, builds on the original LLaMA 65B base model and is fine-tuned on a new synthetically generated dataset using Supervised Fine-Tuning (SFT) in the standard Alpaca format. FreeWilly2 goes a step further by using the LLaMA 2 70B base model, resulting in performance that compares favorably with GPT-3.5 on several tasks.
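For readers unfamiliar with that format, here is a minimal sketch of how a single instruction/response pair is typically serialized for Alpaca-style SFT. The template text and field names follow the publicly documented Alpaca recipe and are assumptions for illustration, not details taken from the FreeWilly training code.

```python
# Minimal sketch of the standard Alpaca prompt template (assumed from the
# public Alpaca recipe, not from the FreeWilly training code). Each record
# has an "instruction" and an "output" field; Alpaca also supports an
# optional "input" field, omitted here for brevity.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_alpaca(example: dict) -> str:
    """Serialize one training example into an Alpaca-style SFT string."""
    prompt = ALPACA_TEMPLATE.format(instruction=example["instruction"])
    return prompt + example["output"]

# Example usage
print(format_alpaca({
    "instruction": "Explain why the sky appears blue.",
    "output": "Sunlight scatters off air molecules, and shorter (blue) "
              "wavelengths scatter the most, so the sky looks blue.",
}))
```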
Both models represent ongoing research and are released under a non-commercial license to foster an open research environment. While we have performed rigorous internal red-teaming to ensure the models remain polite and harmless, we welcome feedback and further red-teaming assistance from the community.
What Unique Approach Was Taken in the Data Generation and Collection for FreeWilly Models?

The training methodology employed for the FreeWilly models is inspired by Microsoft's innovative technique outlined in their paper, "Orca: Progressive Learning from Complex Explanation Traces of GPT-4." Our approach aligns with theirs, with a significant difference in data sources.

Stability AI generated its own dataset of 600,000 data points, roughly 10% of the size of the dataset used in the original Orca paper. The data was created by prompting language models with high-quality instructions drawn from the following datasets:
COT Submix Original
NIV2 Submix Original
FLAN 2021 Submix Original
T0 Submix Original

This approach yielded 500,000 examples generated with a simpler LLM and another 100,000 examples generated with a more sophisticated LLM. To ensure unbiased comparisons, the datasets were carefully curated and any instances originating from evaluation benchmarks were removed; a minimal sketch of this generation-and-filtering pipeline appears below. Despite training on a significantly smaller sample than the original Orca work, the FreeWilly models showed remarkable performance across various benchmarks, underscoring the validity of synthetically generated datasets.
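To make the pipeline concrete, the sketch below shows its general shape: sample instructions from the source datasets, ask a teacher model for a detailed, step-by-step response in the spirit of Orca's explanation traces, and drop anything that also appears in an evaluation benchmark. The helper names, the query_teacher_model placeholder, and the exact decontamination rule are illustrative assumptions, not Stability AI's actual code.

```python
# Hypothetical sketch of an Orca-style synthetic data pipeline. The helper
# names and the decontamination rule are illustrative assumptions.
from typing import Iterable

SYSTEM_PROMPT = (
    "You are a helpful assistant. Think step by step and explain your "
    "reasoning before giving the final answer."
)

def query_teacher_model(system: str, instruction: str) -> str:
    """Placeholder for a call to the teacher LLM (simple or sophisticated)."""
    raise NotImplementedError

def generate_examples(instructions: Iterable[str],
                      benchmark_prompts: set[str]) -> list[dict]:
    """Prompt the teacher model and exclude benchmark contamination."""
    examples = []
    for instruction in instructions:
        # Decontamination: skip anything that also appears in an eval benchmark.
        if instruction.strip().lower() in benchmark_prompts:
            continue
        response = query_teacher_model(SYSTEM_PROMPT, instruction)
        examples.append({"instruction": instruction, "output": response})
    return examples
```

In practice the bulk of the examples would be collected with the cheaper teacher model and only a smaller slice with the more capable one, matching the 500,000/100,000 split described above.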
How Do the FreeWilly Models Perform?
To internally assess these models, we used EleutherAI’s lm-eval-harness and AGIEval. Both FreeWilly models excelled in numerous areas, including intricate reasoning, understanding linguistic subtleties, and addressing complex questions in specialized domains like law and mathematical problem-solving.
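To illustrate how such an evaluation is typically run, the snippet below calls the harness's Python entry point. Argument names, backend and model identifiers, and task names vary between harness releases, so treat these specifics as assumptions to verify against the version you install; the model ID shown is likewise an assumption.

```python
# Illustrative use of EleutherAI's lm-eval-harness. Argument names, the
# backend string, task identifiers, and the model ID may differ between
# releases; check them against the installed version.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                               # Hugging Face causal LM backend
    model_args="pretrained=stabilityai/FreeWilly2",  # assumed model ID
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```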
The results obtained on the Open LLM Leaderboard benchmarks, the GPT4All benchmarks, and AGIEval indicate that the FreeWilly models are a significant leap forward in the world of open access Large Language Models.
How Do FreeWilly Models Contribute to an Open Future?
FreeWilly1 and FreeWilly2 are more than just models; they set a new standard for open access Large Language Models. They advance research, improve natural language understanding, and enable complex tasks. We are thrilled about the possibilities these models will open up for the AI community and the innovative applications they will inspire.
We extend our heartfelt gratitude to our dedicated team of researchers, engineers, and collaborators, whose relentless efforts and commitment have made this significant achievement possible.
Stay tuned for more thrilling developments and begin exploring the extraordinary potential of FreeWilly today! There's a world of AI to uncover, and the journey has only just begun.