Maximum sequence length (from 4k to 8k)
Eguana's average performance across 4 academic benchmarks
“Partnering with Multimodal allowed us to put together the next-generation barrier-breaking system we needed to overcome the obstacles we faced and start growing in a sustainable way.”
Using a generic LLM (GPT-3):
It also proved to be too inaccurate for academia.
Caktus is an edtech company providing a cutting-edge AI academic tool that helps students study, write papers, and prepare job applications. When it first launched, it was powered by OpenAI’s GPT-3.
Caktus wanted to replace GPT-3 with a proprietary Generative AI model specialized for academia. They partnered with Multimodal, and we developed the initial versions of the model in just 3 months.
Today, Eguana surpasses all major large language models (LLMs) on academic benchmarks — and is currently the world’s best LLM for academia.
Caktus’ founders, Tao Zhang and Harrison Leonard, first launched their GPT-3 powered academic tool in March 2022. Its goal was to help students solve academic tasks more efficiently using AI.
However, the pair soon realized that using an off-the-shelf model was hindering their progress. GPT-3 was the main bottleneck, and building a proprietary Generative AI model became Caktus’ main priority. In December 2022, they partnered with Multimodal to achieve that goal.
Caktus wanted to overcome the challenges that came with using off-the-shelf models developed and, essentially, owned by other companies. The three main challenges were as follows:
Some of Caktus’ features are still similar to ChatGPT. Considering it also used similar technology, Caktus desperately needed a differentiation point to attract investors and customers.
It became clear that Caktus needed a Generative AI model they could evolve themselves and use without significant recurring expenses. This made them turn to Multimodal and start building Eguana — currently the world’s best academic large language model (LLM).
“Number one, we want to build a scalable system. OpenAI is a good starting point, but when you're getting millions of users, that adds up to a pretty big sum very quickly. So, we wanted to really figure out a system where we can optimize usage. And Eguana seemed like the perfect solution.” - Tao Zhang
Caktus’ goals were twofold:
To conserve resources, we decided against building a model from scratch and instead fine-tuned a generic LLM on academic tasks using academic data. This allowed us to directly target the biggest weakness of generalist models: poor performance on specialist tasks, caused by all-purpose training on unvetted web data.
We aimed to mitigate these issues by improving upon Llama 2 (7B) in partnership with Cerebras and CORE. We chose Llama 2 primarily for its accuracy and relatively small size; Cerebras acted as our hardware partner, and CORE was our dataset partner.
Llama 2 (7B) is Meta’s open-source model. It performs on par with GPT-3.5 on various academic benchmarks and achieves a lower violation percentage than its competitors, all while being significantly smaller. Its relatively small size is one of its biggest assets, as it requires fewer computational resources, enables faster inference, and decreases operational costs.
Llama 2 (7B) Architecture
Preparing CORE academic papers for model training required complex data engineering on our end. Here’s exactly what was done to turn them into input data.
Turning CORE research papers into model training data
1. Adjusting the data
2. Extracting relevant data
3. Performing OCR
4. Structuring the data
5. Cleaning the data
6. Normalizing the data
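The last three steps above (structuring, cleaning, and normalizing) can be illustrated with a minimal Python sketch. This is not the pipeline that was actually used: the function names, the specific cleaning rules, and the record layout are all assumptions chosen for illustration, and a real pipeline for research papers would need many more rules.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Remove two common kinds of extraction debris (illustrative rules only)."""
    # Rejoin words hyphenated across a line break: "implemen-\ntation" -> "implementation"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that contain nothing but a page number
    text = re.sub(r"^[ \t]*\d+[ \t]*$\n?", "", text, flags=re.MULTILINE)
    return text


def normalize_text(text: str) -> str:
    """Bring text into a consistent form before tokenization."""
    # NFKC folds ligatures and other compatibility characters into plain ones
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def to_record(paper_id: str, title: str, body: str) -> dict:
    """Structure one paper as a training record (hypothetical layout)."""
    return {
        "id": paper_id,
        "title": normalize_text(title),
        "text": normalize_text(clean_text(body)),
    }
```

Cleaning runs before normalization here so that the page-number and hyphenation rules see the original line breaks.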
For pre-training purposes, we used Cerebras’ hardware and a total of 5 billion tokens from CORE academic papers. We performed continual pre-training, i.e., pre-trained the model on a continuous stream of data over time, rather than pre-training it once on a static dataset.
This increased the model’s accuracy, enhanced its real-time responsiveness to changing data, and improved its learning abilities.
Further, we set up GPU infrastructure across several providers, including Google Cloud Platform, CoreWeave, and Modal Labs, for model fine-tuning. We used thousands of well-curated instructions to fine-tune the model on several downstream tasks, namely short-form and long-form essay generation with citations and chat interaction.
Finally, we enhanced the model with additional capabilities that provide a better and more personalized user experience. For example, the chat model can access and leverage chat history and user-uploaded course materials to provide more accurate outputs.
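One way a chat model can draw on chat history and uploaded course materials is to pack them into the prompt under a fixed token budget. The sketch below is an assumption about how such packing might work, not a description of Eguana's actual mechanism. `build_prompt` and `count_tokens` are hypothetical names, and whitespace splitting stands in for a real tokenizer.

```python
def count_tokens(text: str) -> int:
    """Rough token count; whitespace splitting stands in for a real tokenizer."""
    return len(text.split())


def build_prompt(question: str, history: list[str],
                 materials: list[str], budget: int) -> str:
    """Assemble a prompt from the question, recent history, and course excerpts.

    The question is always kept. History turns are added from newest to
    oldest until one does not fit; course excerpts are then added in order,
    skipping any that would exceed the budget.
    """
    used = count_tokens(question)

    kept_history = []
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # older turns are dropped once the budget is reached
        kept_history.append(turn)
        used += cost
    kept_history.reverse()  # restore chronological order

    kept_materials = []
    for excerpt in materials:
        cost = count_tokens(excerpt)
        if used + cost > budget:
            continue  # skip excerpts that do not fit
        kept_materials.append(excerpt)
        used += cost

    return "\n\n".join(kept_materials + kept_history + [question])
```

Prioritizing recent history over course excerpts is a design choice made for this sketch; a production system could just as well rank excerpts by relevance to the question first.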
The new base model we built for Caktus, Eguana, significantly improved on Llama 2 (7B), with one of its main upgrades being the increased Maximum Sequence Length (MSL), from 4k to 8k tokens. Eguana can therefore handle sequences twice as long as Llama 2 (7B) can, which allows students to ask more detailed questions and receive more in-depth answers through a chat-like interface.
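To make the effect of a longer maximum sequence length concrete, here is a small sketch that splits a tokenized input into windows that fit the model. It assumes "4k" and "8k" mean 4096 and 8192 tokens; `split_into_windows` is a hypothetical helper, and the integer list stands in for real token IDs.

```python
def split_into_windows(tokens: list[int], max_len: int) -> list[list[int]]:
    """Split a token sequence into consecutive windows of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


# A 6,000-token input (for example, a long question plus pasted course notes)
tokens = list(range(6_000))

windows_4k = split_into_windows(tokens, 4096)  # assumed 4k limit
windows_8k = split_into_windows(tokens, 8192)  # assumed 8k limit
```

With these numbers, the input has to be split into two windows under the 4k limit but fits into a single window under the 8k limit, so the model can attend to the whole input at once.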
Eguana also showed significantly better performance on the four most important metrics for academia: relevance, clarity, structure, and flow. It surpassed not only Llama 2 (7B), but also all other major LLMs released at the time, including GPT-3.5 Turbo, GPT-4, and Vicuna 1.5 (13B).
Eguana’s average score on all four benchmarks is 89.73, significantly better than all competitor LLMs we tested.
Further fine-tuning resulted in three models, each specializing in a specific downstream task and achieving greater accuracy.
As Zhang says, Eguana helped Caktus.ai become a premium product, providing the “much-needed tech backbone” to what used to be essentially a GPT-3 wrapper app. Caktus.ai could now offer users a higher-quality experience and better outputs. This, in turn, attracted investors, kickstarted the switch to a paid business model, and increased user retention.
Our joint efforts also allowed Caktus to expand its product offerings.
According to Zhang, these results are slowly but surely helping Caktus transition to a technology company. Simultaneously, they’re also enabling even non-experts to quickly build reliable AI apps for academia. “That’s easy to do when you have an excellent tech foundation, and we finally really have that,” concludes Zhang.
“Over the period of such a short time, Multimodal delivered something that is traditionally impossible for a software company of any scale. Plus, they have the know-how needed to build AI that outperforms even major industry players. That's why we intend to keep building together.” - Tao Zhang