If you thought GPT-3 was massively impressive, and we all did, wait till you find out about the model that is capable of beating OpenAI's famous text generator. And it does so using only a tiny fraction of GPT-3's parameters.
In case you need a refresher, GPT-3 is a text generator that can take in any prompt as input and write about it in a very human-like manner. In fact, it's hard to distinguish between something written by GPT-3 and a similar human composition, like this essay right here. We also did a Deep Dive on it, which you can find over here.
What you must know about GPT-3 is that it uses a ton of parameters (175 billion of them) and a huge swath of the Internet as its training data (the entire English Wikipedia comprises less than 1% of it), so it's little wonder that it's so good at what it does.
However, AI researchers from the Ludwig Maximilian University (LMU) of Munich believed they could go one step further than this. In a classic case of David vs. Goliath, they developed a relatively small and lean text generator that uses 99.9% fewer parameters than GPT-3 and yet is still able to outperform it.
According to a recent pre-print paper on arXiv, the researchers' system outperforms GPT-3 on the SuperGLUE benchmark with only 223 million parameters:
“In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements.”
Parameters are variables used to tune and tweak AI models. In essence, the more parameters an AI model is trained with, the more robust and high-performing we expect it to be. So, when a system comes along to outperform GPT-3 on a benchmark test by using 99.9% fewer parameters, you best believe that it’s a pretty big deal!
What’s special about this lean but mean LMU system? It uses an innovative technique called pattern-exploiting training (PET) and combines it with a small pre-trained model.
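The core trick behind PET is reformulating a task as a fill-in-the-blank ("cloze") question that a masked language model already knows how to answer, then mapping the model's predicted word back to a task label. Here is a minimal, hypothetical sketch of that reformulation step; the pattern wording, function names, and label mapping are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch of PET's pattern/verbalizer idea (not the authors' code).
# A labeled task input is rewritten as a cloze question; a masked language
# model would then score candidate words at the [MASK] position, and a
# "verbalizer" maps the winning word back to a task label.

def to_cloze(premise: str, hypothesis: str, mask_token: str = "[MASK]") -> str:
    """Rewrite an entailment pair as a cloze question.

    The pattern below is one hypothetical choice; PET typically uses
    several such patterns and combines their predictions.
    """
    return f'"{premise}" ? {mask_token}, "{hypothesis}"'

# Verbalizer: maps the word the model predicts to a task label.
VERBALIZER = {"Yes": "entailment", "No": "contradiction", "Maybe": "neutral"}

cloze = to_cloze("A man is playing guitar.", "A person makes music.")
print(cloze)
# A masked LM would now be asked which of "Yes"/"No"/"Maybe" best fills
# the [MASK] slot; the chosen word's label becomes the prediction.
```

Because the task is expressed in the model's native "predict the missing word" format, even a small pre-trained model can be fine-tuned effectively from a handful of labeled examples.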
In the weekly Import AI newsletter, OpenAI policy director Jack Clark explains: "Their [LMU] approach fuses a training technique called PET (pattern-exploiting training) with a small pre-trained ALBERT model, letting them create a system that 'outperform[s] GPT-3 on SuperGLUE with 32 training examples, while requiring only 0.1% of its parameters.'"
Your move, GPT-3.