Dan’s Weekly AI Speech and Language Scoop #6

These notes are written for a Meta-internal audience. I quickly redact Meta-internal information before publishing but don’t take the time to rewrite. If a particular edition seems thin or a transition doesn’t make sense, this is why.

Vibes-based evals: Why Jeff Bezos is a fan and we should be, too

We’ve talked a lot in this newsletter about the challenges of evaluating models that operate in an extremely high-dimensional space [1, 2, 3, 4]. This sentiment is not unique to us. The chief scientist at MosaicML (famous for releasing a strong, open foundation LLM ahead of Llama2 and, more recently, a SOTA open MoE model) introduced the concept of “vibes-based evals” to complement potentially misleading quantitative experiments. Jeff Bezos extends this intuition to business problems more broadly: “when the data and the anecdotes disagree, the anecdotes are usually right”.

LLMs are overparameterized

We’ve seen some really interesting work hit the press on quantizing LLMs all the way down to 1-bit with minimal impact on performance. Meta’s own Andrey Gromov and Kushal Tirumala just shared a hilariously titled preprint, “The Unreasonable Ineffectiveness of the Deeper Layers”, showing that simply deleting up to ~40% of a model’s layers barely impacts performance after a quick fine-tune.
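As I understand the paper, the recipe is refreshingly simple: find the most redundant contiguous block of layers (which tends to sit toward the back of the network, excluding the very last layers), delete it, and then do a brief parameter-efficient fine-tune to heal the model. Here is a minimal sketch of the pruning step for a Hugging Face Llama-style checkpoint; the model name, the ~40% fraction, and the heuristic of cutting the block just before the final layer are illustrative assumptions on my part, not the paper’s exact layer-selection procedure.

```python
# Minimal sketch of layer pruning for a Llama-style model.
# Checkpoint name, the ~40% pruning fraction, and the choice of which
# block to cut are illustrative assumptions, not the paper's method.
import torch
from torch import nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

layers = model.model.layers            # the stack of decoder blocks
n_total = len(layers)                  # 32 for a 7B model
n_drop = int(0.4 * n_total)            # delete roughly 40% of the layers

# Drop a contiguous block of deeper layers, keeping the final layer intact.
keep = list(range(n_total - n_drop - 1)) + [n_total - 1]
model.model.layers = nn.ModuleList(layers[i] for i in keep)
model.config.num_hidden_layers = len(model.model.layers)

print(f"kept {len(model.model.layers)} of {n_total} decoder layers")
# ...followed by a quick LoRA-style fine-tune on a small dataset to recover.
```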

Finally, a victory for humans in the battle against the machines

One theme I find particularly interesting is that machines now routinely outperform humans at many of the tasks for which we have historically relied on annotators. However, a group of physicists recently pitted a group of students against both GPT-3.5 and GPT-4 in an undergraduate coding lab, and the students beat the best GPT-4 configuration by a wide margin. Perhaps our jobs are safe after all…

MetaAI narrowly leads ChatGPT in NCAA bracket predictions

I love novel consumer uses of LLMs and got a kick out of this one. Through a series of conversations, Jake Bruene got NCAA bracket predictions out of ChatGPT, MetaAI, and Gemini (MetaAI currently in the lead!).

I could imagine this type of use case really resonating with users. Imagine being the one person in your friend group who doesn’t know a thing about NCAA basketball, and getting real-time help filling out your bracket, explanations of what’s happening mid-game, and so on, all without leaving the moment.

Jamba dethrones Mixtral

I don’t know a thing about state-space models but want to follow along to try to develop an understanding. AI21 Labs just released Jamba, an open hybrid SSM-Transformer model that seems to outperform a bunch of its peers.

Their website indicates that the Jamba architecture reduces memory footprint and increases throughput over transformers, so perhaps something like this could help serve more powerful models on-device.
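For intuition on where those savings could come from, here is a back-of-envelope KV-cache calculation comparing a plain transformer with a hybrid that only keeps attention in a fraction of its layers. The dimensions, the 1-in-8 attention ratio, and the 128K context are my illustrative assumptions, not Jamba’s published configuration.

```python
# Back-of-envelope KV-cache size: full-attention transformer vs. a hybrid
# with attention in only a fraction of its layers. All numbers below are
# illustrative assumptions, not Jamba's published configuration.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per head, per token (fp16/bf16).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

seq_len = 128_000                      # long-context serving scenario
full = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=seq_len)

# Hybrid: suppose only 1 in 8 layers is attention; SSM layers keep a small,
# fixed-size state instead of a per-token cache.
hybrid = kv_cache_bytes(n_layers=32 // 8, n_kv_heads=8, head_dim=128, seq_len=seq_len)

print(f"full-attention KV cache: {full / 1e9:.1f} GB")   # ~16.8 GB
print(f"hybrid KV cache:         {hybrid / 1e9:.1f} GB")  # ~2.1 GB
```

At long context lengths the KV cache dominates memory, so swapping most attention layers for SSM blocks with a small, fixed-size state is exactly the kind of change that could make long-context or on-device serving more realistic.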