Dan’s Weekly AI Speech and Language Scoop #18

These notes are written for a Meta-internal audience. I quickly redact Meta-internal information before publishing but don’t take the time to rewrite. If a particular edition seems thin or a transition doesn’t make sense, this is why.

Distilling prompts from a closed model is all you need

The Llama2 fine-tunes from Meta were not particularly impressive. Shortly after launching, Llama2-70B-chat dropped below a number of community contributions based on all kinds of alchemy. Llama3 is another story. The Meta fine-tune, Llama3-70B-instruct, debuted strong in the Chatbot Arena and has held up well, slipping only a handful of spots as competitors have iterated on their models.

This week, Nous Research released the first community fine-tune that beats Llama3-70B-instruct, Hermes 2 Theta 70B. Two things are notable about this launch:

  1. They blatantly disregard the term in the Llama3 license that requires fine-tunes to maintain the Llama name.
  2. The Hermes team behind this is known for their GPT-4 assisted fine-tunes of open models. They almost certainly used the OpenHermes dataset to generate this fine-tune. Interestingly, their previous fine-tunes of Llama2 and Mistral significantly outperformed the ones provided by the original authors, who presumably respected the OpenAI license, but the gap has closed substantially. This indicates both that Meta is getting better at alignment and that the edge from distilling GPT-4 down into other models is shrinking, which is not surprising given that performance seems to be asymptoting across all companies.

Former Meta FAIR researchers launch EvolutionaryScale

For those who haven’t taken molecular biology, life is little more than a sequence of tokens. Everything about us and every living thing on this planet is defined by a sequence of DNA bases strung together that code for a sequence of amino acids that fold up into the stuff that makes us us. This process is much more complex than what was first hypothesized and is still taught in high-school textbooks, but the general thrust is still true.
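To make the "sequence of tokens" framing concrete, here is a minimal sketch of the textbook version of that process: reading DNA three bases at a time and mapping each codon to an amino acid using the standard genetic code. The function name `translate` and the example sequence are my own illustrations, and this deliberately ignores the real-world complexity mentioned above (splicing, reading frames, alternative codes, folding).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (last base varying fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads from the first base in steps of three and stops at the first
    stop codon. Any trailing bases that do not form a full codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative sequence: ATG (M), GCC (A), AAG (K), TAA (stop)
print(translate("ATGGCCAAGTAA"))
```

In this view, the DNA bases are one token vocabulary and the 20 amino acids are another, which is part of why language-model techniques transfer to protein sequences at all.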

DeepMind’s AlphaFold and its associated pharmaceutical discovery lab, Isomorphic, have made strong progress in applying the lessons of LLMs to biology, but I am excited to see a new entrant in the space. EvolutionaryScale released an open foundation model for biology that they have already used to generate a novel green fluorescent protein. I’m looking forward to downloading their model and seeing what I can learn.

Microsoft announces human-quality zero-shot TTS

Microsoft showed that they can clone a voice at human quality with a single example (I don’t know why they call this zero-shot instead of one-shot; can someone explain this to me?). Their examples are really impressive. I don’t think that I could reliably tell the synthetic voices from the genuine ones either. Say goodbye to voice authentication…