Key developments in AI are happening at such a frenetic pace that it can be difficult to keep oneself updated. And we are not even talking about the plethora of commercial AI-powered services and tools that have sprung up in the wake of the impressive new research.
Dovydas Ceiltuka over at Turing College came up with a handy list of five “most important” developments in AI, some of which we have already covered, such as Stable Diffusion by Stability AI and Make-A-Video by Meta AI.
We highlight three other cutting-edge AI developments mentioned by Ceiltuka below.
Minerva by Google
It turns out that language models in general are terrible at math problems. However, Minerva is a language model that can solve math and science problems of high school-level difficulty, while also providing the reasoning for the answer.
But first, why do language models struggle with math? The difficulty hinges on the combination of skills required, including correctly parsing a question that often consists of natural language and mathematical notation, correctly applying relevant formulas and constants, and generating the final solution one step at a time.
“Minerva combines several techniques, including few-shot prompting, chain of thought or scratchpad prompting, and majority voting, to achieve state-of-the-art performance on STEM reasoning tasks,” wrote the Google researchers behind Minerva.
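To make the first two techniques in that quote concrete, here is a minimal sketch of how a few-shot, chain-of-thought prompt might be assembled. The `WorkedExample` class, the `build_few_shot_prompt` function, and the "Question / Solution / Final answer" layout are illustrative assumptions, not Minerva's actual prompt format.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    """One solved problem shown to the model (hypothetical structure)."""
    question: str
    reasoning: str  # the step-by-step "chain of thought"
    answer: str


def build_few_shot_prompt(examples, new_question):
    """Concatenate worked examples, then pose the new question.

    The layout is an assumption for illustration only; the source
    does not describe the prompt format Minerva uses.
    """
    parts = []
    for ex in examples:
        parts.append(
            f"Question: {ex.question}\n"
            f"Solution: {ex.reasoning}\n"
            f"Final answer: {ex.answer}\n"
        )
    # End with an open "Solution:" so the model continues with its reasoning.
    parts.append(f"Question: {new_question}\nSolution:")
    return "\n".join(parts)


examples = [
    WorkedExample("What is 12 * 3?", "12 * 3 = 36.", "36"),
    WorkedExample(
        "A train travels 60 km in 2 hours. What is its speed?",
        "Speed is distance divided by time: 60 / 2 = 30 km/h.",
        "30 km/h",
    ),
]
prompt = build_few_shot_prompt(examples, "What is 7 * 8 + 2?")
print(prompt)
```

The worked examples show the model the expected format and encourage it to write out intermediate steps before committing to an answer.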
According to the researchers, Minerva builds on the Pathways Language Model (PaLM), itself a massive 540-billion-parameter, dense decoder-only Transformer model that we wrote about earlier this year.
To promote quantitative reasoning, Minerva was given further training on a 118GB dataset of scientific papers from the arXiv preprint server and web pages containing mathematical expressions using LaTeX, MathJax, or other mathematical typesetting formats.
Under the hood, Minerva also generates multiple solutions to each question and then picks the most common final answer, a step that significantly improves its performance.
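The majority-voting step can be sketched in a few lines. This is a simplified illustration that assumes the final answers have already been extracted from each sampled solution as strings; the `majority_vote` function and the sample data are hypothetical and are not taken from Minerva's code.

```python
from collections import Counter


def majority_vote(final_answers):
    """Return the most common answer and the share of samples that agree.

    Assumes each item is the final answer pulled from one sampled
    solution; extracting that answer is outside this sketch.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    # Light normalization so "42" and "42 " count as the same answer.
    normalized = [answer.strip() for answer in final_answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


# Hypothetical final answers from five sampled solutions to one question.
sampled = ["42", "41", "42 ", "42", "40"]
best, share = majority_vote(sampled)
print(best, share)
```

The intuition is that a model can reach a wrong answer in many different ways, while correct reasoning paths tend to converge on the same result, so the most frequent answer is more likely to be right than any single sample.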
Don’t you wish you had Minerva during your school days? You can read its introduction on the Google Research blog here.
No Language Left Behind by Meta AI
No Language Left Behind (NLLB) is Meta AI’s take on language translation, designed to deliver high-quality translation between 200 languages. That doesn’t sound like a lot until you realize that even the current juggernaut in the room, Google Translate, supports a mere 133, and that is after adding 24 earlier this year.
What makes NLLB especially impressive is its support for “low-resource languages”, which are languages that lack the training data traditionally required to build accurate state-of-the-art natural language processing systems.
You see, while there are more than 7,000 languages spoken by people across the world, it turns out that a mere 20 (yes, twenty) have a text corpus of hundreds of millions of words. To help close that gap, NLLB was built partly with the aid of human translators.
Ceiltuka praised the performance of NLLB, noting: “In terms of performance, the NLLB outperforms Google Translate in some respects and is also better on average…. This is quite the feat.”
What’s more, the model is open source, too. You can learn more about NLLB here.
Whisper by OpenAI
Speech recognition is hardly new. What sets Whisper apart is how it was trained: rather than relying on carefully hand-labeled datasets, it learns from a very large pool of audio paired with existing transcripts, an approach OpenAI calls weak supervision. The resulting model has proven competitive with existing state-of-the-art models trained on curated labeled datasets.
Data labeling is the process of working with raw data to add one or more “labels” to provide context for the machine learning model to pick up. This is incredibly time-consuming, but part and parcel of supervised machine learning.
The fact that Whisper doesn’t depend on this kind of painstaking manual labeling is a significant achievement. Behind the scenes, Whisper was trained on some 680,000 hours of multilingual and multitask supervised data collected from the web.
The other strength of Whisper is its robustness. Ceiltuka notes that it does remarkably well even when dealing with noisy recordings, such as people talking over the phone or at home.
Whisper supports transcription in multiple languages, though not as well as in English. For speech recognition in English, Whisper approaches “human-level robustness and accuracy.”
You can learn more about Whisper here.
The field of AI continues to advance rapidly. How will AI impact our professional lives? I have some thoughts about that, which I plan to share next week. Stay tuned!
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/Eightshot Studio