This new AI voice assistant beat OpenAI to one of ChatGPT’s most anticipated features

July 5, 2024

OpenAI’s delay of ChatGPT’s impressive Voice Mode frustrated many fans of the AI chatbot, but they may now have another option. French artificial intelligence developer Kyutai has introduced a real-time voice AI assistant called Moshi.

Moshi is designed to hold lifelike voice conversations with users, much like Alexa or Google Assistant, but it is powered by the kind of large language model that underpins ChatGPT and its rivals, in this case the Helium 7B model. According to Kyutai, Moshi can speak in different accents and has 70 different emotional and speaking styles. The AI can even process two audio streams simultaneously, allowing Moshi to listen and speak at the same time.

Kyutai’s development of Moshi involved fine-tuning on more than 100,000 synthetic dialogues created using Text-to-Speech (TTS) technology. The goal was to teach Moshi the nuances and tones of human communication. The company even worked with a professional voice actor to improve Moshi’s voice quality.

The AI assistant combines text and audio training and is optimized for multiple backends, meaning it can run locally on devices like laptops without connecting to the cloud. The company pitches this as a way to maintain privacy and security by keeping sensitive data from being sent over the internet. You can watch a demo of Moshi here.

Open conversation

Kyutai announced that Moshi will be an open-source project, including the model’s code and framework, providing a foundation for further innovation. The open-source approach could also help mitigate the complaints that larger AI companies face about the safety and ethics of their closed models. Kyutai’s backers, including French billionaire Xavier Niel, are pushing the open-source approach.

Kyutai is also working on AI audio identification, watermarking, and signature tracking systems to be incorporated into Moshi. These features are intended to help identify AI-generated audio, promote accountability and traceability, and make it possible to monitor and verify AI-generated content.

Moshi is still in development, but the voice mode shown in the presentation is impressive. If Moshi catches on, its voice approach could spur ChatGPT rivals to release their own voice-activated versions, or accelerate the addition of LLMs to Alexa and other voice assistants.

If you want to try Moshi, a demonstration is available online. There you can also sign up for early access to the full chatbot.