Meta takes on Google with this open source AI podcast generator

Meta on Sunday released a new open-source artificial intelligence (AI) tool that takes on Google’s NotebookLM. The tool, called NotebookLlama, is an AI-powered podcast generator: users upload a PDF file and the tool turns it into an audio podcast with two AI characters. It uses three different Llama AI models to complete the process. Like Google’s tool, NotebookLlama produces a free-flowing, back-and-forth conversation between two AI hosts.

NotebookLlama uses three large language models (LLMs) to generate audio podcasts from blocks of text. Currently, the tool only accepts PDF files as input, so users must first convert text in any other format to PDF.

[Image: Meta NotebookLlama workflow. Photo credit: Meta]

NotebookLlama first uses the Llama 3.2 1B Instruct model to pre-process the PDF file and save the result as a ‘.txt’ file. The Llama 3.1 70B Instruct model then writes a podcast transcript from that text. Next, the Llama 3.1 8B Instruct model rewrites the transcript to make it more dramatic. Finally, the rewritten transcript is passed to a text-to-speech workflow, for which Meta uses the Parler TTS tool. Interested individuals can access everything needed to generate podcasts from the project’s GitHub listing.
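The four steps can be pictured as a simple chain in which each stage consumes the previous stage's text output. The Python sketch below is purely illustrative and is not Meta's code: the stage names, the `Stage` class and the `make_stub` placeholder are hypothetical, and each stub merely tags its input where a real model call would go.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str                   # hypothetical stage name, chosen for illustration
    model: str                  # model the article names for this step
    run: Callable[[str], str]   # takes text in, returns text out


def make_stub(label: str) -> Callable[[str], str]:
    """Return a placeholder that tags its input instead of calling a model."""
    def _run(text: str) -> str:
        return f"[{label}] {text}"
    return _run


# Order follows the article: pre-process, write, dramatize, synthesize speech.
PIPELINE: List[Stage] = [
    Stage("preprocess_pdf_text", "Llama 3.2 1B Instruct", make_stub("clean")),
    Stage("write_transcript", "Llama 3.1 70B Instruct", make_stub("transcript")),
    Stage("dramatize_transcript", "Llama 3.1 8B Instruct", make_stub("dramatized")),
    Stage("text_to_speech", "Parler TTS", make_stub("audio")),
]


def run_pipeline(text: str, stages: List[Stage] = PIPELINE) -> str:
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        text = stage.run(text)
    return text


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.name}: {stage.model}")
    print(run_pipeline("text extracted from the PDF"))
```

Keeping each stage behind the same text-in, text-out interface would make it straightforward to swap in a smaller model for any single step.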

However, the AI models mentioned above are just recommendations from the developers. Users may prefer to use smaller models for each step, but results may vary. Meta noted that running the AI system in the recommended configuration requires GPU memory totalling around 140 GB.
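The 140 GB figure is consistent with a back-of-the-envelope estimate for the largest recommended model. The sketch below assumes 16-bit weights (2 bytes per parameter), which the article does not state, and counts only the weights themselves, not activations or other runtime overhead; the function name is hypothetical.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed for model weights alone, in decimal gigabytes.

    Billions of parameters times bytes per parameter gives gigabytes directly.
    The default of 2 bytes per parameter (16-bit weights) is an assumption.
    """
    return params_billions * bytes_per_param


if __name__ == "__main__":
    for size in (70, 8, 1):
        print(f"{size}B parameters: about {weight_memory_gb(size):.0f} GB of weights")
```

Under the same assumption, the 8B and 1B models would need roughly 16 GB and 2 GB for their weights, which suggests why smaller models are far easier to run.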

A user on X (formerly known as Twitter) posted an example of a generated podcast. Based on this sample, the audio quality does not match that of Google NotebookLM: the voices sound shrill and robotic. There are also instances where parts of the audio are skipped and where the AI hosts talk over each other.

Meta acknowledges some of these issues and plans to address them in the next version of the tool. The company emphasized: “The TTS model is the limitation of how natural this will sound. This can probably be improved with a better pipeline and the help of someone more knowledgeable.”

The tech giant also plans to use two different LLMs to write the script, with the models debating each other to make the podcast sound more conversational. This is part of the developers’ future roadmap. In addition, the company is testing the Llama 405B AI model for writing the transcripts and is expanding support for more input and output formats.
