Meta takes on Google with this open-source AI podcast generator
Meta on Sunday released a new open-source artificial intelligence (AI) tool that takes on Google NotebookLM. The tool, called NotebookLlama, is an AI-powered podcast generator: users upload a PDF file and the tool turns it into an audio podcast voiced by two AI characters. It uses three different Llama models to complete the process. As with Google’s tool, NotebookLlama’s podcast takes the form of a free-flowing, back-and-forth conversation between two AI hosts.
NotebookLlama uses three large language models to generate audio podcasts from blocks of text. Currently, the tool only accepts PDF files as input, so users will have to convert text in any other format to PDF first.
NotebookLlama first uses the Llama 3.2 1B Instruct model to pre-process the PDF file and save the result as a ‘.txt’ file. The Llama 3.1 70B Instruct model then writes a podcast transcript from that source text. Next, the Llama 3.1 8B Instruct model acts as a rewriter, making the transcript more dramatic. Finally, the rewritten transcript is passed to a text-to-speech workflow, for which Meta uses the Parler TTS tool. Interested individuals can access all the models needed to generate podcasts from the project’s GitHub listing.
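The four steps described above can be pictured as a simple chain in which each stage consumes the previous stage's output. The Python sketch below is only an illustration of that flow under stated assumptions: the stage names and model labels come from the description above, but every function here is a hypothetical stand-in that manipulates plain strings. It does not call any Llama model or Parler TTS, and it is not Meta's code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    name: str                  # what the step does
    model: str                 # model the article names for this step
    run: Callable[[str], str]  # hypothetical stand-in, not a real model call


def clean_text(raw: str) -> str:
    # Stand-in for PDF pre-processing: collapse stray whitespace.
    return " ".join(raw.split())


def write_transcript(text: str) -> str:
    # Stand-in for transcript writing: wrap the text in a two-host exchange.
    return f"Speaker 1: Today we look at: {text}\nSpeaker 2: Tell me more."


def dramatize(transcript: str) -> str:
    # Stand-in for the rewriter: make one line more expressive.
    return transcript.replace("Tell me more.", "Wow, tell me more!")


def synthesize(transcript: str) -> str:
    # Stand-in for text-to-speech: describe the audio instead of producing it.
    return f"<audio: {len(transcript.splitlines())} spoken lines>"


PIPELINE: List[Stage] = [
    Stage("pre-process PDF", "Llama 3.2 1B Instruct", clean_text),
    Stage("write transcript", "Llama 3.1 70B Instruct", write_transcript),
    Stage("dramatize transcript", "Llama 3.1 8B Instruct", dramatize),
    Stage("text-to-speech", "Parler TTS", synthesize),
]


def run_pipeline(source_text: str, stages: List[Stage] = PIPELINE) -> str:
    # Feed each stage's output into the next one, in order.
    data = source_text
    for stage in stages:
        data = stage.run(data)
    return data


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.name}: {stage.model}")
    print(run_pipeline("Hello   world"))
```

Keeping each step behind the same string-in, string-out interface is one way to make it easy to swap a smaller model into any single stage without touching the others.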
However, the AI models mentioned above are only the developers’ recommendations. Users may prefer to use smaller models for each step, but results may vary. Meta noted that running the AI system in the recommended configuration requires around 140 GB of total GPU memory.
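A rough back-of-the-envelope calculation shows where a figure of about 140 GB can come from. The sketch below assumes that the 70B model's weights are held at 16-bit precision (2 bytes per parameter); that precision is an assumption for illustration, not something stated above, and the estimate ignores activations, caches and other runtime overhead. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumes 2 bytes per parameter by default (16-bit weights); this is an
    illustrative assumption, and runtime overhead is not included.
    """
    # billions of parameters x bytes per parameter = billions of bytes = GB
    return params_billions * bytes_per_param


if __name__ == "__main__":
    for size in (1, 8, 70):
        print(f"{size}B parameters: ~{weight_memory_gb(size):.0f} GB of weights")
    # The 70B model comes to 140.0 GB under the 16-bit assumption.
```

Under the same assumption, the 8B and 1B models need far less memory, which is why substituting smaller models at each step lowers the hardware requirement.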
A user on X (formerly known as Twitter) posted an example of a generated podcast. Based on this sample, the audio quality does not match that of Google NotebookLM: the voices sound shrill and robotic. There are also instances where parts of the audio are skipped and the AI hosts end up talking over each other.
Meta acknowledges some of these issues and plans to address them in the next version of the tool. The company emphasized: “The TTS model is the limitation of how natural this will sound. This can probably be improved with a better pipeline and the help of someone more knowledgeable.”
The tech giant also plans to use two different LLMs to write the script, with the two models debating each other to make the podcast sound more conversational. This is part of the developers’ roadmap. In addition, the company is testing the Llama 405B AI model for writing the transcripts and is working on support for more input and output formats.