Meta has introduced NotebookLlama, an open-source AI assistant designed to turn PDF documents into audio podcasts. Like Google's NotebookLM, NotebookLlama produces conversational, podcast-style audio from uploaded text files. Built on Meta's own Llama models, the tool runs a PDF through a sequence of steps to convert it into an engaging podcast format.
How Does It Work?
The process starts with the Llama 3.2 1B model, which converts the PDF into plain text. Next, the Llama 3.1 70B model writes a podcast-style script, and the Llama 3.1 8B model rewrites it to add conversational context, including “more dramatization” and interruptions, before the result is passed to open text-to-speech models.
Finally, the open-source Parler TTS (text-to-speech) model converts the script into audio, producing an AI conversation between synthesized speakers. The dramatization and interruptions written into the script are meant to make the audio sound more like a real conversation.
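The four steps described above can be pictured as a simple chain in which each stage consumes the previous stage's output. The following is a minimal sketch of that flow only, not NotebookLlama's actual code: the names `Stage`, `run_pipeline`, and `STAGES` are hypothetical, and each stand-in function merely tags its input where a real version would call the model named in the article.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One step of the pipeline: a label, the model named for it, and a function."""
    name: str
    model: str  # model the article associates with this step
    run: Callable[[str], str]


def run_pipeline(stages: List[Stage], source: str) -> str:
    """Feed the output of each stage into the next one, in order."""
    data = source
    for stage in stages:
        data = stage.run(data)
    return data


# Hypothetical stand-ins: each lambda only wraps its input in a tag so the
# order of the stages is visible. No model is actually called here.
STAGES = [
    Stage("extract_text", "Llama 3.2 1B", lambda pdf: f"text({pdf})"),
    Stage("write_script", "Llama 3.1 70B", lambda text: f"script({text})"),
    Stage("dramatize", "Llama 3.1 8B", lambda script: f"dramatized({script})"),
    Stage("synthesize", "Parler TTS", lambda script: f"audio({script})"),
]

result = run_pipeline(STAGES, "paper.pdf")
print(result)
```

Keeping every stage behind the same text-in, text-out interface is a design choice of this sketch; in practice the final step would produce audio data rather than a string.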
Challenges
Appealing as the idea is, some users have raised concerns about NotebookLlama's output quality. They report a “robotic” tone and occasional voice overlap, which falls short of the natural flow of NotebookLM's output. Meta researchers have acknowledged that the text-to-speech component currently limits how natural the final audio sounds. They have also suggested potential refinements, such as using two AI agents to debate the topic and collaboratively draft the podcast outline.
“The [text-to-speech] model is the limitation of how natural this will sound,” they wrote on NotebookLlama’s GitHub page. “Also, another approach of writing the podcast would be having two agents debate the topic of interest and write the podcast outline. Right now we use a single model to write the podcast outline.”
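The two-agent idea mentioned in the quote can be illustrated with a short sketch in which two agents take turns and the resulting exchange is condensed into an outline. Everything here is an assumption made for illustration: the researchers describe the approach only in outline, and the names `debate`, `outline_from`, `optimist`, and `skeptic` are hypothetical stand-ins for what would be language-model calls.

```python
from typing import Callable, List, Tuple

# A transcript is a list of (speaker, remark) pairs.
Transcript = List[Tuple[str, str]]
# An agent maps the topic and the debate so far to its next remark.
Agent = Callable[[str, Transcript], str]


def debate(topic: str, agent_a: Agent, agent_b: Agent, rounds: int = 2) -> Transcript:
    """Alternate between two agents, recording who said what."""
    transcript: Transcript = []
    for _ in range(rounds):
        transcript.append(("A", agent_a(topic, transcript)))
        transcript.append(("B", agent_b(topic, transcript)))
    return transcript


def outline_from(transcript: Transcript) -> List[str]:
    """Turn each remark into a numbered outline entry."""
    return [f"{i}. {remark}" for i, (_, remark) in enumerate(transcript, start=1)]


# Hypothetical agents: simple functions standing in for model calls.
def optimist(topic: str, history: Transcript) -> str:
    return f"point {len(history) + 1} for {topic}"


def skeptic(topic: str, history: Transcript) -> str:
    return f"counterpoint {len(history) + 1} on {topic}"


outline = outline_from(debate("the paper", optimist, skeptic, rounds=1))
print(outline)
```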
Like all AI podcast generators, NotebookLlama also faces the inherent problem of ‘hallucination’, in which the AI may add fabricated details to the conversation. This is an area that future updates of NotebookLlama could perhaps address: the corpus needs to be curated to achieve both higher conversational realism and higher accuracy.