Reinventing PocketPod

PocketPod was a consumer app that generated fully bespoke podcasts using AI. The idea got us into YC W24 and into a little legal trouble with NPR, and it was ultimately pivoted as we moved into the B2B world.

The Original PocketPod

In the nascent days of text-to-speech (TTS) models and GPT-4, it took us months to build the research pipelines and recommender system, tune the scriptwriting process, and then convert each script into a coherent conversational audio episode that could be distributed via RSS feed to Spotify. Today this looks far more tractable. LLMs have web search built in. Instead of a complex two-tower recommender system and wrestling with the cold-start problem, we can try an LLM as the recommender. Script generation should be easy to optimize with a handful of golden examples. Audio models are eerily good, with no need for the complex QA systems we built.
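A minimal sketch of the LLM-as-recommender idea, assuming the model is given the user's stated preferences plus a numbered list of candidate headlines and asked to reply with the numbers it would pick. `build_recommendation_prompt`, `parse_recommendations`, and the JSON-array reply format are hypothetical choices, not PocketPod's actual design; the model call itself is left out, since any hosted API or local model could sit between the two functions.

```python
import json
import re


def build_recommendation_prompt(preferences: str, candidates: list[str], k: int) -> str:
    """Ask the model to pick the k candidates that best match the listener's stated preferences."""
    numbered = "\n".join(f"{i}. {title}" for i, title in enumerate(candidates))
    return (
        "You are choosing stories for a personalized daily news podcast.\n"
        f"Listener preferences: {preferences}\n\n"
        f"Candidate stories:\n{numbered}\n\n"
        f"Return a JSON array of the {k} story numbers this listener would most want, "
        "best first. Return only the JSON array."
    )


def parse_recommendations(reply: str, num_candidates: int, k: int) -> list[int]:
    """Pull the first JSON array out of the model's reply and keep valid, unique indices."""
    match = re.search(r"\[.*?\]", reply, re.DOTALL)
    if match is None:
        return []
    try:
        raw = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    picks: list[int] = []
    for item in raw:
        # Models sometimes return booleans, strings, or out-of-range numbers; drop anything unusable.
        if isinstance(item, int) and not isinstance(item, bool) and 0 <= item < num_candidates and item not in picks:
            picks.append(item)
    return picks[:k]
```

Because the ranking comes from preferences stated in the prompt rather than from learned user embeddings, a brand-new listener needs no interaction history, which is what sidesteps the cold-start problem. The defensive parsing matters because model output is not guaranteed to follow the requested format.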

A significant challenge for PocketPod was its unit economics: a user on a sub-$10 monthly subscription could rack up hundreds of dollars in token usage through their daily news podcast and one-off podcast generation. Seeing Google's new open-source Gemma 4 models made me wonder: could I rebuild PocketPod so that it runs entirely on my MacBook, with the only cost being electricity? We'll find out.
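The unit-economics problem is simple arithmetic: monthly token spend is tokens per episode times episodes per month times the per-token price. A back-of-the-envelope sketch follows; every number in the usage lines is a hypothetical placeholder, not PocketPod's measured usage or any provider's pricing. Running a model locally corresponds to setting the per-token price to zero, which leaves electricity as the only cost, and this sketch does not model that.

```python
def monthly_token_cost(tokens_per_episode: int, episodes_per_month: int,
                       usd_per_million_tokens: float) -> float:
    """Token spend in dollars for one subscriber over one month."""
    return tokens_per_episode * episodes_per_month * usd_per_million_tokens / 1_000_000


def monthly_margin(subscription_usd: float, token_cost_usd: float) -> float:
    """Subscription revenue left after paying for tokens (negative means a loss)."""
    return subscription_usd - token_cost_usd


def break_even_tokens_per_month(subscription_usd: float, usd_per_million_tokens: float) -> float:
    """Monthly token budget at which token spend equals subscription revenue."""
    if usd_per_million_tokens == 0:
        return float("inf")  # local inference: no per-token charge at all
    return subscription_usd / usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical placeholders: a research-heavy daily episode on a paid API.
    cost = monthly_token_cost(500_000, 30, 10.0)  # 150.0
    print(cost, monthly_margin(9.99, cost))
```

Even with modest placeholder inputs, a heavy daily user costs many times a sub-$10 subscription, which is why pushing the per-token price to zero is the interesting lever.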

First, we can break this system into its principal components. PiratePod, the local rebuild, must be able to:

Research → explore a story or stories
Recommend → assign stories to a user based on their stated preferences
Generate Scripts → convert those research reports into conversational podcast scripts
Generate Audio → turn scripts into a single, coherent audio episode
Publish and Host → save the user's episodes to an RSS feed
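For the Publish and Host component, a podcast feed is ordinary RSS 2.0: each episode is an `<item>` whose `<enclosure>` element gives the audio file's `url`, its `length` in bytes, and its MIME `type`. A minimal sketch using only the Python standard library; the `Episode` fields, feed title, and URLs are hypothetical. Directories such as Spotify typically expect additional podcast-specific tags (for example the iTunes namespace), which this sketch omits.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime


@dataclass
class Episode:
    title: str
    audio_url: str    # hypothetical: wherever the generated MP3 is hosted
    audio_bytes: int  # file size, required by <enclosure length="...">
    published: datetime
    guid: str         # stable unique ID so podcast apps can tell episodes apart


def build_feed(feed_title: str, feed_link: str, description: str, episodes: list[Episode]) -> str:
    """Render a minimal RSS 2.0 feed with one <item> per episode."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep.title
        ET.SubElement(item, "guid", isPermaLink="false").text = ep.guid
        # RSS 2.0 uses RFC 822-style dates; email.utils.format_datetime produces that format.
        ET.SubElement(item, "pubDate").text = format_datetime(ep.published)
        ET.SubElement(item, "enclosure", url=ep.audio_url,
                      length=str(ep.audio_bytes), type="audio/mpeg")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    ep = Episode(
        title="Daily Briefing",
        audio_url="https://example.com/episodes/2024-04-01.mp3",
        audio_bytes=12_345_678,
        published=datetime(2024, 4, 1, 6, 0, tzinfo=timezone.utc),
        guid="daily-2024-04-01",
    )
    print(build_feed("My PiratePod", "https://example.com", "Personal news podcast", [ep]))
```

ElementTree handles XML escaping of titles and URLs, so episode titles generated by a model cannot break the feed's structure.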

The simplest starting pipeline is one-off story generation from a known URL. This was common for research papers or long articles that users wanted restyled and converted to audio.

This basically looks like:

Input story URL
Scrape contents
Convert scraped content to script
Convert script to audio
Publish to RSS feed
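The five steps above map onto one small function. A minimal sketch, assuming the source is a plain HTML page: the scraper uses the standard library's `html.parser` to keep visible text and skip `<script>`/`<style>` contents, while script writing, text-to-speech, and publishing are passed in as callables because the choice of models for those steps is still open. `TextExtractor`, `extract_text`, `run_one_off`, and the callable signatures are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects a page's visible text, skipping <script>, <style>, and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace so the script writer gets clean prose.
    return " ".join(" ".join(parser.parts).split())


def fetch(url: str) -> str:
    # Real pages may need headers, JavaScript rendering, or PDF parsing (e.g. research papers).
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_one_off(
    url: str,
    write_script: Callable[[str], str],    # scraped text -> conversational script
    synthesize: Callable[[str], bytes],    # script -> encoded audio
    publish: Callable[[str, bytes], str],  # (source URL, audio) -> published episode URL
    fetch_html: Callable[[str], str] = fetch,
) -> str:
    text = extract_text(fetch_html(url))  # steps 1-2: input URL, scrape contents
    script = write_script(text)           # step 3: content -> script
    audio = synthesize(script)            # step 4: script -> audio
    return publish(url, audio)            # step 5: publish to the RSS feed
```

Injecting each stage as a callable keeps the pipeline testable with stubs and lets a local model be swapped in for any step without touching the orchestration.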

From here, there are many tech trees to pursue. We can move from user-provided inputs to an intelligent recommendation system. We can let users control the number of speakers or the style. We can add the ability for users to create their own hosts with voice cloning and personality prompts. Length can become dynamic: a popular feature of PocketPod was the ability to add or remove stories to match the length of a commute. The podcast can become interactive. Different languages can be supported. We can clip in real audio snippets to enhance the feel of an otherwise completely AI-generated podcast.
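Matching episode length to a commute is the most algorithmic of these: given an estimated duration for each candidate story, choose the subset that best fills the commute without running over. A minimal sketch, assuming each story already has an estimated length in whole minutes and a relevance score (both hypothetical inputs; how PocketPod actually did this is not described here). It treats the problem as a 0/1 knapsack, maximizing total relevance within the time budget, which is small enough to solve exactly for a daily episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    minutes: int      # estimated spoken length of the segment (assumed positive)
    relevance: float  # e.g. a recommender score; higher is better


def fit_to_commute(stories: list[Story], commute_minutes: int) -> list[Story]:
    """0/1 knapsack: maximize total relevance with total minutes <= commute_minutes."""
    # best[t] = (score, chosen indices) using at most t minutes of airtime
    best: list[tuple[float, tuple[int, ...]]] = [(0.0, ())] * (commute_minutes + 1)
    for i, story in enumerate(stories):
        # Walk budgets downward so each story is counted at most once.
        for t in range(commute_minutes, story.minutes - 1, -1):
            score, chosen = best[t - story.minutes]
            if score + story.relevance > best[t][0]:
                best[t] = (score + story.relevance, chosen + (i,))
    _, chosen = best[commute_minutes]
    return [stories[i] for i in chosen]  # indices are increasing, so original order is kept
```

A greedy pick by relevance would be simpler but can strand minutes: a high-scoring long story may crowd out two shorter ones that together score higher and still fit.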

More ways to create and interact with podcasts can be supported, too. PocketPod was an iOS app. PiratePod could support a web or mobile app, a terminal interface, or, most interestingly, an MCP server that lets personal agents add podcast generation to their repertoire.