Parse Podcasts With Python: Understanding Lex Fridman’s Podcast With Deepgram ASR And Text Analysis
Yujian Tang
When it comes to podcasting, Lex Fridman is an expert in the craft: delivering extensive, in-depth interviews with fascinating people for a large and engaged audience.
A computer scientist and AI researcher at MIT by day, Fridman also maintains an active online media presence. This includes his YouTube channel, on which he’s racked up nearly 270 million views and over 2.1 million subscribers. And then there’s his podcast, episodes of which can run for over two hours. His interview-based show has featured guests from the computer science field like legendary TAOCP author Donald Knuth, technology entrepreneurs like Anaconda founder Peter Wang, and various media personalities ranging from Vsauce YouTuber Michael Stevens to Joe Rogan.
It’s clear that he knows what he’s talking about, but with deep learning we can get a clearer picture of what Fridman says and how he interacts with his interview subjects, and perhaps learn something about what it means to be a good podcaster.
In this article, we show how to use Deepgram's transcription and understanding API to extract and analyze what Fridman and friends say on the podcast, and perform further analysis with The Text API to pull out even more insights. We’ve structured the article to highlight our findings first, but don’t worry: there is plenty of technical explanation (and Python code snippets) in the second half. All the code for this project can be found on GitHub.
Let’s Talk about Lex (and What Makes a Podcast Good)
The Lex Fridman Podcast launched in 2018 with the tagline, “Conversations about the nature of intelligence, consciousness, love, and power.” As mentioned earlier, episodes follow a relatively consistent interview format: most feature Fridman and a single guest, though sometimes a third person joins in.
Using automatic speech recognition (ASR) and text analysis, we examined a set of episodes between #300 (Joe Rogan) and #311 (Magatte Wade) to find out:
What Fridman talks about, and his most-used phrases
Number of words spoken on an episode-by-episode basis
Total talk time per episode and how it is split between Fridman and his guest
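As a rough preview of how the last two metrics can be derived, here is a minimal sketch that tallies words and talk time per speaker. It assumes the transcript has already been reduced to a list of word records with hypothetical `word`, `start`, `end`, and `speaker` keys, the kind of shape a diarized ASR transcript typically provides; it is an illustration under those assumptions, not the exact code used for this article.

```python
from collections import defaultdict


def speaker_stats(words):
    """Tally word count and talk time (in seconds) for each speaker.

    `words` is assumed to be a list of dicts with the keys
    "word", "start", "end", and "speaker" (an assumed shape, chosen
    for illustration). Talk time is approximated as the sum of
    per-word durations, so pauses between words are not counted.
    """
    stats = defaultdict(lambda: {"words": 0, "seconds": 0.0})
    for w in words:
        entry = stats[w["speaker"]]
        entry["words"] += 1
        entry["seconds"] += w["end"] - w["start"]
    return dict(stats)


# Tiny hand-made sample: speaker 0 says two words, speaker 1 says three.
sample = [
    {"word": "welcome", "start": 0.0, "end": 0.5, "speaker": 0},
    {"word": "back", "start": 0.5, "end": 1.0, "speaker": 0},
    {"word": "thanks", "start": 1.5, "end": 2.0, "speaker": 1},
    {"word": "for", "start": 2.0, "end": 2.25, "speaker": 1},
    {"word": "having", "start": 2.25, "end": 3.0, "speaker": 1},
]

stats = speaker_stats(sample)
for speaker, s in sorted(stats.items()):
    print(f"speaker {speaker}: {s['words']} words, {s['seconds']:.2f} s")
```

Summing word durations keeps the sketch simple, but it undercounts talk time whenever a speaker pauses mid-sentence; measuring from the first to the last word of each speaker turn would be a reasonable alternative.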
Remember, more technical explanations of how we arrived at these findings can be found in the second half of this article. But before getting to the “how,” let’s start with what we found when we ran episodes from the Lex Fridman Podcast through Deepgram and The Text API.
People Versus Things
Let’s start with the basics. In the universe of possible things to talk about on a podcast, there are basically two categories: people and things. Analyzing transcripts of the Lex Fridman Podcast shows that conversations about people slightly edge out conversations about things. Here’s a chart generated with Matplotlib.