The Information Arbitrage Nobody is Talking About

January 20th, 2020 / 4 minute(s) reading time

TLDR: The opportunity is podcasts, because they are downloadable, popular, and hosted by third parties. Automated transcription will make this possible.

Everyone is talking about the rise of audio. AirPods sold out everywhere this holiday season. Joe Rogan gets 200 million downloads a month, an audience 5-10x the size of Jimmy Fallon’s. However, people are not talking about what this means.

Marc Andreessen recently spoke about this rise:

“Everybody’s kind of wondering where people are finding the spare time to watch these YouTube videos and listen to these YouTube people in the tens and tens of millions. And the answer is: they’re at work. They have this Bluetooth thing in their ear, and they’ve got a hat, and that’s 10 hours on the forklift and that’s 10 hours of Joe Rogan. That’s a big deal… Of course, speech as a [user interface] is rapidly on the rise. So I think audio is going to be titanically important.”

This arbitrage opportunity does not rely solely on podcasts on YouTube. Many podcasts are on YouTube, but most are actually published through third-party providers and then republished to YouTube and several other platforms (Apple Podcasts, etc.). This is the key distinction: these third-party providers allow something YouTube does not, namely downloading. (Why does YouTube not allow downloading? Copyright law.)

So, how can people benefit from this? Machine learning is actively making every podcast transcribable. Options like Google Cloud Speech-to-Text can reliably convert MP3 files into transcripts in 120+ languages. Now, instead of skimming back through an entire podcast to find a specific part, a listener can simply Ctrl+F the transcript.
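To make the Ctrl+F idea concrete, here is a minimal Python sketch of keyword search over a transcript. It assumes the speech-to-text step has already produced timestamped segments; the Segment class, the function names, and the sample transcript are all hypothetical and made up for illustration, not the output format of any particular service.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float  # where this segment begins in the audio
    text: str             # transcribed words for this segment


def format_timestamp(seconds):
    """Render seconds as H:MM:SS so a listener can jump to the spot."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def find_mentions(segments, query):
    """Return (timestamp, text) pairs for segments containing the query."""
    needle = query.lower()
    return [
        (format_timestamp(s.start_seconds), s.text)
        for s in segments
        if needle in s.text.lower()
    ]


# Hypothetical transcript, invented for this example.
transcript = [
    Segment(0.0, "Welcome back to the show."),
    Segment(754.5, "Audio is going to be a big deal for search."),
    Segment(3725.0, "So that is why search matters for podcasts."),
]

for stamp, text in find_mentions(transcript, "search"):
    print(stamp, text)
```

A plain substring match is the simplest stand-in for Ctrl+F. Anything smarter, such as matching synonyms or topics, would need more than this sketch.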

This means there will be an abundance of data. This information will be incredibly hard to organize. Traditional information outlets like Wikipedia that rely on manual editing will be overwhelmed. Other recent options such as Podcast Notes suffer from the same problem.

One option is ML-assisted data organization. Most podcast transcripts will be littered with filler words and unimportant ramblings. Advanced filters will sort through the junk to make the transcripts legible. Many companies are positioning themselves to win this opportunity. Companies like Golden and Agolo, which are doing data organization and summarization respectively, show promise in accomplishing this.
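As a rough illustration of the simplest kind of filter, the Python sketch below strips a few common filler words from a transcript with a regular expression. The filler list and the strip_fillers function are assumptions made for this example; a real ML-assisted system would presumably learn what counts as junk rather than rely on a hand-written list.

```python
import re

# Illustrative filler list (an assumption, not an exhaustive set).
# "um+" and "uh+" also catch drawn-out forms like "umm" and "uhh".
FILLER_PATTERN = re.compile(
    r"\b(?:you know|i mean|um+|uh+)\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text):
    """Remove filler words and collapse any leftover extra spaces."""
    cleaned = FILLER_PATTERN.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return cleaned.strip()


raw = "Um so the market is uh really big you know for audio"
print(strip_fillers(raw))
```

The word boundaries keep the pattern from touching ordinary words such as "umbrella". A rule like this will still make mistakes (for instance, removing a sincere "I mean"), which is part of why learned filters are the more promising route.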

The winner of podcast organization will analyze billions of minutes of audio. The hidden stories, opinions, and facts in that audio should be easily accessible and summarized. The podcast medium is overflowing, but who will win the data war it is bound to create?


Tim Proctor

I write to understand the world better.