The Dataset
All information on this site is based on a dataset of crawled podcast feeds. Podcasts are added to the dataset by crawling the iTunes podcast directory and fyyd. Additionally, feeds can be submitted manually. Feeds are crawled according to the update interval specified in the feed, or every 24 hours if no interval is specified.
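The crawl scheduling rule can be sketched as follows. This is a minimal illustration, not the site's actual crawler: it assumes the update interval is read from the RSS 2.0 `<ttl>` element (a value in minutes), and the names `crawl_interval` and `next_crawl` are hypothetical. A feed might declare its interval in some other way, which this sketch does not handle.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

# Fallback from the text: crawl every 24 hours if the feed gives no interval.
DEFAULT_INTERVAL = timedelta(hours=24)


def crawl_interval(feed_xml: str) -> timedelta:
    """Return the interval declared in the feed, or the 24-hour default.

    Assumption: the interval is the RSS 2.0 <ttl> element, in minutes.
    """
    root = ET.fromstring(feed_xml)
    ttl_text = root.findtext("channel/ttl")
    if ttl_text is None:
        return DEFAULT_INTERVAL
    try:
        minutes = int(ttl_text.strip())
    except ValueError:
        # Unparseable value: treat it as "no interval specified".
        return DEFAULT_INTERVAL
    if minutes <= 0:
        return DEFAULT_INTERVAL
    return timedelta(minutes=minutes)


def next_crawl(last_crawl: datetime, feed_xml: str) -> datetime:
    """Return the time at which the feed is due to be crawled again."""
    return last_crawl + crawl_interval(feed_xml)


if __name__ == "__main__":
    feed = "<rss><channel><title>Example</title><ttl>60</ttl></channel></rss>"
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(next_crawl(last, feed).isoformat())
```

Falling back to the default for missing, non-numeric, or non-positive values keeps a malformed feed from being crawled in a tight loop or never at all.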
The dataset contains 236143 podcasts with 28605640 episodes. 3325994 episodes have been analysed for their audio properties.
Due to the selection process, the dataset is biased towards high-ranking podcasts. Using fyyd as a source for new podcasts introduced a bias towards German podcasts. The podcast language distribution is shown below.
Language | Podcasts | Percentage |
---|---|---|
English | 117909 | 49.9% |
Spanish | 24306 | 10.3% |
French | 19374 | 8.2% |
German | 18777 | 8.0% |
Dutch | 10623 | 4.5% |
Portuguese | 9944 | 4.2% |
Swedish | 6836 | 2.9% |
Chinese | 3854 | 1.6% |
None | 3815 | 1.6% |
Russian | 3500 | 1.5% |
Italian | 2146 | 0.9% |
Arabic | 2068 | 0.9% |
Persian | 1239 | 0.5% |
Polish | 1069 | 0.5% |
Danish | 965 | 0.4% |
Japanese | 892 | 0.4% |
Turkish | 838 | 0.4% |
Catalan | 797 | 0.3% |
Norwegian | 639 | 0.3% |
Ukrainian | 562 | 0.2% |
Other | 5862 | 2.5% |
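
The percentages in the table appear to be each language's podcast count divided by the total of 236143 podcasts, rounded to one decimal place. The short sketch below reproduces a few rows under that assumption; the helper name `share` is hypothetical and not part of the site.

```python
# Total number of podcasts in the dataset, as stated in the text.
TOTAL_PODCASTS = 236143


def share(count: int, total: int = TOTAL_PODCASTS) -> str:
    """Format a podcast count as a percentage of the total, one decimal place."""
    return f"{100 * count / total:.1f}%"


if __name__ == "__main__":
    # A few rows taken from the table above.
    for language, count in [("English", 117909), ("Spanish", 24306), ("German", 18777)]:
        print(f"{language}: {share(count)}")
```
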
The following charts show the number of analysed episodes in relation to the published episodes.