The Dataset

All information on this site is based on a dataset of crawled podcast feeds. Podcasts are added to the dataset by crawling the iTunes podcast directory and fyyd. Feeds can also be submitted manually. Each feed is re-crawled at the update interval specified in the feed, or every 24 hours if no interval is specified.
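The re-crawl rule described above can be sketched as follows. This is only an illustration, not the site's actual crawler: it assumes the interval is read from the RSS 2.0 `<ttl>` element (which is expressed in minutes), and the names `crawl_interval` and `DEFAULT_INTERVAL` are hypothetical.

```python
from datetime import timedelta
import xml.etree.ElementTree as ET

# Fallback stated in the text: crawl every 24 hours if the feed gives no interval.
DEFAULT_INTERVAL = timedelta(hours=24)


def crawl_interval(feed_xml: str) -> timedelta:
    """Return the update interval requested by the feed, or the 24-hour default.

    Assumption: the interval comes from the RSS 2.0 <ttl> element (minutes).
    The real crawler may read a different field.
    """
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError:
        # Unparseable feed: fall back to the default interval.
        return DEFAULT_INTERVAL

    ttl = root.findtext("channel/ttl")
    if ttl is None:
        return DEFAULT_INTERVAL

    try:
        minutes = int(ttl.strip())
    except ValueError:
        return DEFAULT_INTERVAL

    # Ignore zero or negative values, which carry no usable interval.
    if minutes <= 0:
        return DEFAULT_INTERVAL
    return timedelta(minutes=minutes)
```

A scheduler could then add the returned interval to the time of the last crawl to decide when the feed is due again.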

The dataset contains 241023 podcasts with 31738254 episodes. Of these, 3363304 episodes (about 10.6%) have been analysed for their audio properties.

Due to the selection process, the dataset is biased towards high-ranking podcasts. Using fyyd as a source for new podcasts also introduced a bias towards German podcasts. The podcast language distribution is shown below.

Language     Podcasts   Percentage
English        119183        49.4%
Spanish         24669        10.2%
French          19743         8.2%
German          19143         7.9%
Dutch           10887         4.5%
Portuguese      10161         4.2%
Swedish          7038         2.9%
None             5480         2.3%
Chinese          3887         1.6%
Russian          3534         1.5%
Italian          2166         0.9%
Arabic           2092         0.9%
Persian          1255         0.5%
Polish           1077         0.4%
Danish            973         0.4%
Japanese          907         0.4%
Turkish           842         0.3%
Catalan           810         0.3%
Norwegian         651         0.3%
Ukrainian         569         0.2%
Other            5956         2.5%
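
The Percentage column appears to be each language's podcast count divided by the total number of podcasts, rounded to one decimal place. A minimal sketch of that calculation, using a few rows from the table (the helper name `percentage` is hypothetical):

```python
# Total number of podcasts in the dataset, as stated in the text.
TOTAL_PODCASTS = 241023

# A few rows copied from the language table.
counts = {
    "English": 119183,
    "Spanish": 24669,
    "French": 19743,
    "German": 19143,
}


def percentage(count: int, total: int = TOTAL_PODCASTS) -> float:
    """Share of `count` in `total`, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


for language, count in counts.items():
    print(f"{language:<10} {count:>7} {percentage(count):>5.1f}%")
```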

The following chart shows the number of analysed episodes in relation to the number of published episodes.
