The Dataset

All information on this site is based on a dataset of crawled podcast feeds. Podcasts are added to the dataset by crawling the iTunes podcast directory and fyyd; feeds can also be submitted manually. Each feed is re-crawled at the update interval specified in the feed, or every 24 hours if no interval is specified.
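
The scheduling rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the crawler's actual code: the function name `next_crawl_time`, the representation of the interval as a `timedelta`, and the idea that the interval has already been parsed from the feed are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fallback from the text: crawl every 24 hours if the feed specifies no interval.
DEFAULT_INTERVAL = timedelta(hours=24)


def next_crawl_time(last_crawl: datetime, feed_interval: Optional[timedelta]) -> datetime:
    """Return when a feed is next due to be crawled.

    `feed_interval` is the update interval declared by the feed, or None
    if the feed declares none. How it is parsed is outside this sketch.
    """
    interval = feed_interval if feed_interval is not None else DEFAULT_INTERVAL
    return last_crawl + interval


last = datetime(2024, 12, 16, 12, 0, tzinfo=timezone.utc)
print(next_crawl_time(last, None))                # no interval given: 24 hours later
print(next_crawl_time(last, timedelta(hours=6)))  # feed interval: 6 hours later
```

Passing `None` for a missing interval keeps the 24-hour fallback in a single place, so it is easy to see and to change.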

The dataset contains 236143 podcasts with 28605640 episodes, of which 3325994 have been analysed for their audio properties.
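
To put these totals in proportion, the short calculation below derives the share of analysed episodes and the mean number of episodes per podcast. The variable names are illustrative; the three input figures are the ones quoted above.

```python
# Figures quoted in the text above.
podcasts = 236_143
episodes = 28_605_640
analysed = 3_325_994

analysed_share = analysed / episodes        # fraction of episodes with audio analysis
episodes_per_podcast = episodes / podcasts  # mean number of episodes per podcast

print(f"Analysed share: {analysed_share:.1%}")                    # Analysed share: 11.6%
print(f"Mean episodes per podcast: {episodes_per_podcast:.1f}")   # Mean episodes per podcast: 121.1
```

Roughly one episode in nine has been analysed, so statistics on audio properties describe a subset of the dataset rather than all of it.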

Due to the selection process, the dataset is biased towards high-ranking podcasts. Using fyyd as a source for new podcasts introduced a bias towards German podcasts. The podcast language distribution is shown below.

Language    Podcasts  Percentage
English       117909       49.9%
Spanish        24306       10.3%
French         19374        8.2%
German         18777        8.0%
Dutch          10623        4.5%
Portuguese      9944        4.2%
Swedish         6836        2.9%
Chinese         3854        1.6%
None            3815        1.6%
Russian         3500        1.5%
Italian         2146        0.9%
Arabic          2068        0.9%
Persian         1239        0.5%
Polish          1069        0.5%
Danish           965        0.4%
Japanese         892        0.4%
Turkish          838        0.4%
Catalan          797        0.3%
Norwegian        639        0.3%
Ukrainian        562        0.2%
Other           5862        2.5%

The following chart shows the number of analysed episodes in relation to the number of published episodes.

[Chart: published and analysed episodes by date. X axis: Date (2024-10-01 to 2024-12-16); Y axis: Episodes; series: Published, Analysed.]