

In part 1, we saw how many genres each song has and how those counts are distributed. The natural next question is: which genres are they? Let's see!

For the following analyses, remember that if I play 10 songs by Kanye, Kanye's genres will be present 10 times.

To analyze the genres, I first create a dataframe that contains all of the genres and their counts. This will be handy in the near future.

Let's start with the total listens per genre.

My main music tastes are hip hop and electronic music, mainly techno and drum and bass. However, for the latter two I mostly use YouTube, which hosts sets that Spotify does not have. So my Spotify is dominated by hip hop and its related genres, like rap and pop rap (whatever that is? Drake maybe?). I expect many hip hop songs are also tagged as pop, which would explain the high pop presence, even though I am normally not much of a pop fan. Let's dive a bit deeper into this!

As a next step, let's verify which genres coincide with which other genres. This will test our hypothesis that pop is used as a tag for hip hop, and will also give us a better feeling for which genres are related to each other.

For this, we loop over the rows and, for each genre present, put a 1 in that genre's column, casting to np.int8. This means that, instead of the usual 32 or 64 bits of a default integer (depending on the platform), we use 8 bits and thus save some memory. Since we only want to represent a binary state (present or not present), we could also use a boolean. However, since we're doing arithmetic with it later, int8 will do. This results in a dataframe with a column for each of the top 20 genres.

Now that we have this data, we can do a correlation analysis of which genres coincide with which other genres. Because genre is a nominal data type, we cannot use the standard correlation, which is the Pearson correlation coefficient. Instead, we should use a metric that works with nominal values. I chose Kendall's tau for this, due to its simplicity. Normally, Kendall's tau is meant for ordinal values (variables that have an ordering). However, because we are working with a binary situation (a genre is either present or not) represented by 0 and 1, I think this should still work. One other thing to note is that Kendall's tau is symmetric: tau(a, b) is the same as tau(b, a).

Let's loop over all the combinations of the top 20 genres and compute their tau coefficient. If we plot this nicely, we get the following overview of "correlations".

We can immediately see some interesting clusters. There is a strong tau between most of the electronic music genres, like edm, electro house, bass trap, big room, brostep and electronic trap. Then, looking at hip hop, we can see very strong coefficients with rap and pop rap, neither of which is a big surprise. My initial hypothesis that pop would be correlated with hip hop has been debunked, though. In this overview, I think there are still two interesting insights: pop seems to be more strongly related to edm and some other electronic genres, and it has a negative tau with hip hop related genres, like hip hop (-0.29), pop rap (-0.28) and rap (-0.32).
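To make the indicator step concrete (loop over the rows, put a 1 in the column of every genre present, store it as np.int8), here is a minimal sketch. The `plays` dataframe, its column names and the tiny top-3 cutoff are made up for illustration; the real data has one row per played song and uses the top 20 genres.

```python
import numpy as np
import pandas as pd

# Hypothetical listening history: one row per play, with that artist's genres.
plays = pd.DataFrame({
    "track": ["a", "b", "c", "d"],
    "genres": [
        ["hip hop", "rap"],
        ["pop", "edm"],
        ["hip hop", "pop rap", "rap"],
        ["edm"],
    ],
})

# Count every genre occurrence across all plays (a genre counts once per play).
genre_counts = plays["genres"].explode().value_counts()

# Keep only the most frequent genres (top 3 here, top 20 in the article).
top_genres = genre_counts.head(3).index.tolist()

# One int8 column per top genre, initialised to 0 (genre not present).
indicators = pd.DataFrame(0, index=plays.index, columns=top_genres, dtype=np.int8)

# Loop over the rows and flag each genre that is present.
for row, genres in plays["genres"].items():
    for genre in genres:
        if genre in indicators.columns:
            indicators.at[row, genre] = 1
```

Each column uses one byte per row this way, and the 0/1 values can still be summed or averaged directly, which would be less convenient with booleans.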

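The pairwise tau computation can be sketched along these lines, assuming SciPy's `kendalltau` is used for the coefficient. The `indicators` dataframe below is a small made-up stand-in for the real top 20 genre columns, so the numbers it produces say nothing about my actual listening data.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import kendalltau

# Hypothetical 0/1 indicator columns, one row per played song.
indicators = pd.DataFrame({
    "hip hop": [1, 1, 1, 0, 0, 0],
    "rap":     [1, 1, 1, 0, 0, 0],
    "pop":     [0, 0, 0, 1, 1, 1],
    "edm":     [0, 1, 0, 1, 0, 1],
}, dtype="int8")

genres = list(indicators.columns)

# Square matrix of coefficients; a genre compared with itself gets 1.0.
tau_matrix = pd.DataFrame(1.0, index=genres, columns=genres)

# Tau is symmetric, so each unordered pair is computed once and mirrored.
for a, b in combinations(genres, 2):
    tau, _p_value = kendalltau(indicators[a], indicators[b])
    tau_matrix.loc[a, b] = tau
    tau_matrix.loc[b, a] = tau

print(tau_matrix.round(2))
```

Using `combinations` rather than a full double loop halves the work, which is exactly what the symmetry of tau allows. The resulting `tau_matrix` can be handed to any heatmap plotting function.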