Thursday, September 27, 2012

Machine Learning With K-Pop

I've finally found some time to write a few blog posts on topics that I've wanted to cover but that are too lengthy for a G+ post. This post is about Korean pop music, machine learning and the Pandora Internet radio station that I recently created, Super K-Pop Radio, for my new favorite music genre.

If you follow my Google+ or Twitter accounts, you may have noticed all the links I've been posting to K-pop music videos on YouTube. From now on, I'm only going to do that when I find something particularly noteworthy, so as to avoid annoying my friends who have different tastes in music. I am sharing a big YouTube playlist of K, J, and C-Pop music videos, so you can check out that playlist if you're interested in all the new videos that I've discovered.

A few months ago I decided to make a serious effort to start learning Korean, primarily for business reasons (there are a few times when I think it would've been helpful for me to know a little Korean when working with one of our OEM partners on previous Nexus phones). It's something I've wanted to do for a while, and once I got started on it seriously (I'm a big fan of the Pimsleur audio CD lessons, so I ordered the Korean I 30-lesson course and am working through that right now), I discovered that I really enjoy the way the language sounds and the abundance of high quality pop music and dramas coming out of South Korea in the past few years.

About 5 years ago, I made a similar decision to learn Japanese, and I managed to learn a thing or two (also using the Pimsleur CDs plus a few books) before losing interest and moving on to other studies. I definitely want to continue studying and improving (or at least not losing) my limited Japanese, but for the moment, it's more exciting for me to work on Korean, and I have many more opportunities to practice hearing the language. There are lots of things I'd like to write about learning a new language, and about aspects of Korean that remind me of things I'd previously learned from other languages, like Japanese and also French (similarities in the vowel sounds and the way the words connect together when spoken), but this post is about music, so I'll save those thoughts for another post.

The event that really kicked off my current K-obsession started a few months ago when Google released a native YouTube player for the Playstation 3. When I dusted off my PS3 to try it out, one of the top-ranking music videos was, you guessed it, PSY's Gangnam Style (technically I believe it was the duet version of the song with Kim Hyuna). Needless to say, I was hooked, and discovered many other cool K-pop videos in short order through the updated PS3 YouTube interface. The original Gangnam Style has over 278 million views on YouTube as I write this, and the duet version has over 54 million. Clearly this is a world phenomenon. So I started to get really seriously into K-pop, as it encompasses genres of music that I already enjoy, namely catchy dance-pop with good melodies, interesting rhythms, and R&B and electronica influences.

I've been a fan of J-Pop for many years, but J-pop as a genre doesn't really encompass the entire spectrum of uptempo pop music styles that I like. For example there's not a lot of R&B or funk influence in Pizzicato Five or Puffy AmiYumi songs that I can discern. K-pop seems to draw from a broader base of influences that match my own musical tastes, and I have some interesting data to prove that point, and maybe a few others. In fact, I think I can explain the popularity of PSY's Gangnam Style and Carly Rae Jepsen's Call Me Maybe (nearly 270M views!) quite easily based on their shared musical attributes.

The subject of this post is "machine learning," and one of my favorite methods for discovering new music in my favorite genres is the Pandora Internet radio station. I must apologize here to my international readers, because Pandora is currently only available in the U.S. (with limited access in Australia and New Zealand, according to Wikipedia) due to the ridiculous complexity and expense of today's music licensing landscape.

The way that Pandora works that's different from other Internet radio stations is that every song in their archive has been listened to by a trained reviewer, who tags the song in Pandora's database with a set of attributes describing its various musical characteristics (genre, tempo, major/minor key, instrumentation, singing style, etc) according to well-defined criteria for each genre of music. For more detail, see this 2009 article or this 2011 interview.

This is different from other music matching systems such as iTunes' "Genius playlist" / "Genius Mix" feature, which uses patterns discovered from the user's own play history and ranking plus the selection of tracks that are listened to by the millions of other iTunes users using the service, to create playlists of similar songs. A "Genius mix" is just a larger playlist based around the genres that the service determined that the user listens to frequently, that is shuffled and looped like a radio station, but stored locally. The user can create any number of Genius playlists based around any single track in their library, but the list of Genius mixes and their contents is determined by the user's listening history and Apple's algorithm, and is not configurable.

Another common approach to playlist creation is taken by Google Play Music's "instant mix" feature, which uses automatic algorithmic analysis of the frequencies and rhythms of the songs in your library to find other songs with similar sonic features (according to the algorithm) to create a playlist around songs with similar themes. I would also recommend a free program called MusicIP Mixer, which is currently unsupported (the company was sold to Gracenote) but still freely available for Windows, Mac OS X, and Linux, that scans your local music collection and then generates playlists that seem to be about as good as Google's instant mixes.

Both types of approaches (user behavior analysis or algorithmic analysis) work very well for creating playlists from your own music collection (presumably composed mostly of songs that you already like), but not so much for discovering new songs. With Pandora's approach, you create a new station starting from a specific genre, or using an individual song or artist as a seed (you can have more than one seed, but it's usually best to start with something specific). I tried to do this with a few K-pop artists but then discovered I got a much better result by starting with the entire K-pop genre as the seed. For each song that plays on a station, you can give Pandora a thumbs-up or thumbs-down, and it uses those scores to determine which songs to play on that station in the future.

So we have human reviewers who have cataloged each song in Pandora's library, and then human listeners scoring each song that plays on each of the millions of channels that people create on the service. Where does the machine learning come in? When people think of artificial intelligence, they often think of neural networks and other complex systems designed to simulate the behavior of human brains. That makes for great science fiction, and can actually be useful in certain areas such as speech and vision recognition, but for this application, we don't need anything nearly that complicated.

Instead, the simple approach (simpler and more predictable than a neural net, anyway) is to imagine each song occupying a position in a very high dimensional space (you can't actually visualize this, but try to imagine it conceptually). In this space, instead of the X, Y, and Z dimensions of the real world, you have a dimension to represent each type of musical attribute that is recorded in the system. Because there are so many attributes, there are very many dimensions, but for each dimension, there may be only a very limited number of possible values. In fact, for many dimensions the only values may be 0 or 1: a song either does or does not have an acoustic piano, or a laid-back female vocalist, or whatever, and every single attribute has its own unique dimension in the problem space.

Because there are so many dimensions, each song can be thought of as having its own unique (or nearly unique) position within this high-dimensional space, even if a particular dimension only has 2 or 3 possible values. Instead of an X,Y,Z position in space, each song's "position" is a long list of numbers representing its value for each of the many attributes along which it has been scored by Pandora's reviewers. "Nearby" songs in this high-dimensional space will have close or identical values along many dimensions, while "distant" songs will have close values along only a few. For each radio station, Pandora has to decide which song to play next based on the feedback given by the user (thumbs up or down) for previous songs that it has played on that channel. Songs with thumbs down are eliminated from the channel, but how can the thumbs-up scores be used to discover similar songs with similar attributes?
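To make the "position in a high-dimensional space" idea a bit more concrete, here's a minimal Python sketch. The attribute names are ones I made up for illustration (Pandora's real attribute list is much longer and I don't know its exact contents), and counting mismatched dimensions is just one simple way to measure how "nearby" two songs are.

```python
# Hypothetical attribute names, invented for this example only.
ATTRIBUTES = [
    "acoustic_piano",
    "female_vocalist",
    "electronica_influence",
    "rnb_influence",
    "major_key",
]


def to_vector(tags):
    """Turn a set of attribute tags into a list of 0/1 values, one per dimension."""
    return [1 if name in tags else 0 for name in ATTRIBUTES]


def hamming_distance(a, b):
    """Count the dimensions on which two songs have different values."""
    return sum(x != y for x, y in zip(a, b))


# Three imaginary songs described by their tags.
song_a = to_vector({"female_vocalist", "electronica_influence", "major_key"})
song_b = to_vector({"female_vocalist", "electronica_influence", "rnb_influence", "major_key"})
song_c = to_vector({"acoustic_piano"})

print(hamming_distance(song_a, song_b))  # 1: "nearby" songs
print(hamming_distance(song_a, song_c))  # 4: "distant" songs
```

With only five dimensions the distances are tiny, but the same idea carries over to hundreds of attributes.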

This post is long enough already and I'm not an expert in the field, but the Wikipedia entry for support vector machines goes into much greater detail on how the math works. Essentially what the Pandora service does (or is likely doing, since I have no inside knowledge of their algorithm) is a whole bunch of matrix arithmetic to construct a hyperplane (or more complex shape) to cut through this high-dimensional space of musical attributes, dividing the songs that are likely to be appropriate to play on that channel (based on the weighted thumbs-up scores of previously played nearby songs) from the ones that are likely to not be appropriate. Thumbs-down scores only eliminate a particular song from the channel; they don't signal a dislike for similar songs in the same way that a thumbs-up does, but I think they do signal a slight dislike for other songs from that artist. Similar models are used for tasks such as email spam classification or automated sorting of news stories into categories (e.g. Google News).
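Here's a rough Python sketch of the "hyperplane" idea. To be clear, this is not Pandora's algorithm, and it's not a real support vector machine either: it's a perceptron, a much simpler method that also learns a separating hyperplane, which keeps the example short. The four attribute dimensions, the songs, and the assumption that every song without a thumbs-up counts as a negative example are all made up for illustration.

```python
def train_perceptron(songs, labels, epochs=20):
    """Learn weights w and bias b so that w.x + b > 0 for thumbs-up songs.

    songs  -- list of attribute vectors (lists of 0/1 values)
    labels -- +1 for thumbs-up, -1 otherwise
    """
    w = [0.0] * len(songs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(songs, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Song is on the wrong side of the hyperplane: nudge it over.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict(w, b, x):
    """Return +1 if the song falls on the thumbs-up side, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Dimensions (hypothetical): female_vocalist, electronica, rnb, acoustic_piano
songs = [
    [1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0],  # rated thumbs-up
    [0, 0, 0, 1], [1, 0, 0, 1], [0, 0, 1, 1],  # treated as negatives
]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(songs, labels)
print(predict(w, b, [0, 1, 0, 0]))  # 1: an unseen song on the thumbs-up side
print(predict(w, b, [1, 0, 1, 1]))  # -1: an unseen song on the other side
```

A real SVM would additionally pick the hyperplane with the widest margin between the two groups, and could weight the examples, but the basic picture of cutting the attribute space in two is the same.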

The primary weakness of this approach for music is that there isn't a dimension in the Pandora database to express whether or not a singer or a song is actually any good, by the individual listener's subjective standards of quality. The only way that Pandora could build a system that scales up to its current size is by limiting their list of attributes to reasonably objective attributes for which a relatively large number of human listeners can be trained to give consistent scores across songs and reviewers. Whether a song uses an acoustic piano or an electric guitar is something everyone can agree on, but whether a singer or a song is any good is not so clear-cut. So here's what happened when I created Super K-Pop Radio and started rating songs based on my personal standards of good pop music.

The main problem that people typically have with Pandora is that they always get the same 20 or 30 songs and have a lot of trouble adding more variety to a particular station that's of high quality. For those who aren't familiar with the service, the particular Devil's pact that Pandora had to make with the recording industry in order to be considered an Internet radio station, as opposed to some other kind of music service with an even more exorbitant royalty structure than what they currently have to pay, requires them to limit the number of songs you're allowed to skip: currently the limit is 12 skips across all channels every 24 hours. The limit used to be 6 skips per channel and 72 skips total in any 24 hour interval, but the new system actually works better for me when I'm focused on fine-tuning a single station like I am now.

So this limitation makes creating and tuning your stations a sort of game, because when a bad song comes on, you have to decide between listening to the whole thing and rating it thumbs down after the next song starts (which has no penalty) vs. hitting thumbs down immediately to skip the rest of the song and losing a chance to skip an even worse song that might come on in the future.

Because of the limitation on skips, sometimes it's difficult to build up a large thumbs-up list due to the combination of horrible songs in the exact genre that you want, horrible songs in a completely random genre, and songs you've already rated that it's already played for you 5 times in the past 5 hours. Fortunately for the last case, there's a separate "I'm tired of this song" button which will remove that song from all of your channels for some period of time, which works fairly well for me once I have a large enough collection of thumbs-up on a channel.

Pandora really needs to add an "achievements" system like Xbox to give you trophies for listening to a particular song 20 times or 50 times or 100 times on a channel. It does seem to have a habit of finding 5 or 6 songs that it thinks are particularly representative of that channel to play for you as often as possible until you are really and truly tired of them, but once you do mark one "tired", usually a few new songs pop up in the list that might have been hidden by the greater "star power" of the songs it wants to play all the time. But as a challenge I like to listen to those songs as many times as I can absolutely stand to before clicking "I'm tired". My playlist is long enough that by the time a song comes around again, I'll probably really be wanting to hear it.

So here's the interesting thing that happened with the K-pop channel. Instead of getting the same 10 songs, I actually accumulated a collection of nearly 100 thumbs-up (out of about 200 songs total) before Pandora completely ran out of South Korean bands to play that it thought were closely related. Then it switched completely to songs from non-Korean bands. What I discovered is that the songs that I like in the K-pop genre blend seamlessly into songs from R&B, synth pop, and J-Pop, and so the base of songs that match all the Pandora attributes that I could add to the station became huge.

But I didn't actually like more than a small number of songs from a small number of the non-Korean bands. The other thing I quickly discovered is how much bad R&B, bad synth pop, bad Latin pop, bad dance pop, etc. there is out there. By this I mean I have a very high standard in a few areas when it comes to music that I really like and want to listen to, in general:
  1. Very high quality singing. Auto-Tune can conceal a multitude of flaws, but somehow there are so many bad singers (to my ear), including some big names, like Kesha (horrible!), Lady Gaga (I like a few of her songs, when she's singing in tune), Katy Perry, Justin Bieber, etc. that if I didn't have plenty of good K-pop songs to listen to, I'd probably abandon the entire genre.
  2. Equally high quality musical performance and production values.
  3. Interesting and enjoyable composition, melody and rhythm.
It's the third item that is the most challenging to judge, and of course the most subjective, but usually I can come down on one side or the other after one listen. BTW, it amazes me how many Disney child singers and musical soundtracks come up (because they're in exactly the K-Pop style) and how, even when the singing is not bad, the composition of the songs is always just so soulless and boring.

If you're a parent of young children who has to listen to those Disney soundtracks all the time, and you think it's horrible soulless corporate garbage, it's not because you're old and out-of-touch, it's because it really is soulless corporate garbage that doesn't have any reason to justify its existence. Buy your kids some Girls' Generation or Super Junior or the K-Pop version of whatever style they like, and you still might not like that kind of music, but at least it'll have some soul and some actual musicality to it, and they'll be better off for having some quality music to listen to.

So after many hours of listening and 300 or so rankings of non-Korean bands, Pandora is now starting to mix in songs from the K-Pop bands again, and even branching out into J-Pop and some other areas that I also like. But as I write this, I only liked 124 out of 527 songs: 100 songs from S. Korean bands plus 24 songs from the rest of the world.

If you like stats, I actually made a spreadsheet to add up the numbers, which I know marks me as a hopeless geek. I'm continuing to update the spreadsheet whenever I listen to the station, so here's a link to view the current version and here are snapshots at 500 songs and 400 songs ranked (you can open them in different tabs and flip back and forth). BTW, if you use Gmail and are concerned about your email privacy, you may want to open those links in an incognito window or log out of Google before opening them because otherwise your username will probably show up in the list of viewers where any other viewer could see it. Edit: because I made the doc world readable, Google Docs will keep your name and email address private and you'll show up to other users as "Anonymous user #x" if you're signed in.

Column A is the artist, column B is the # of songs from that artist that I rated thumbs-up, column C is the # rated thumbs-down, and column D is 1 for a South Korean band, or 0 otherwise. B2 is the total thumbs-up count, C2 is the total thumbs-down count, and D2 is the # of Korean bands in the list. E2 and F2 are the # of songs from Korean bands that I rated thumbs-up and thumbs-down (columns E and F are formulas I copied down the page, multiplying the cells in that row in columns B and C by the value in D, so the values are set to 0 for the non-Korean bands). G2 and H2 are the non-Korean thumbs-up and thumbs-down values, and in column H and I, I've put some formulas to provide the interesting stats.
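If spreadsheets aren't your thing, the same bookkeeping can be sketched in a few lines of Python. The artist rows below are placeholders I invented purely to show the calculation (they are not rows from my actual spreadsheet); only the 124-out-of-527 figure at the end comes from my real totals.

```python
# Placeholder rows: (artist, thumbs_up, thumbs_down, is_korean)
# These values are invented for illustration, not taken from the spreadsheet.
rows = [
    ("Artist A", 5, 1, 1),
    ("Artist B", 2, 3, 1),
    ("Artist C", 1, 6, 0),
    ("Artist D", 0, 4, 0),
]


def summarize(rows):
    """Compute the same totals as row 2 of the spreadsheet."""
    total_up = sum(up for _, up, _, _ in rows)                   # B2
    total_down = sum(down for _, _, down, _ in rows)             # C2
    korean_bands = sum(k for _, _, _, k in rows)                 # D2
    korean_up = sum(up * k for _, up, _, k in rows)              # E2
    korean_down = sum(down * k for _, _, down, k in rows)        # F2
    return {
        "total_up": total_up,
        "total_down": total_down,
        "korean_bands": korean_bands,
        "korean_up": korean_up,
        "korean_down": korean_down,
        "other_up": total_up - korean_up,        # non-Korean thumbs-up
        "other_down": total_down - korean_down,  # non-Korean thumbs-down
    }


def like_rate(up, down):
    """Fraction of rated songs that got a thumbs-up."""
    total = up + down
    return up / total if total else 0.0


stats = summarize(rows)
print(stats["korean_up"], stats["other_up"])  # 7 1

# My real totals as of this writing: 124 thumbs-up out of 527 songs rated.
print(f"{like_rate(124, 527 - 124):.1%}")  # 23.5%
```

Multiplying by the 0/1 flag is the same trick the spreadsheet formulas use to zero out the non-Korean rows.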

So if you want an explanation for the popularity of both "Gangnam Style" and "Call Me Maybe", I think the simplest explanation is that they're catchy pop songs with high musical production values and good composition in a genre where it's easy to stand out from the rest of the pack because the rest of the pack (in this particular genre) tends to be pretty mediocre. I think it's not a coincidence that PSY, aka Park Jae-Sang, attended Boston University and Berklee College of Music. He knows his music theory and he had 10 years of experience making records in South Korea before Gangnam Style rocketed to the top of the charts. Edit: the Pandora bio for PSY says he graduated from both schools, but other sources say that he attended but did not graduate.

If there's a distinguishing characteristic of K-Pop as a genre, I would say it's the quality and attention to detail as opposed to something that's specifically Korean (they're not wearing hanbok and playing the gayageum or anything like that). I think I'm definitely a bit unusual for being so particular and not being able to really enjoy music that isn't of super high quality. A lot of people don't have a good ear for pitch, so they don't even notice the little things that turn me off, and being this picky definitely limits the number of songs that I enjoy listening to. Then again, I had private piano lessons as a child and played in the school band for many years, so I have a certain set of standards that people who didn't play musical instruments as a kid don't have. I wouldn't be surprised if the South Korean school system and society place a much higher value on musical education compared to the U.S.

I'll keep listening to the Pandora station and adding thumbs up and thumbs down, and feel free to listen along if you enjoy the genre as much as I do. I'll try to start on my next blog post in a day or two.