The Stars are a Lie

Machine Learning (ML) certainly seems to be a hot topic this year, with no signs of slowing down. Many frameworks for manipulating data at large scale are coming to the masses, and even though I don't work with data sets that could legitimately be called "Big Data", I have found some very neat applications to put to work with Apache Spark.

Here is a Google Trends graph of searches for "machine learning" over the past 5 years. December 4-10, 2016 is literally the high point of the graph. I have a feeling that the sharp lull immediately after has more to do with the data spanning the Christmas and New Year's holidays.

So what are some things ML is good for? One is predicting whether you will like something (i.e. a recommendation engine). Google has been doing this for a long time in Gmail, where their industrious robots toil away reading your emails in a non-human manner in order to recommend products to you (advertisements). Music and movie sites do this as well. Netflix is actually the reason I wrote this post.

I was at my father's house on New Year's Eve, and we were going through Netflix. He quickly spun across the movie Wyrmwood: Road of the Dead (that type of movie isn't his thing, so I do mean quickly), and I noticed something. It had 2 stars. Peculiar. I had recently added that movie to my list, as it had nearly 5 stars! Maybe it's just because I've had Netflix for well over a decade, but 5 star movies always seemed fairly rare. It takes me longer to find something to watch than to actually watch it, so if I see a highly rated movie in a genre that I'm not currently in the mood for, I'll add it to my list so I can check it out later. So why was a near 5 star movie on my account a 2 star movie on his? I headed over to Netflix's Help Site to figure it out.

How does it all work? We use a recommendation algorithm that takes certain factors into consideration, such as:

  • The genres of movies and TV shows available.
  • Your streaming history, and previous ratings you’ve made.
  • The combined ratings of all Netflix members who have similar tastes in titles to you.

That last part is the important one. The combined ratings of all Netflix members who have similar tastes in titles to you.

What this means, for example, is that if you have rated 10 movies, and Netflix knows of some accounts that also like those 10 movies as well as another 5, then it will suggest those other 5 to you. Unfortunately, that is also pretty much the definition of being "in a bubble". Congratulations, snowflake: you will be told that the choices of like-minded people, just like you, are highly rated films worth your time.
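To make that idea concrete, here is a toy sketch in Python of one common way to do this kind of thing (user-based collaborative filtering with a similarity-weighted average). To be clear, this is not Netflix's actual algorithm, which isn't public in this detail. The profile names, movie titles, and ratings are all made up, and the helper names `similarity` and `predicted_stars` are my own.

```python
from math import sqrt

# Hypothetical 1-5 star ratings. Every name and number here is invented.
ratings = {
    "me":         {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "dad":        {"Movie A": 1, "Movie B": 2, "Movie C": 5},
    "horror_fan": {"Movie A": 5, "Movie B": 5, "Movie C": 1, "Movie D": 5},
    "drama_fan":  {"Movie A": 1, "Movie B": 2, "Movie C": 5, "Movie D": 2},
}

def similarity(a, b):
    """Cosine similarity over the titles both users rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def predicted_stars(user, title, ratings):
    """Average everyone else's rating of a title, weighted by how
    similar their tastes are to the given user's."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or title not in their_ratings:
            continue  # skip the user themselves and anyone who hasn't rated it
        weight = similarity(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[title]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

# Same movie, same underlying ratings, different predicted stars per viewer.
for viewer in ("me", "dad"):
    print(viewer, round(predicted_stars(viewer, "Movie D", ratings), 2))
```

With this made-up data, "Movie D" comes out with more stars for "me" than for "dad", purely because my ratings look more like the horror fan's and his look more like the drama fan's. Nobody's actual rating of the movie changed.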

In fact, here is that movie on three different profiles on my one account:

Netflix Profile 1: 4.75 stars on my main profile

Netflix Profile 2: 4 stars on my old profile

Netflix Profile 3: 3 stars on my guest profile

What this also means is that when my wife starts using my profile, I suddenly get recommended romantic comedies instead of horror movies about demonic possession [read as: why I already have multiple Netflix profiles, and why Mark2 is considered my main one].

Don't get me wrong, it's a big problem to figure out. It's one that is constantly evolving. It's also one that Netflix is pretty open about, even putting up a contest a few years back: The Netflix Prize. As stated on that same Netflix page:

We offer thousands of titles to stream -- that’s a lot! When you rate movies and TV shows, you're helping us filter through the thousands of selections to get a better idea of what you'd like to watch.

I just feel a little lied to. I wasn't told that it was highly recommended. I was recommended it, and told that it was highly rated. There's a difference.

Further Reading: I found this article to be a very well thought out walk-through of some of the core concepts of Machine Learning, along with some good practical examples. There is also a site called Kaggle that hosts data sets and competitions if you are looking to try applying some of those skills along the way.