At some level, we’re all familiar with recommender systems even if the term doesn’t ring a bell. They are what drive Amazon’s ability to show you things that you didn’t realize you wanted (and that you subsequently buy); they’re the secret sauce behind Netflix, YouTube and so many other media giants. When it comes to recommender systems, there are two main approaches: recommendations based on content (content-based) and recommendations based on users’ historical preferences (collaborative filtering). As things stand, collaborative filtering has become the more popular approach. As it turns out, if you’re armed with everyone’s preferences, you can make some pretty on-the-mark recommendations.
Sadly, you’re only armed with everyone’s preferences if you work for one of these behemoth tech companies. But hope is not lost for us normal folk, because there are instances in which content-based methods (i.e. what normal folks can use) prove just as useful as collaborative methods, if not more so.
The other night I was finishing up Once Upon a Time in America (1984) and I had an urge to watch another 80s mob/crime flick. While I could have ventured to Netflix or IMDB to help me out in my quest for a movie in this niche, the former may not have had the selection I was looking for and the latter might have produced a set of results that would take a decent amount of time to vet. A one-off vetting is no big deal, but if this happened somewhat frequently, it might be fruitful to explore a different avenue for procuring recommendations.
So I set off to build something capable of handling this. In a nutshell, I gave my computer a large number of movie plots and had it look for good movies with a plot similar to that of the movie of my choice. Additionally, I instructed it to only choose films that were made around the same time.
(It gets slightly technical here. Those interested in the nitty-gritty can see the underlying code here.)
The approach used here is a content-based recommender system. It relies on over 60,000 scraped movie plots and accompanying metadata (e.g. release year, aggregate IMDB score, etc.) for data. The plots of these movies are processed using a mix of standard text cleaning steps and then converted to embeddings (BERT). Movies are then ranked by the cosine similarity between their plot embedding and that of the chosen movie. If we were to stop here, we’d have movies that are close in content but not necessarily in quality or release year. To handle this, we’ll use some simple filtering, though we could certainly have increased the complexity of the model and handled this natively as well. While the model could produce an arbitrarily long list of recommendations, I’ve limited its output to the top 5.
As a bit of a primer for those who haven’t seen it, here’s the plot* of the 1984 Sergio Leone film (via Google):
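For the curious, here is a minimal sketch of that rank-then-filter idea. It is not the actual code behind this post: the `Movie` class, the `recommend` function, the five-year window, the 7.0 rating cutoff and the three-number "embeddings" are all made up for illustration (real BERT embeddings have hundreds of dimensions), and the toy titles are placeholders.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Movie:
    # Hypothetical record; the field names are assumptions, not the real schema.
    title: str
    year: int
    rating: float                # aggregate score, assumed to be on a 0-10 scale
    embedding: Sequence[float]   # plot embedding (toy values, not real BERT output)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query: Movie, catalog: List[Movie], year_window: int = 5,
              min_rating: float = 7.0, top_n: int = 5) -> List[str]:
    """Filter on release year and rating, then rank by plot similarity."""
    # Assumed thresholds: keep films within `year_window` years of the query
    # and with a rating of at least `min_rating`.
    candidates = [
        m for m in catalog
        if m.title != query.title
        and abs(m.year - query.year) <= year_window
        and m.rating >= min_rating
    ]
    # Most similar plots first.
    candidates.sort(
        key=lambda m: cosine_similarity(query.embedding, m.embedding),
        reverse=True,
    )
    return [m.title for m in candidates[:top_n]]


# Toy data: placeholder titles and hand-picked vectors.
query = Movie("Query Film", 1984, 8.3, [0.9, 0.1, 0.2])
catalog = [
    Movie("Close and Good", 1987, 7.9, [0.8, 0.2, 0.2]),
    Movie("Close but Old", 1950, 8.5, [0.9, 0.1, 0.2]),
    Movie("Close but Poor", 1985, 4.0, [0.9, 0.1, 0.25]),
    Movie("Loosely Related", 1981, 7.4, [0.2, 0.9, 0.1]),
]

print(recommend(query, catalog))
```

In this toy run, the two films with near-identical plot vectors that fail the year or rating check are dropped, and what remains is ordered by similarity. Filtering before ranking and ranking before filtering give the same result here; filtering first simply means fewer similarity computations.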
In 1968, the elderly David “Noodles” Aaronson (Robert De Niro) returns to New York, where he had a career in the criminal underground in the ’20s and ’30s. Most of his old friends, like longtime partner Max (James Woods), are long gone, yet he feels his past is unresolved. Told in flashbacks, the film follows Noodles from a tough kid in a Jewish slum in New York’s Lower East Side, through his rise to bootlegger and then Mafia boss — a journey marked by violence, betrayal and remorse.
*The model uses plots from IMDB, which are longer and more detailed.
Our recommender system took the plot of Once Upon a Time in America and shot back the following films (I’ve included a brief synopsis for each as well):
The Untouchables (1987): “During the era of Prohibition in the United States, Federal Agent Eliot Ness sets out to stop ruthless Chicago gangster Al Capone and, because of rampant corruption, assembles a small, hand-picked team to help him.” (Source: IMDB)
Mississippi Burning (1988): “Two FBI agents investigating the murder of civil rights workers during the 60s seek to breach the conspiracy of silence in a small Southern town where segregation divides black and white. The younger agent trained in FBI school runs up against the small town ways of his former Sheriff partner” (Source: IMDB)
Scum (1979): “Powerful, uncompromising drama about the struggle for survival in the nightmare world of a brutal borstal. This is the feature film version of the original 1977 ‘Play for Today’ drama which was banned by the BBC for 14 years. Two boys struggle to survive Britain’s notorious Borstal Reformatory.” (Source: Google)
Thief (1981): “Frank is an expert professional safecracker, specializing in high-profile diamond jobs. After having spent many years in prison, he has a very concrete picture of what he wants out of life–including a nice home, a wife, and kids. As soon as he is able to assemble the pieces of this collage, by means of his chosen profession, he intends to retire and become a model citizen. In an effort to accelerate this process, he signs on to take down a huge score for a big-time gangster. Unfortunately, Frank’s obsession for his version of the American Dream allows him to overlook his natural wariness and mistrust, when making the deal for his final job.” (Source: IMDB)
Over the Edge (1979): “A group of bored teenagers rebel against authority in the community of New Granada after the death of one of their own.” (Source: IMDB)
What can we say about these recommendations?
For starters, maybe I’m not the film buff I thought I was since a couple of these names were new to me!
More importantly, however, the recommendations are all crime films. Beyond that, each film has its own common ground with Once Upon a Time in America. The Untouchables recounts a mob-related story set during the same period as the Leone flick. Mississippi Burning explores a conflict-riddled relationship between its protagonists, similar to that of De Niro and Woods. Scum is set against an equally violent backdrop and tells the story of two adolescents. Thief tells the story of a man who’s trying to leave his crime-ridden past behind him, much as De Niro’s crew manages to do in Once Upon a Time in America. Finally, the model appears to have picked up on the youthful, rebellious aspects of Over the Edge.
While it’s not perfect, it seems like our model is able to recognize similarities and recommend titles that make sense.
The potential draw of using something like this is that any one of these online entertainment providers is limited in its universe of movies. That is, while Netflix may cover Once Upon a Time in America, it may not have any of the others shown above and the ones it does have may not be worth watching or you may have already viewed them. This approach provides a solution to that problem. The model favors highly rated movies so there’s a lower likelihood that you’ll end up watching a bad film.
As cool as this might seem, there is still room for improvement. For starters, the recommendation system doesn’t filter on language, so it sometimes recommends foreign-language films that the viewer may not be interested in. Additionally, while I was able to retrieve a list of some pretty sweet forgotten classics, that wasn’t my goal. If I were to instead use the model to look for recommendations for “Saving Private Ryan” (1998), the model would return a number of well-known flicks:
- Schindler’s List (1993)
- Braveheart (1995)
- Hero (2002)
- Enemy at the Gates (2001)
- Tombstone (1993)
Being able to filter on well-known versus lesser-known films could be an important addition, and maybe that’s something that will be made explicit in a future iteration. After all, during these quarantined times, many of us have likely exhausted all of the obvious sources of entertainment.
Lastly, a further refinement would be to get hold of millions of user reviews and create a hybrid approach that uses both plot similarity and user preferences. (Who knows, maybe that’ll appear here soon.)
So tl;dr? If you liked Once Upon a Time in America and want to watch another good crime film made around the same time, here’s a list of flicks to start with:
- The Untouchables (1987)
- Mississippi Burning (1988)
- Scum (1979)
- Thief (1981)
- Over the Edge (1979)