Will it be Delicious?

I’m dreaming of a future when we can go back to unencumbered dining and return to absolute freedom in restaurant choices. My dreams may not be grand but I’m taking steps to ensure that when fate wills them into existence, I’ll be able to capitalize on them.

You can make a reservation for most restaurants in NYC online via OpenTable, Resy or Yelp. Each has a large selection of the best of the best as well as some not-quite-as-good spots. In most cases, each lets the interested party view menus and pictures of restaurants to get an idea of what to expect. My thinking is: what if I could avoid the time-consuming task of filtering through a whole bunch of spots and let machine learning discern whether a restaurant was worth going to or not?

“How?” you might ask. We’ll gather as many menus and images as we can from NYC restaurants on OpenTable and train a computer to identify whether a restaurant will be “superb” (rated 4.5 and higher on a 5-point scale) or “good/meh” (rated below 4.5). Note: this is a manual demarcation and could be adjusted if you’re more or less snooty than I am. With this data, we’ll train a computer vision model on the images to recognize the two types of restaurants and an NLP model on the menus for the same task.
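As a minimal sketch of that demarcation, the labeling rule could look like the snippet below. The function name, the constant and the example restaurants are placeholders I made up for illustration; only the 4.5 cutoff comes from the text above.

```python
# Cutoff from the post: 4.5 and higher is "superb", anything below is "good/meh".
SUPERB_THRESHOLD = 4.5


def label_restaurant(rating, threshold=SUPERB_THRESHOLD):
    """Map a rating on a 5-point scale to one of the two classes."""
    return "superb" if rating >= threshold else "good/meh"


# Hypothetical ratings, purely for illustration (not real OpenTable data).
ratings = {"Restaurant A": 4.7, "Restaurant B": 4.5, "Restaurant C": 4.3}
labels = {name: label_restaurant(rating) for name, rating in ratings.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```

If you’re more or less snooty, pass a different `threshold` and the labels shift accordingly.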

If all goes well, we’ll be able to feed the computer menus and pictures of restaurants like Baker & co (left), Ilili (right), or an absolute must, Lilia (cover), and it will tell us, “that’s quality dining right there”.

Model 1: Restaurant Images (Computer Vision)

We would hope that most humans could visually distinguish the appetizing dishes of a 5/5 restaurant from those of a 3/5 restaurant. The task gets more difficult when one restaurant is rated 4.3 out of 5 and the other 4.6 out of 5. Nevertheless, we’ll see what computer vision can do.

To keep things simple to start with, we’ll limit our images to NYC restaurants that serve the most common types of cuisine on OpenTable (namely American and Italian). Within this universe, there are ~26,000 images to play with. We’ll train our model with the ResNet-152 architecture. Rather than looking at a composite set of images for each restaurant (i.e. a panel of n images), we’ll label each image individually.

Unfortunately, the results aren’t super impressive. We only manage to accurately label restaurants 65% of the time. Maybe it’s because the link between taste and look is not as close as assumed. Maybe it’s because we need more data. Rather than linger and try to suss out why the model isn’t as accurate as we’d hoped, let’s pivot and try a different type of data altogether.

Model 2: Restaurant Menus (NLP)

While a picture is worth a thousand words, we shouldn’t feel particularly confident in our 65% computer vision model, so let’s hope that the NLP version will come to our rescue.

For our data, we’ll only use the menu items and any descriptions that accompany them (we’re not going to include prices). Rather than limiting ourselves to Italian and American cuisine, we’ll include all cuisines available. To go from the text domain to one that is machine interpretable, we’ll use word embeddings (BERT) to encode our data and use the same rating classification schema as above. In total, there are 2,640 restaurants with both ratings data and an included menu. Because we’re not swimming in data for millions of restaurants, we’re somewhat limited in the models that can be employed. So instead of using a neural network architecture, we’ll stick to an ensemble of a tuned support vector machine and tuned gradient boosted tree models.
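To make the ensemble idea a bit more concrete, here is a rough sketch using scikit-learn. Several things in it are my assumptions rather than the actual setup: the features are synthetic stand-ins for the BERT menu embeddings (the encoding step isn’t shown), the hyperparameters are untuned defaults, and soft voting is just one plausible way to combine a support vector machine with a gradient boosted tree model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for menu embeddings: one row per restaurant,
# label 1 = "superb", label 0 = "good/meh". Not real data.
X, y = make_classification(
    n_samples=400, n_features=32, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardize before fitting.
# probability=True is needed so the SVM can take part in soft voting.
svm = make_pipeline(
    StandardScaler(),
    SVC(C=1.0, kernel="rbf", probability=True, random_state=0),
)
gbt = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=0
)

# Soft voting averages the two models' predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[("svm", svm), ("gbt", gbt)], voting="soft"
)
ensemble.fit(X_train, y_train)

predictions = ensemble.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

In practice, the values of `C`, `n_estimators` and `learning_rate` would be picked with something like a cross-validated grid search rather than left at the values shown here.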

Using this approach, we’re able to get to 76% overall accuracy. If we look a little closer though, we can see that our model is very good (98% precision) at predicting superb restaurants but just doesn’t predict them that often (in other words, its recall for superb restaurants is low). If we blindly allow our model to tell us where to eat and are satisfied with not dining at the best of the best every night, we should be very confident that when it tells us that a restaurant is going to be absolute flames, it most certainly will be.
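For anyone fuzzy on how a model can be nearly always right when it says “superb” and still miss most superb restaurants, here is a small illustration. The labels below are made up to show the pattern; they are not the model’s actual predictions, and the helper function is my own.

```python
def precision_recall(y_true, y_pred, positive="superb"):
    """Return (precision, recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: 8 truly superb restaurants and 12 good/meh ones.
y_true = ["superb"] * 8 + ["good/meh"] * 12
# A cautious model that only calls 2 of the 8 superb spots "superb".
y_pred = ["superb"] * 2 + ["good/meh"] * 18

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Every “superb” call in this toy example is correct (high precision), yet most superb restaurants go unflagged (low recall), which is the same shape of behavior described above.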

Further Improvements

Pictures of restaurants on OpenTable, Yelp, Resy, or even Instagram will invariably have both food and scenery/décor. While the latter definitely contributes to one’s overall dining experience, we don’t want to confuse our machine with images that vary too much in content.

Putting an image to it, the problem can be visualized below. What you’re looking at is a clustering of the image types in our data set. Essentially, the computer is able to recognize on its own that the image types are not all the same. The purple points (the larger cluster) correspond to images of food; the yellow ones are scenery, décor and whatever else people take pictures of at restaurants. I’ve added an example image from each cluster (the image on the left shows food from Altesi Ristorante; the one on the right shows the dining tables at 107 West). [To produce this visualization, t-SNE/k-means cluster analysis is performed on extracted image features using the ResNet-50 model.]
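A rough sketch of that kind of analysis, using scikit-learn, is below. To keep it self-contained, random vectors stand in for the ResNet-50 image features (the feature extraction itself isn’t shown), and I’m assuming k-means is run on the features directly with t-SNE used only to get 2-D coordinates for plotting; the original analysis may have ordered these steps differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-ins for extracted image features: two well-separated blobs,
# a larger "food-like" group and a smaller "decor-like" group. Not real data.
food_like = rng.normal(loc=0.0, scale=1.0, size=(40, 16))
decor_like = rng.normal(loc=5.0, scale=1.0, size=(20, 16))
features = np.vstack([food_like, decor_like])

# k-means assigns each image to one of two clusters without using any labels.
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# t-SNE projects the features down to 2-D so the clusters can be scatter-plotted.
coords = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(features)

print("cluster sizes:", np.bincount(cluster_ids))
print("2-D coordinates shape:", coords.shape)
```

Coloring a scatter plot of `coords` by `cluster_ids` would give a picture along the lines of the one described above.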

It would likely improve the model’s accuracy if we first trained the machine to recognize whether it was looking at a picture of food or a picture of something else. From there, we could either remove the images of restaurant décor or panel both types of images together to get a composite image of a restaurant that included both food and scenery.

Additionally, good food is not exclusive to New York. Expanding this study to include pictures and menus from restaurants around the world would likely also lead to more robust results.

Lastly, if somehow these two inputs (pictures and text) could be combined to produce something even more powerful, then we’d really be…cooking.

Use Cases

Beyond being a temporary novelty, there’s power in being able to look at pictures of food or the items on a menu and infer whether the food will be good or not.

For instance, imagine seeing a few pictures on Instagram or Resy of some restaurant that just opened shop and thinking, “dang, that looks like the bee’s knees”. With limited reviews, you’d be skeptical. With the aid of this model, you can have more confidence that the food will taste just as good as the pictures and menu make it seem.

So, for the foodie in you (or those around you) who is looking for a new hidden gem, this may be your play when things return to normal. If you’re less keen on keeping it to yourself and want to be able to say you dined at “XYZ” before you needed to wait a month to get a table, this has you covered too.
