Many chefs, gourmets, and food-related researchers have focused on studying food pairing for decades. There are books [Page and Dornenburg, 2008; Dornenburg and Page, 2009] featuring a number of food pairings recommended based on the accumulated experience of professional chefs and food gourmets in the culinary world. Since food pairings are made based on the experiences of experts, food pairing itself is subjective and difficult to quantify. In the academic field, some food-related researchers [Ahn et al., 2011; Ahn and Ahnert, 2013; Garg et al., 2017; Simas et al., 2017] focused on determining the qualities of complementary food pairings based on analysis of shared flavor compounds. However, FlavorDB, built by [Garg et al., 2017], contains only a limited number of flavor compounds and natural ingredients, and a considerable amount of time and effort is required to analyze the flavor compounds of food ingredients.
In this work, we introduce KitcheNette, a model based on Siamese neural networks [Koch et al., 2015]. As shown in Figure 1, KitcheNette first trains on our annotated dataset containing more than 300k scores of known pairings, which constitute only 5% of the total possible number of pairings in our dataset. These quantified scores indicate whether each food pair is complementary or not. Then our trained model predicts the scores of unknown pairings consisting of food ingredients that have infrequently or never been used in recipes, which constitute the remaining 95% of the total number of pairings. The three unknown pairings in Figure 1 that our model found are known to be culture-specific pairings in Nordic, Japanese, and Mexican cuisine, respectively. To train our model, we constructed our own dataset which contains the golden standard scores of 300k food ingredient pairings obtained from 1M human-generated cooking recipes [Salvador et al., 2017; Marin et al., 2018]. Here, the amounts of ingredients, personal preferences, and the preparation process in cooking were not considered. Our model employs Siamese neural networks with a wide&deep architecture designed to learn the relationship of a food pairing. We then conducted experiments to compare it with several baseline models and confirmed that KitcheNette outperformed all the other models.

Figure 1: By training on our annotated dataset containing scores of well-known food pairings (e.g., gin&tonic water, salt&pepper, vanilla&onion), our model predicts the scores of unknown food pairings (e.g., gin&aquavit, wasabi&nori, lime&nopales) that are not annotated because they are less popular or infrequently used.
To further evaluate our model’s prediction performance, three qualitative analyses were conducted. First, we analyzed some example cases of food pairings to test whether our model successfully predicts the scores of unknown pairings. Second, we compared the ranking results of commonly used food ingredient pairings recommended by KitcheNette with those in FlavorDB [Garg et al., 2017]. Our ranking results are more consistent with human food-pairing knowledge than the results of FlavorDB. Third, we compared the food pairing recommendations of our model, which were based on predicted scores, with those of cooking experts [Page and Dornenburg, 2008; Dornenburg and Page, 2009]. We found that most of the recommendations of our model were the same as those of the cooking experts, which supports the accuracy of our model. In addition, our model recommended food pairings with ingredients not commonly featured in recipes. To this extent, our work attempts to broaden the underlying concept of food pairing and introduce a data-driven, deep learning based approach for discovering novel ingredient pairs.
The major contributions of our work can be summarized as follows.
• We propose a data-driven task that discovers new complementary food pairings that have the potential of being used for future recipes.
• We create a large scale dataset that contains the golden standard scores of food ingredient pairings.
• We show that KitcheNette, which uses Siamese neural networks and wide&deep learning, achieves high performance in predicting food ingredient pairing scores.
• We verify KitcheNette’s effectiveness in recommending complementary food pairings and discovering new food pairings through qualitative analyses.
2.1 Food Related Research
Research on Discovering Food Pairings. [Ahn et al., 2011] and [Ahn and Ahnert, 2013] introduce a flavor network whose edges are built based on the number of flavor compounds shared by culinary ingredients. The flavor network comprises 381 ingredients and 1,021 flavor compounds. FlavorDB [Garg et al., 2017] combines existing food repositories to provide a larger database with a user-interactive page. Food-bridging [Simas et al., 2017] improves the flavor network [Ahn et al., 2011] by adding bridges between two ingredients through a chain of pairwise affinities even when the chemical compound similarity of the two ingredients is low. However, these approaches cover only a limited number of flavor compounds and natural ingredients, and some well-known food pairings (e.g., red wine and beef) have very few flavor compounds in common. Our work employs a data-driven method to define the scores of food pairings from human experience and to find new food pairings at large scale.
Research on Recommending Recipes. Recipe Recommendation [Teng et al., 2012] has been proposed to determine whether a food ingredient is essential in a recipe. It uses two different recipe networks that can accurately predict recipe ratings. Finding surprising and plausible ingredient combinations is also not new: [Grace et al., 2016; Grace and Maher, 2016] combine case-based reasoning and deep learning to generate new recipe designs. Our work is similar in that it recommends new combinations of food ingredients, but KitcheNette proposes and trains on silver standard pairing scores at a larger scale and is able to suggest novel pairings that have never been tried before.
Table 1: Statistics of Ingredients and Pairings. Known pairs consist of ingredients whose occurrence counts are greater than 20. In addition, each known pair has a co-occurrence count of at least 5.
2.2 Siamese Neural Networks
Siamese neural networks [Koch et al., 2015] have been employed in various tasks to learn similarities between two different inputs. Also, some variations of Siamese neural networks have been introduced.
[Mueller and Thyagarajan, 2016] proposed the Manhattan LSTM model which takes two sentences as input. The Manhattan LSTM model generates a vector representation for each sentence and calculates the similarity between the vector representations using the simple function $\exp(-\lVert h_1 - h_2 \rVert_1)$, where $h_1$ and $h_2$ are the embedding vectors. [Yuan et al., 2018] proposed a customized contrastive loss function that can be divided into a partial loss function for positive pairs and a partial loss function for negative pairs. The loss function widens the distance between two vector representations of negative pairs while narrowing the distance between two vector representations of positive pairs. While these two works ([Mueller and Thyagarajan, 2016; Yuan et al., 2018]) use Siamese neural networks and take two different inputs of the same type, the Siamese-based model proposed by [Liang et al., 2018] takes one input as the standard by which the other input is evaluated. The input is evaluated based on its similarity to the input used as the standard. Our model is designed to learn semantic relationships of a food pair beyond simple distance-based similarity functions.
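As a small illustration of the distance-based similarity used by the Manhattan LSTM, the function $\exp(-\lVert h_1 - h_2 \rVert_1)$ can be sketched in a few lines. The function name `manhattan_similarity` and the plain-list inputs are our own illustrative choices, not part of the cited work.

```python
import math


def manhattan_similarity(h1, h2):
    """Return exp(-||h1 - h2||_1) for two equal-length vectors.

    The value is 1.0 for identical vectors and decays toward 0
    as the L1 distance between them grows.
    """
    if len(h1) != len(h2):
        raise ValueError("vectors must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-l1_distance)


same = manhattan_similarity([0.2, 0.5], [0.2, 0.5])
far = manhattan_similarity([0.0, 0.0], [1.0, 1.0])
```

Such a function maps any pair of representations to a score in (0, 1] using distance alone, which is the kind of fixed similarity that a learned pairing component is meant to go beyond.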
3.1 Dataset Description and Preprocessing
In this work, we utilized Recipe1M [Marin et al., 2018], a dataset containing approximately one million recipes and their corresponding images, which were collected from multiple popular cooking websites. All content in Recipe1M can be divided into two categories: texts and images. The recipe texts of Recipe1M consist of two parts: the list of ingredients and the instructions of a recipe. Im2Recipe [Salvador et al., 2017] used a bi-directional LSTM based ingredient name extraction module that performs logistic regression on each word in the ingredient lists of Recipe1M to extract only the ingredient names. For instance, “2 tbsp of olive oil” is extracted as olive oil. From the recipe instructions, [Marin et al., 2018] trains embeddings for the whole vocabulary, including ingredient names, in the word2vec [Mikolov et al., 2013] fashion. Among a total of 30,167 learned vocabulary entries, we obtained 3,567 unique ingredient names as shown in Table 1.
3.2 Food Ingredient Pairing Score Generation
Food Ingredient Pairing Score Dataset
As mentioned in the earlier sections, traditional ingredient pairing methods have relied on human expertise, such as long experience in the culinary industry or knowledge of the chemical details of food. As a result, the amount of data available to define food pairings and train deep learning models is very small. To deal with this problem, we propose a new silver standard dataset of food pairing scores that 1) enables training deep learning models at large scale and 2) defines whether a pair of foods is complementary or not on a scale between -1 and 1. We assumed that the co-occurrence information of ingredients from a large recipe corpus may provide insight into how ingredients are combined in each recipe. Within the scope of this study, we considered neither the amounts of ingredients nor their cooking procedures, since our dataset is based on statistical co-occurrence information.
Normalized Point-wise Mutual Information
We calculated our golden standard food pairing scores based on point-wise mutual information (PMI) as introduced in [Teng et al., 2012]. The PMI score (1) compares the probability of two elements co-occurring, $p(x, y)$, with the probabilities of each element occurring separately, $p(x)$ and $p(y)$:

$$\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)} \qquad (1)$$

where

$$p(x) = \frac{\#\text{ of recipes where } x \text{ occurs}}{\#\text{ of recipes}}, \quad p(y) = \frac{\#\text{ of recipes where } y \text{ occurs}}{\#\text{ of recipes}}, \quad p(x, y) = \frac{\#\text{ of recipes where } x \text{ and } y \text{ co-occur}}{\#\text{ of recipes}}.$$

The custom score is designed to accurately represent good and bad pairs by penalizing pairs in which a highly popular ingredient such as salt or butter co-occurs with its partner only rarely. On the other hand, a pair of less popular ingredients that shows high co-occurrence will represent a good, meaningful pair (e.g., wasabi&nori).

In our work, we used the normalized version of PMI (NPMI [Bouma, 2009]) to better train and fit our regression model. Point-wise mutual information can be normalized between -1 and +1, where -1 (in the limit) is given for never occurring together, 0 for independence, and +1 for complete co-occurrence. Thus, the scores between -1 and 1 intuitively determine whether the pair is well suited or not.
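To make the scoring concrete, the following is a minimal sketch of NPMI computed from recipe counts, using the normalization of [Bouma, 2009], i.e. PMI divided by $-\log p(x, y)$. The function name `npmi`, its argument names, and the handling of the two edge cases are illustrative assumptions rather than the exact procedure used to build the dataset.

```python
import math


def npmi(count_x, count_y, count_xy, n_recipes):
    """Normalized PMI of ingredients x and y from recipe counts.

    count_x, count_y: number of recipes containing x (resp. y)
    count_xy:         number of recipes containing both x and y
    n_recipes:        total number of recipes
    """
    if count_xy == 0:
        return -1.0  # limit value for pairs that never co-occur
    p_x = count_x / n_recipes
    p_y = count_y / n_recipes
    p_xy = count_xy / n_recipes
    if p_xy == 1.0:
        return 1.0  # both ingredients appear in every recipe
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))


# Hypothetical counts chosen only to exercise the three reference points.
independent = npmi(100, 100, 10, 1000)   # p(x, y) equals p(x) * p(y)
always_together = npmi(50, 50, 50, 1000)
never_together = npmi(50, 50, 0, 1000)
```

With these toy counts, the independent pair scores approximately 0, the pair that always occurs together scores approximately +1, and the pair that never co-occurs is assigned -1, matching the three reference points described in the text.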
Generating Ingredient Pairing Dataset based on NPMI Scores
Ideally, we would calculate the food ingredient pairing scores of all 6,359,961 possible pairings generated from 3,567 unique ingredients. However, we found that ingredients that rarely occurred in the 1 million recipe texts may act as noisy samples. Also, ingredients that rarely co-occur may lower the performance of the model. Therefore, we removed ingredients whose occurrence count does not exceed 20 and ingredient pairings whose co-occurrence count does not exceed 5 to build our golden standard known pairs in Table 1.

As a result, we obtained a total of 356,451 valid known pairings. All the other pairings were considered as unknown pairings. The final distribution of food ingredient pairing scores approximately follows a normal distribution. We assume that the ingredient pairing scores in the upper 5% of the distribution (0.274 or higher) are the top scores and that scores lower than 0.274 are in the lower distribution. The scores of the five best and worst ingredients to pair with vanilla in our dataset are shown in Table 2.

Table 2: Ingredient Pairing Dataset. The five best and worst ingredients in our dataset for pairing with vanilla. The Flavor Bible [Page and Dornenburg, 2008] recommended pairing chocolate, coffee, cream, ice cream, or sugar with vanilla.
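The filtering step can be sketched as follows, assuming each recipe is given as a list of ingredient names. The function `count_known_pairs` and its parameter names are hypothetical; the thresholds follow the running text (strictly more than 20 occurrences and strictly more than 5 co-occurrences), which is an assumption given that the Table 1 caption states the co-occurrence bound as "at least 5".

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_known_pairs(recipes, min_occurrence=20, min_cooccurrence=5):
    """Count ingredient occurrences and keep sufficiently frequent pairs.

    Returns (occurrence, known), where `known` maps an alphabetically
    sorted ingredient pair to its co-occurrence count.
    """
    occurrence = Counter()
    cooccurrence = Counter()
    for ingredients in recipes:
        unique = sorted(set(ingredients))  # count each ingredient once per recipe
        occurrence.update(unique)
        cooccurrence.update(combinations(unique, 2))
    known = {
        pair: count
        for pair, count in cooccurrence.items()
        if count > min_cooccurrence
        and occurrence[pair[0]] > min_occurrence
        and occurrence[pair[1]] > min_occurrence
    }
    return occurrence, known


# Number of unordered pairs among 3,567 ingredients.
total_pairs = comb(3567, 2)

# Toy corpus with lowered thresholds, for illustration only.
toy_recipes = [["vanilla", "sugar"]] * 3 + [["vanilla", "onion"]]
toy_occurrence, toy_known = count_known_pairs(
    toy_recipes, min_occurrence=2, min_cooccurrence=2
)
```

The pair count reproduces the 6,359,961 possible pairings stated above, and every pair absent from `known` would be treated as an unknown pairing.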
4.1 Learning Ingredient Representations
We propose a model that predicts the scores of ingredient pairings. Our model architecture consists of two major components, as shown in Figure 2. The first is the ‘Ingredient Representation Component’, which uses Siamese neural networks [Koch et al., 2015] where two identical multi-layer perceptrons (MLPs) with the same weights each receive a different 300-dimensional word vector representation. Each MLP has two fully connected layers which process the input ingredient vector. Let $(x_a, x_b)$ be a pair of ingredients represented as 300-dimensional word vectors. $W_1, W_2$ and $b_1, b_2$ are the shared weights and biases of an MLP, respectively, and $f$ denotes the activation function for non-linearity. We use rectified linear units (ReLUs) as $f$. The learned representations $(v_a, v_b)$ of this pair are mathematically expressed as follows:

$$v_a = f\big(W_2\, f(W_1 x_a + b_1) + b_2\big), \qquad v_b = f\big(W_2\, f(W_1 x_b + b_1) + b_2\big)$$

where $W_1 \in \mathbb{R}^{i \times 300}$, $W_2 \in \mathbb{R}^{j \times i}$, $b_1 \in \mathbb{R}^{i}$, and $b_2 \in \mathbb{R}^{j}$, and $i$ and $j$ are the numbers of hidden units.
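The weight sharing in the Ingredient Representation Component can be illustrated with a short NumPy sketch in which one set of parameters encodes both ingredients. The hidden sizes (128 and 64), the random initialization, the ReLU after the second layer, and the names `init_encoder` and `encode` are assumptions made for illustration; random vectors stand in for the 300-dimensional word embeddings.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def init_encoder(rng, in_dim=300, i=128, j=64):
    """Create one set of two-layer MLP parameters (hidden sizes are placeholders)."""
    return {
        "W1": rng.normal(0.0, 0.05, (i, in_dim)),
        "b1": np.zeros(i),
        "W2": rng.normal(0.0, 0.05, (j, i)),
        "b2": np.zeros(j),
    }


def encode(params, x):
    """Map a 300-dimensional ingredient vector to a j-dimensional representation."""
    hidden = relu(params["W1"] @ x + params["b1"])
    return relu(params["W2"] @ hidden + params["b2"])


rng = np.random.default_rng(0)
params = init_encoder(rng)

# Random stand-ins for two ingredient word vectors.
x_a = rng.normal(size=300)
x_b = rng.normal(size=300)

# The same parameters encode both ingredients (the Siamese property).
v_a = encode(params, x_a)
v_b = encode(params, x_b)
```

Because both branches call `encode` with the same `params`, any gradient update to the encoder affects the representations of both ingredients identically.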
4.2 Predicting Food Ingredient Pairing Scores
In the ‘Pairing Score Prediction Component’, we employ wide&deep learning [Cheng et al., 2016]. The layer is divided into a wide layer and a deep layer. In the deep layer, the two $j$-dimensional learned representation vectors $v_a$ and $v_b$ are concatenated and passed to another MLP that computes a joint representation of the two ingredients. This joint representation is denoted as deep vector $d$ and is mathematically expressed as follows:

$$d = f\big(W_d\,[v_a ; v_b] + b_d\big)$$

where $f$ is the activation function, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, $W_d \in \mathbb{R}^{j \times 2j}$, $b_d \in \mathbb{R}^{j}$, and $j$ is the number of hidden units in each layer. In the wide layer, the outer product of the two $j$-dimensional learned representation vectors is computed and a $j \times j$ matrix is generated. The matrix is then flattened to a $j^2$-dimensional wide vector $w$, which is mathematically expressed as follows:

$$w = g(v_a \otimes v_b)$$

where $g$ denotes a flattening function that converts a matrix into a $j^2$-dimensional single vector.

Figure 2: Overview of our KitcheNette architecture
The wide vector $w$ is then directly concatenated to the deep vector $d$. The concatenation of the wide and deep vectors is passed to the last fully connected layer to compute the pair score for the final output. Overall, as $(w, d)$ is the concatenation of the wide and deep vectors, the final output score $\hat{y}$ for the ingredient pair is mathematically calculated as follows:

$$\hat{y} = W_o\,(w, d) + b_o$$

where $W_o \in \mathbb{R}^{1 \times (j^2 + j)}$ and $b_o \in \mathbb{R}$, and $j$ is the number of hidden units in each layer.
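The wide and deep paths and the final linear layer can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the deep path is reduced to a single ReLU layer, and the parameter names `W_d`, `b_d`, `W_o`, `b_o` and the function `pair_score` are our own labels. The toy values are chosen so that the result can be checked by hand.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def pair_score(v_a, v_b, W_d, b_d, W_o, b_o):
    """Score a pair from two j-dimensional representations.

    deep: ReLU layer over the concatenation [v_a; v_b]   -> shape (j,)
    wide: flattened outer product of v_a and v_b         -> shape (j * j,)
    out:  linear layer over the concatenation (wide, deep)
    """
    deep = relu(W_d @ np.concatenate([v_a, v_b]) + b_d)
    wide = np.outer(v_a, v_b).ravel()
    return float(W_o @ np.concatenate([wide, deep]) + b_o)


# Toy example with j = 2.
v_a = np.array([1.0, 0.0])
v_b = np.array([0.0, 1.0])
W_d = np.zeros((2, 4))          # deep vector reduces to relu(b_d)
b_d = np.array([1.0, 2.0])
W_o = np.ones(2 * 2 + 2)        # sums all wide and deep entries
b_o = 0.5

score = pair_score(v_a, v_b, W_d, b_d, W_o, b_o)
```

In this toy case the outer product contributes a single 1, the deep vector contributes 1 + 2, and the bias adds 0.5. Note that the outer product is not symmetric in its arguments, so swapping the two ingredients can change the score unless the training data or the architecture enforces symmetry.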
4.3 Model Training Details
We train our proposed model to minimize the loss function (Mean Squared Error) which can be expressed as follows:

$$L(\theta) = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$$

where $L$ is the computed loss function to be minimized during training, $\theta$ are the model parameters to be trained, $y_n$ is the true score value, $\hat{y}_n$ is the predicted score value, and $N$ is the total number of input pairs used for training. We use the Adam optimizer for our model.
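As a minimal sketch of the training objective, the following NumPy code minimizes the mean squared error of a toy linear scorer with a hand-written Adam update. The linear scorer, the synthetic data, the learning rate, and the helper names `mse` and `adam_step` are assumptions for illustration; the moment decay rates and epsilon are the commonly used Adam defaults, not values reported for our model.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between true and predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameters."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)


# Toy regression problem: y_hat = X @ theta.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
y = X @ rng.normal(size=8)

theta = np.zeros(8)
state = {"t": 0, "m": np.zeros(8), "v": np.zeros(8)}
initial_loss = mse(y, X @ theta)
for _ in range(2000):
    grad = 2.0 * X.T @ (X @ theta - y) / len(y)  # gradient of the MSE
    theta = adam_step(theta, grad, state, lr=0.01)
final_loss = mse(y, X @ theta)
```

In practice a deep learning framework would compute the gradients of the full network automatically; the loop above only makes the loss and the update rule explicit.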
5.1 Baseline Models
We first evaluated the baseline models before evaluating our proposed model. As the simplest baseline, we predicted the pairing scores by calculating the cosine similarity between the two input ingredient vectors. We also employed the following machine learning models from the Python Scikit-learn [Pedregosa et al., 2011] package as baseline models: Linear Support Vector Regressor, Random Forest Regressor, Extra Tree Regressor, SGD Regressor, and Gradient Boosting. Additionally, a simple version of a Siamese neural network [Koch et al., 2015] is used as one of the baseline models. All these models are fitted with hyperparameters estimated by the built-in grid search.
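The cosine similarity baseline requires no training and can be sketched as follows. The function name and the zero-vector convention are illustrative assumptions; in the experiments the inputs would be the 300-dimensional ingredient vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


identical = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Cosine similarity conveniently shares the [-1, 1] range of the NPMI scores, but it measures how interchangeable two ingredients are in context rather than how well they combine, which is one reason to expect it to be a weak predictor of pairing scores.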
5.2 Main Results
As illustrated in Table 3, the following five metrics were utilized to evaluate model performance: Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), Correlation (CORR), and R squared (R2). Our KitcheNette model clearly outperforms the baseline models in all metrics.
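For reference, the five regression metrics can be computed as in the sketch below. We assume here that CORR denotes the Pearson correlation coefficient; the function name `regression_metrics` and the toy scores are illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MSE, MAE, Pearson correlation, and R squared.

    Assumes y_true and y_pred are non-constant sequences of equal length.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_true = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "RMSE": math.sqrt(mse),
        "MSE": mse,
        "MAE": mae,
        "CORR": cov / math.sqrt(ss_true * ss_pred),
        "R2": 1.0 - (mse * n) / ss_true,
    }


# Toy scores: predictions compressed toward the mean.
metrics = regression_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])
```

In this toy case the predictions are perfectly correlated with the true scores yet still incur error, which illustrates why correlation and error-based metrics are reported side by side.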
We use Normalized Discounted Cumulative Gain (NDCG@K) to evaluate the ranking performance of our model (Figure 3a) and employ the ROC curve to evaluate the sensitivity of our model (Figure 3b). In terms of NDCG@K, our model outperforms all the baseline models in making accurate predictions. The ROC curve is also used to measure the classification performance of the models. As mentioned in Section 3.2, we regarded all pairings with prediction scores of 0.274 or higher as complementary pairings; pairings with lower scores were considered non-complementary. The ROC curve results demonstrate that our KitcheNette model achieves higher performance than all the other models in predicting complementary pairings.
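The ranking metric and the complementary threshold can be sketched as follows. Since NDCG expects non-negative gains, this sketch shifts NPMI scores from [-1, 1] to [0, 1]; that shift, the linear gain, and the function names are assumptions for illustration and may differ from the exact evaluation setup.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k gains (rank 1 is undiscounted)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(true_scores, predicted_scores, k):
    """NDCG@K of the ranking induced by predicted_scores."""
    gains = [(s + 1.0) / 2.0 for s in true_scores]  # assumed shift to [0, 1]
    ranked = sorted(zip(predicted_scores, gains), key=lambda t: t[0], reverse=True)
    by_prediction = [g for _, g in ranked]
    ideal_dcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(by_prediction, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def is_complementary(score, threshold=0.274):
    """Binary label used for ROC analysis: scores at or above the threshold."""
    return score >= threshold


true_scores = [0.9, 0.1, -0.5]
perfect = ndcg_at_k(true_scores, [0.8, 0.2, -0.3], k=3)
reversed_order = ndcg_at_k(true_scores, [-0.3, 0.2, 0.8], k=3)
```

A ranking that preserves the true order attains the maximum value of 1, while the reversed ranking scores lower; the thresholded labels from `is_complementary` are what a ROC curve would be computed against.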
5.3 Ablation Study
We performed ablation tests to evaluate each feature of KitcheNette. As illustrated in Table 4, the wide&deep architecture and ingredient embedding of our model help improve its performance. When our model uses the cosine similarity of learned representations from the Siamese networks, it obtains the lowest performance in predicting food pairing scores. The concatenation (deep layer) of two representations dramatically improves the performance of our model. This indicates that semantic relations need to be learned for predicting food pairing scores. Furthermore, the wide&deep architecture that learns the relation of two ingredients further boosts our model’s performance. Also, utilizing the ingredient embedding for input vectors, instead of randomly initialized vectors, improves the model’s performance.
Table 3: Prediction results of the models.

Figure 3: Additional Model Prediction Results.

Table 4: Ablation tests on the validation set.

For qualitative analysis, we considered performing experiments with actual food and collecting human feedback, but realized that evaluating large-scale pairing scores in this way was not easy and was beyond the scope of our work. Instead, we performed various case studies. On top of that, we provide a demo version of KitcheNette where anyone can retrieve the scores of the ingredient pairings that they want to find.
6.1 Finding Unknown Pairings
To demonstrate the accuracy of KitcheNette’s predictions on infrequently used food ingredient pairings, we performed a comparative analysis of the prediction results of both known and unknown ingredient pairings. As illustrated in Table 5, we chose three similar carbonated white wines (champagne, sparkling wine, and prosecco). We then calculated the score of each wine paired with a different ingredient and analyzed all the possible pairings for the three different cases below.
Case 1. We used champagne&orange twist as a given pairing for comparison since it is a well-known ingredient pairing with a high annotated score. The orange twist&sparkling wine and orange twist&prosecco pairings do not have annotated scores since they are uncommon pairings. Additionally, we chose two other ingredients (orange wedge and lime twist) that are similar to orange twist, but are not frequently used with any of the three wines. As a result, we have one known pairing and eight unknown pairings. The prediction results of all nine pairings were consistently high (0.33-0.45).

Table 5: Examples of known and unknown pairings, with the predicted scores of known and unknown pairings marked separately.

Case 2. Based on their prediction scores, we paired the wines with different ingredients to create the following three unique known pairings: champagne&elderflower liqueur, sparkling wine&cream de cassis, and prosecco&lemon sorbet. The prediction results of the remaining six unknown pairings were also consistently high (0.29-0.42) compared to the three given known pairings.
Case 3. Finally, we chose onion as it made the worst known pairing with champagne and paired it with the remaining two wines. The prediction results of the two unknown pairings (sparkling wine&onion and prosecco&onion) were consistently as low as the score of the known pairing (champagne&onion).
In sum, our prediction results show that KitcheNette is capable of making predictions on certain pairings based on analogical reasoning which states that if A is similar to B and A forms a good pairing with C, then it is more likely that B also forms a good pairing with C. We believe that using this reasoning enhanced the performance of KitcheNette on unknown food pairings without annotated scores.
6.2 Comparison of Food Pairing Ranking Results
We performed a comparative analysis between the ingredients ranked by KitcheNette and the ranked ingredients in FlavorDB. We selected four widely used food ingredients (tomato, onion, pepper and cinnamon) for the comparison. Then we retrieved the top 10 ingredient pairings that consist of the selected ingredients. Based on our observations, KitcheNette generally recommended food ingredients that are frequently used in everyday cooking and dining (e.g., tomato&lettuce, onion&ground beef, pepper&oregano, cinnamon&clove, apple). On the other hand, while FlavorDB recommended food ingredients that share a large number of chemical compounds with the selected ingredients, some of the recommendations did not pair well with the selected ingredients (e.g., tomato&tea, onion&cocoa, pepper&orange) for cooking and dining.

Table 6: The ranking results of KitcheNette and FlavorDB [Garg et al., 2017].

Table 7: Food&drink Pairings. The ranked pairings of KitcheNette and food&drink recommendations from “The Flavor Bible” [Page and Dornenburg, 2008] and “WHAT to DRINK with WHAT you EAT” [Dornenburg and Page, 2009]. The recommendations are listed in alphabetical order. Predicted scores of unknown pairings are marked.
6.3 Discovering New Drink-Food Pairings
We also found that KitcheNette can discover new food-drink pairings, which we believe is one of the main aims of food pairing. As illustrated in Table 7, we compared our model’s food&drink recommendations with those from “The Flavor Bible” [Page and Dornenburg, 2008] and “WHAT to DRINK with WHAT you EAT” [Dornenburg and Page, 2009]. We found that KitcheNette not only provides recommendations that are consistent with the recommendations of culinary experts from the books but also recommends far more pairings than the two books.
For red wine and white wine, our model recommended a variety of meat (e.g., beef, lamb) and specific seafood ingredients (e.g., mussel, lobster, shrimp), respectively. Our model recommended authentic Japanese food ingredients to pair with sake, which shows that our data-driven learning model is also able to recommend food ingredients less common in non-Asian cuisines.
In this work, we introduced KitcheNette, which predicts food ingredient pairing scores based on a large amount of food recipe data and ranks food ingredient pairings based on the predicted scores. Our model, which uses Siamese deep neural networks, is trained on a dataset containing more than 300k food ingredient pairing scores. We demonstrate that our model discovers new and unknown pairings and achieves better ranking results than the existing food pairing ranking models. Also, our model discovers new drink-food pairings and accurately predicts the scores of new food ingredient pairings.
For future work, we plan to use a graph-based neural network architecture to train on one-to-many ingredient pairings, instead of on one-to-one pairings, which were used by our model’s Siamese networks. Also, we plan to add the chemical information of food ingredients to the ingredient embeddings and use more detailed information on food ingredients from food encyclopedias. Last, we would like to use more novel and authentic recipes to help our model to recommend more versatile food ingredient pairings.
[Ahn and Ahnert, 2013] Yong-Yeol Ahn and Sebastian Ahnert. The flavor network. Leonardo, 46(3):272–273, 2013.
[Ahn et al., 2011] Yong-Yeol Ahn, Sebastian E Ahnert, James P Bagrow, and Albert-László Barabási. Flavor network and the principles of food pairing. Scientific reports, 1:196, 2011.
[Bouma, 2009] Gerlof Bouma. Normalized (pointwise) mutual information in collocation extraction. Proceedings of GSCL, pages 31–40, 2009.
[Cheng et al., 2016] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pages 7–10. ACM, 2016.
[Dornenburg and Page, 2009] Andrew Dornenburg and Karen Page. What to Drink with What You Eat: The Definitive Guide to Pairing Food with Wine, Beer, Spirits, Coffee, Tea-Even Water-Based on Expert Advice from America’s Best Sommeliers. Little, Brown, 2009.
[Garg et al., 2017] Neelansh Garg, Apuroop Sethupathy, Rudraksh Tuwani, Shubham Dokania, Arvind Iyer, Ayushi Gupta, Shubhra Agrawal, Navjot Singh, Shubham Shukla, Kriti Kathuria, et al. FlavorDB: a database of flavor molecules. Nucleic acids research, 46(D1):D1210–D1216, 2017.
[Grace and Maher, 2016] Kazjon Grace and Mary Lou Maher. Surprise-triggered reformulation of design goals. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[Grace et al., 2016] Kazjon Grace, Mary Lou Maher, David C Wilson, and Nadia A Najjar. Combining cbr and deep learning to generate surprising recipe designs. In International Conference on Case-Based Reasoning, pages 154–169. Springer, 2016.
[Koch et al., 2015] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2, 2015.
[Liang et al., 2018] Guoxi Liang, Byung-Won On, Dongwon Jeong, Hyun-Chul Kim, and Gyu Choi. Automated essay scoring: A siamese bidirectional lstm neural network architecture. Symmetry, 10(12):682, 2018.
[Marin et al., 2018] Javier Marin, Aritro Biswas, Ferda Ofli, Nicholas Hynes, Amaia Salvador, Yusuf Aytar, Ingmar Weber, and Antonio Torralba. Recipe1m: A dataset for learning cross-modal embeddings for cooking recipes and food images. arXiv preprint arXiv:1810.06553, 2018.
[Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
[Mueller and Thyagarajan, 2016] Jonas Mueller and Aditya Thyagarajan. Siamese recurrent architectures for learning sentence similarity. In AAAI, volume 16, pages 2786–2792, 2016.
[Page and Dornenburg, 2008] Karen Page and Andrew Dornenburg. The flavor bible: The essential guide to culinary creativity, based on the wisdom of America’s most imaginative chefs. Little, Brown, 2008.
[Pedregosa et al., 2011] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[Salvador et al., 2017] Amaia Salvador, Nicholas Hynes, Yusuf Aytar, Javier Marin, Ferda Ofli, Ingmar Weber, and Antonio Torralba. Learning cross-modal embeddings for cooking recipes and food images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[Simas et al., 2017] Tiago Simas, Michal Ficek, Albert Diaz-Guilera, Pere Obrador, and Pablo R Rodriguez. Food-bridging: a new network construction to unveil the principles of cooking. Frontiers in ICT, 4:14, 2017.
[Teng et al., 2012] Chun-Yuen Teng, Yu-Ru Lin, and Lada A Adamic. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference, pages 298–307. ACM, 2012.
[Yuan et al., 2018] Huiru Yuan, Guannan Liu, Hong Li, and Lihong Wang. Matching recommendations based on siamese network and metric learning. In 2018 15th International Conference on Service Systems and Service Management (ICSSSM), pages 1–6. IEEE, 2018.