415matt/Recipe-Analysis


Introduction:

What interests me most is how the properties of a dish (nutritional info, prep time, etc.) affect how users rate the recipe.

I personally tend to enjoy foods high in sugar and fat, so I was curious to learn: does the amount of sugar in a recipe influence its rating? While answering this question, I also discovered correlations between certain nutritional contents and overall calories.

This dataset, along with my analysis, offers insight into cooking, an activity humans have shared throughout history. Because each recipe is repeated once for every rating/review it received, the dataset contains 234,429 rows and 23 columns. Some columns of interest include 'rating' (floats from 0 to 5), 'calories (#)' (floats), and 'sugar (PDV)' (the amount of sugar in the dish as a percent of daily value).


Cleaning & Exploratory Data Analysis


The dataset I am working with is the result of merging two separate datasets: a `recipes` dataset and a `reviews` dataset containing reviews for those recipes. The two were merged on the recipe id, so each recipe appears once for every review it received. I had to keep these duplicate recipe rows in mind while working with the data.
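A minimal pandas sketch of the merge, assuming the recipes file keys recipes by `id` and the reviews file by `recipe_id` (hypothetical names here; the writeup only says the datasets were merged on the recipe id). A left merge is also an assumption, chosen so that recipes with no reviews are kept.

```python
import pandas as pd

def merge_recipes_reviews(recipes: pd.DataFrame, reviews: pd.DataFrame) -> pd.DataFrame:
    """Left-merge reviews onto recipes so every review gets its own row.

    Assumes recipes are keyed by 'id' and reviews by 'recipe_id'
    (hypothetical column names).
    """
    # how="left" keeps recipes with no reviews; their review columns become NaN
    return recipes.merge(reviews, how="left", left_on="id", right_on="recipe_id")

# Toy example: recipe 1 has two reviews, recipe 2 has none
recipes = pd.DataFrame({"id": [1, 2], "name": ["brownies", "casserole"]})
reviews = pd.DataFrame({"recipe_id": [1, 1], "user_id": [10, 11], "rating": [4, 5]})
merged = merge_recipes_reviews(recipes, reviews)
print(merged)
```

Because the unreviewed recipe gets NaN in the review columns, pandas stores integer columns such as `user_id` as floats after a left merge, which may be why `user_id` shows values like 386585.0 in the cleaned dataframe.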

Once the datasets were merged, I replaced ratings of 0 with null values, since the lowest rating you can give a recipe is 1 star. A recipe can only be rated 0 if the review did not include a rating or the recipe has no reviews. When a rating does not exist, a null value represents it better, and it makes the average rating of a recipe more accurate because the 0s no longer drag the average down.
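A sketch of this rating cleanup, assuming the merged dataframe has a per-review `rating` column and a recipe `id` column as in the cleaned dataframe. `groupby(...).transform("mean")` writes each recipe's mean back onto every one of its review rows, and `mean` skips NaN, which is what keeps the former 0s out of the average.

```python
import numpy as np
import pandas as pd

def clean_ratings(merged: pd.DataFrame) -> pd.DataFrame:
    """Turn 0-star ratings into NaN and add a per-recipe 'avg rating' column."""
    out = merged.copy()
    # Real ratings are 1 to 5 stars, so a 0 means "no rating given"
    out["rating"] = out["rating"].replace(0, np.nan)
    # Mean per recipe, broadcast back to each review row; NaN values are skipped
    out["avg rating"] = out.groupby("id")["rating"].transform("mean")
    return out

merged = pd.DataFrame({"id": [1, 1, 1, 2], "rating": [4, 5, 0, 0]})
cleaned = clean_ratings(merged)
print(cleaned)
```

With the 0 left in, recipe 1 would average 3.0 instead of 4.5, and recipe 2 would get a misleading 0.0 instead of NaN.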

After calculating the 'avg rating' column, I converted columns to their proper types: dates into datetime objects and strings representing lists into actual lists. I also dropped the duplicate ID column. Finally, to make answering my question easier, I split the 'nutrition' column, which stores a recipe's nutritional info as a list, into separate columns: 'calories (#)', 'total fat (PDV)', 'sugar (PDV)', 'sodium (PDV)', 'protein (PDV)', 'saturated fat (PDV)', and 'carbohydrates (PDV)'.
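One way to do these conversions with pandas and the standard library's `ast.literal_eval`, assuming the list columns are stored as Python-style list strings and the duplicate ID column is named `recipe_id` (an assumption). The order of the nutrition columns follows the column order of the cleaned dataframe.

```python
import ast
import pandas as pd

NUTRITION_COLS = ["calories (#)", "total fat (PDV)", "sugar (PDV)", "sodium (PDV)",
                  "protein (PDV)", "saturated fat (PDV)", "carbohydrates (PDV)"]

def convert_types(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Date strings -> datetime64 (missing dates become NaT)
    out["submitted"] = pd.to_datetime(out["submitted"])
    out["date"] = pd.to_datetime(out["date"])
    # "['a', 'b']" strings -> real Python lists
    for col in ["tags", "steps", "ingredients", "nutrition"]:
        out[col] = out[col].apply(ast.literal_eval)
    # One new column per entry of the nutrition list
    nutrition = pd.DataFrame(out["nutrition"].tolist(), columns=NUTRITION_COLS, index=out.index)
    # Drop the original list column and the duplicate ID column
    return pd.concat([out.drop(columns=["nutrition", "recipe_id"]), nutrition], axis=1)

raw = pd.DataFrame({
    "id": [1], "recipe_id": [1],
    "submitted": ["2008-10-27"], "date": ["2008-11-19"],
    "tags": ["['desserts', 'brownies']"],
    "steps": ["['heat the oven', 'bake']"],
    "ingredients": ["['eggs', 'sugar']"],
    "nutrition": ["[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]"],
})
clean = convert_types(raw)
print(clean.T)
```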



Let's take a look at our cleaned dataframe!

| | name | id | minutes | contributor_id | submitted | tags | n_steps | steps | description | ingredients | n_ingredients | user_id | date | rating | review | avg rating | calories (#) | total fat (PDV) | sugar (PDV) | sodium (PDV) | protein (PDV) | saturated fat (PDV) | carbohydrates (PDV) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 brownies in the world best ever | 333281 | 40 | 985201 | 2008-10-27 | [60-minutes-or-less, time-to-make, course, main-ingredient, preparation, for-large-groups, desserts, lunch, snacks, cookies-and-brownies, chocolate, bar-cookies, brownies, number-of-servings] | 10 | ['heat the oven to 350f and arrange the rack in the middle', 'line an 8-by-8-inch glass baking dish with aluminum foil', 'combine chocolate and butter in a medium saucepan and cook over medium-low heat , stirring frequently , until evenly melted', 'remove from heat and let cool to room temperature', 'combine eggs , sugar , cocoa powder , vanilla extract , espresso , and salt in a large bowl and briefly stir until just evenly incorporated', 'add cooled chocolate and mix until uniform in color', 'add flour and stir until just incorporated', 'transfer batter to the prepared baking dish', 'bake until a tester inserted in the center of the brownies comes out clean , about 25 to 30 minutes', 'remove from the oven and cool completely before cutting'] | these are the most; chocolatey, moist, rich, dense, fudgy, delicious brownies that you'll ever make.....sereiously! there's no doubt that these will be your fav brownies ever for you can add things to them or make them plain.....either way they're pure heaven! | [bittersweet chocolate, unsalted butter, eggs, granulated sugar, unsweetened cocoa powder, vanilla extract, brewed espresso, kosher salt, all-purpose flour] | 9 | 386585.0 | 2008-11-19 | 4.0 | These were pretty good, but took forever to bake. I would send it ended up being almost an hour! Even then, the brownies stuck to the foil, and were on the overly moist side and not easy to cut. They did taste quite rich, though! Made for My 3 Chefs. | 4.0 | 138.4 | 10.0 | 50.0 | 3.0 | 3.0 | 19.0 | 6.0 |
| 1 | 1 in canada chocolate chip cookies | 453467 | 45 | 1848091 | 2011-04-11 | [60-minutes-or-less, time-to-make, cuisine, preparation, north-american, for-large-groups, canadian, british-columbian, number-of-servings] | 12 | ['pre-heat oven the 350 degrees f', 'in a mixing bowl , sift together the flours and baking powder', 'set aside', 'in another mixing bowl , blend together the sugars , margarine , and salt until light and fluffy', 'add the eggs , water , and vanilla to the margarine / sugar mixture and mix together until well combined', 'add in the flour mixture to the wet ingredients and blend until combined', 'scrape down the sides of the bowl and add the chocolate chips', 'mix until combined', 'scrape down the sides to the bowl again', 'using an ice cream scoop , scoop evenly rounded balls of dough and place of cookie sheet about 1 - 2 inches apart to allow for spreading during baking', 'bake for 10 - 15 minutes or until golden brown on the outside and soft & chewy in the center', 'serve hot and enjoy !'] | this is the recipe that we use at my school cafeteria for chocolate chip cookies. they must be the best chocolate chip cookies i have ever had! if you don't have margarine or don't like it, then just use butter (softened) instead. | [white sugar, brown sugar, salt, margarine, eggs, vanilla, water, all-purpose flour, whole wheat flour, baking soda, chocolate chips] | 11 | 424680.0 | 2012-01-26 | 5.0 | Originally I was gonna cut the recipe in half (just the 2 of us here), but then we had a park-wide yard sale, & I made the whole batch & used them as enticements for potential buyers ~ what the hey, a free cookie as delicious as these are, definitely works its magic! Will be making these again, for sure! Thanks for posting the recipe! | 5.0 | 595.1 | 46.0 | 211.0 | 22.0 | 13.0 | 51.0 | 26.0 |
| 2 | 412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | [60-minutes-or-less, time-to-make, course, main-ingredient, preparation, side-dishes, vegetables, easy, beginner-cook, broccoli] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to "zaar" ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous "green bean casserole" from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to "zaar" on may 28th,2008 | [frozen broccoli cuts, cream of chicken soup, sharp cheddar cheese, garlic powder, ground black pepper, salt, milk, soy sauce, french-fried onions] | 9 | 29782.0 | 2008-12-31 | 5.0 | This was one of the best broccoli casseroles that I have ever made. I made my own chicken soup for this recipe. I was a bit worried about the tsp of soy sauce but it gave the casserole the best flavor. YUM! \nThe photos you took (shapeweaver) inspired me to make this recipe and it actually does look just like them when it comes out of the oven. \nThanks so much for sharing your recipe shapeweaver. It was wonderful! Going into my family's favorite Zaar cookbook :) | 5.0 | 194.8 | 20.0 | 6.0 | 32.0 | 22.0 | 36.0 | 3.0 |
| 3 | 412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | [60-minutes-or-less, time-to-make, course, main-ingredient, preparation, side-dishes, vegetables, easy, beginner-cook, broccoli] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to "zaar" ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous "green bean casserole" from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to "zaar" on may 28th,2008 | [frozen broccoli cuts, cream of chicken soup, sharp cheddar cheese, garlic powder, ground black pepper, salt, milk, soy sauce, french-fried onions] | 9 | 1196280.0 | 2009-04-13 | 5.0 | I made this for my son's first birthday party this weekend. Our guests INHALED it! Everyone kept saying how delicious it was. I was I could have gotten to try it. | 5.0 | 194.8 | 20.0 | 6.0 | 32.0 | 22.0 | 36.0 | 3.0 |
| 4 | 412 broccoli casserole | 306168 | 40 | 50969 | 2008-05-30 | [60-minutes-or-less, time-to-make, course, main-ingredient, preparation, side-dishes, vegetables, easy, beginner-cook, broccoli] | 6 | ['preheat oven to 350 degrees', 'spray a 2 quart baking dish with cooking spray , set aside', 'in a large bowl mix together broccoli , soup , one cup of cheese , garlic powder , pepper , salt , milk , 1 cup of french onions , and soy sauce', 'pour into baking dish , sprinkle remaining cheese over top', 'bake for 25 minutes or until cheese is lightly browned', 'sprinkle with rest of french fried onions and bake until onions are browned and cheese is bubbly , about 10 more minutes'] | since there are already 411 recipes for broccoli casserole posted to "zaar" ,i decided to call this one #412 broccoli casserole.i don't think there are any like this one in the database. i based this one on the famous "green bean casserole" from campbell's soup. but i think mine is better since i don't like cream of mushroom soup.submitted to "zaar" on may 28th,2008 | [frozen broccoli cuts, cream of chicken soup, sharp cheddar cheese, garlic powder, ground black pepper, salt, milk, soy sauce, french-fried onions] | 9 | 768828.0 | 2013-08-02 | 5.0 | Loved this. Be sure to completely thaw the broccoli. I didn't and it didn't get done in time specified. Just cooked it a little longer though and it was perfect. Thanks Chef. | 5.0 | 194.8 | 20.0 | 6.0 | 32.0 | 22.0 | 36.0 | 3.0 |

Note that I did not expand the other list columns ('tags', 'steps', 'ingredients') into their own columns: I did not need them to answer my question, and I did not want to add more columns to an already large dataframe.


Univariate Analysis: To account for outliers in the 'minutes' column, I graphed only the data at or below the 95th percentile. This keeps the outliers in the dataframe in case I want to work with them later, while still keeping the plot clean and readable.
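A sketch of the percentile cutoff, using a hypothetical helper `below_quantile`. It returns a filtered copy for plotting and leaves the full dataframe untouched, which is what keeps the outliers available for later work.

```python
import pandas as pd

def below_quantile(df: pd.DataFrame, col: str, q: float) -> pd.DataFrame:
    """Rows where df[col] is at or below its q-th quantile; df is not modified."""
    cutoff = df[col].quantile(q)
    return df[df[col] <= cutoff]

df = pd.DataFrame({"minutes": list(range(1, 101))})
plot_data = below_quantile(df, "minutes", 0.95)  # pass this to the plotting call
print(len(df), len(plot_data))
```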

<iframe src="assets/Recipes Time Distribution.html" width=800 height=600 frameBorder=0></iframe>

Notice that the time distribution is not smooth. This is because users tend to round their times (e.g., instead of reporting that a recipe takes 33 minutes, users submit 35 minutes).
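A quick way to check the rounding explanation, sketched with a hypothetical helper `share_multiple_of`: if times were not rounded, only roughly one in five would be an exact multiple of 5, so a share well above 0.2 in the real 'minutes' column would support the explanation. The series below is toy data, not values from the dataset.

```python
import pandas as pd

def share_multiple_of(minutes: pd.Series, base: int = 5) -> float:
    """Fraction of recipe times that are exact multiples of `base` minutes."""
    return float((minutes % base == 0).mean())

times = pd.Series([35, 40, 33, 60, 45, 12, 30, 25])
print(share_multiple_of(times))  # 6 of the 8 toy times are multiples of 5
```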


Bivariate Analysis: I wanted to see whether the correlation between fat and calories differs noticeably from the correlation between protein and calories. As with the previous graph, I filtered the data by percentile, this time keeping values within the 99.9th percentile to avoid extreme outliers while keeping as much data as possible.
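A sketch that puts a number on this comparison, assuming the 99.9th-percentile cutoff is applied to both plotted columns (the writeup does not say which). The hypothetical helper `trimmed_corr` drops rows above the cutoff in either column and returns the Pearson correlation of the rest; the toy values are illustrative only.

```python
import pandas as pd

def trimmed_corr(df: pd.DataFrame, x: str, y: str, q: float = 0.999) -> float:
    """Pearson correlation of x and y after dropping rows above the q-th quantile of either."""
    keep = (df[x] <= df[x].quantile(q)) & (df[y] <= df[y].quantile(q))
    return df.loc[keep, x].corr(df.loc[keep, y])

recipes = pd.DataFrame({
    "calories (#)": [100, 200, 300, 400, 500],
    "total fat (PDV)": [10, 20, 30, 40, 50],
    "protein (PDV)": [20, 10, 40, 20, 50],
})
fat_r = trimmed_corr(recipes, "total fat (PDV)", "calories (#)")
protein_r = trimmed_corr(recipes, "protein (PDV)", "calories (#)")
print(fat_r, protein_r)
```

On the real data, calling `df.drop_duplicates(subset="id")` first would count each recipe once instead of once per review; whether the plots did this is not stated.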

<iframe src="assets/total fat (PDV) vs Calories.html" width=800 height=600 frameBorder=0></iframe>

Here we see a strong positive correlation between the percent daily value of fat and the total calories of the dish.

<iframe src="assets/protein (PDV) vs Calories.html" width=800 height=600 frameBorder=0></iframe>

Here the points are noticeably more spread out, but there is still a positive correlation between the percent daily value of protein and the total calories of the dish.


I found these plots interesting because they show that protein is more loosely correlated with calories than fat is. This makes sense: fat has 9 calories per gram whereas protein has only 4, making fat a larger contributor to total calories. There is also a surprisingly straight lower boundary in the total fat (PDV) vs Calories graph. I believe this line represents the minimum number of calories a dish can have given its fat PDV.
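The lower boundary can be worked out with simple arithmetic: fat alone contributes (fat PDV / 100) × DV_fat × 9 calories, so the floor should be a straight line through the origin with slope 9 × DV_fat / 100. The dataset does not say which daily value for fat its PDV figures use; the sketch assumes 65 g (an older U.S. reference value), which gives a slope of 5.85 calories per percentage point. A different reference value would change only the slope.

```python
FAT_KCAL_PER_GRAM = 9        # energy density of fat
ASSUMED_FAT_DV_GRAMS = 65    # assumption: reference daily value for total fat

def min_calories_from_fat(fat_pdv: float, dv_grams: float = ASSUMED_FAT_DV_GRAMS) -> float:
    """Calories contributed by fat alone, which is a floor on a dish's total calories."""
    fat_grams = fat_pdv / 100 * dv_grams
    return fat_grams * FAT_KCAL_PER_GRAM

# (total fat PDV, calories) for the three recipes in the cleaned dataframe
examples = {"brownies": (10.0, 138.4), "cookies": (46.0, 595.1), "casserole": (20.0, 194.8)}
for name, (fat_pdv, calories) in examples.items():
    floor = min_calories_from_fat(fat_pdv)
    print(f"{name}: floor {floor:.1f} kcal, actual {calories} kcal")
```

All three example recipes sit above this floor, as the boundary interpretation requires.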


Interesting Aggregates: One of the aggregates I created shows the distribution of written reviews by rating. From this aggregation, I saw that written reviews were mostly left for 'good' dishes (rated 4 or 5 stars), whereas people were less likely to take the time to write a review for a bad dish (rated 1 to 3 stars).

| rating | Proportion of written reviews |
|---|---|
| 1 | 0.0122382 |
| 2 | 0.0101011 |
| 3 | 0.0305892 |
| 4 | 0.159097 |
| 5 | 0.723592 |
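A sketch of this aggregate, assuming a review counts as written when the 'review' column is not null. The hypothetical helper `review_share_by_rating` returns, for each star rating, the fraction of written reviews carrying that rating; `dropna=False` keeps reviews with a missing rating in the denominator, so the 1 to 5 star shares need not sum to 1.

```python
import pandas as pd

def review_share_by_rating(df: pd.DataFrame) -> pd.Series:
    """Share of written reviews that carry each star rating."""
    written = df[df["review"].notna()]
    # dropna=False counts reviews whose rating is missing as their own category
    return written["rating"].value_counts(normalize=True, dropna=False).sort_index()

df = pd.DataFrame({
    "rating": [5, 5, 5, 4, 1, None, 5],
    "review": ["a", "b", "c", "d", "e", "f", None],
})
shares = review_share_by_rating(df)
print(shares)
```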


Assessment of Missingness

  • NMAR Analysis: Of the three columns containing missing values ('rating', 'review', and 'description'), I believe 'review' is the most likely to be Not Missing At Random (NMAR). This is because we do not have data on recipe popularity or views. Without knowing how popular a dish is or how many people have tried the recipe, we cannot accurately quantify how likely people are to write a review. If we had this data, and it showed that the likelihood of writing a review depended on the number of views a recipe received, we could reclassify the missingness as MAR; for now, there is not enough information to rule out NMAR.

  • Missingness Dependency
<iframe src="assets/Empirical Distribution of the TVD dependant on rating.html" width=800 height=600 frameBorder=0></iframe>

From this plot, we have no evidence that the missingness of 'review' depends on 'rating'. The p-value, the probability of seeing a TVD as large as or larger than our observed statistic under the null hypothesis, is 0.836, far above our 0.05 significance level, so we fail to reject the null hypothesis that the missingness of 'review' does not depend on 'rating'. Visually, our observed statistic also sits well within the empirical distribution.
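A sketch of this kind of missingness permutation test, under stated assumptions: rows with a missing rating are dropped, the test statistic is the TVD between the rating distributions of rows with and without a review, and the p-value is the share of shuffled statistics at least as large as the observed one. The function names and toy data are illustrative, not taken from the original analysis.

```python
import numpy as np
import pandas as pd

def rating_tvd(ratings: np.ndarray, review_missing: np.ndarray) -> float:
    """TVD between the rating distributions of rows with and without a review."""
    groups = np.where(review_missing, "missing", "present")
    # Each column is one group's rating distribution (normalized to sum to 1)
    table = pd.crosstab(ratings, groups, normalize="columns")
    table = table.reindex(columns=["missing", "present"], fill_value=0)
    return 0.5 * float((table["missing"] - table["present"]).abs().sum())

def missingness_permutation_test(df: pd.DataFrame, n_reps: int = 1000, seed: int = 0):
    """Permutation test: does the missingness of 'review' depend on 'rating'?"""
    rated = df[df["rating"].notna()]
    ratings = rated["rating"].to_numpy()
    missing = rated["review"].isna().to_numpy()
    observed = rating_tvd(ratings, missing)
    rng = np.random.default_rng(seed)
    # Shuffling the missingness labels simulates the null of no dependence
    simulated = np.array([rating_tvd(ratings, rng.permutation(missing)) for _ in range(n_reps)])
    p_value = float(np.mean(simulated >= observed))
    return observed, p_value

# Toy data where reviews are missing only for 1-star ratings (strong dependence)
toy = pd.DataFrame({
    "rating": [1] * 20 + [5] * 20,
    "review": [None] * 20 + ["great"] * 20,
})
observed, p_value = missingness_permutation_test(toy, n_reps=200)
print(observed, p_value)
```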



Hypothesis Testing


What is the relationship between the sugar (PDV) and the average rating of recipes?

null: There is no relationship between sugar (PDV) and the average rating of recipes.

alternative: There is a relationship between sugar (PDV) and the average rating of recipes.

To answer my question, I performed a permutation test, because I am limited to the data I have and do not know the true distribution of either sugar or ratings. To build the test statistic, I split 'sugar (PDV)' at its median to classify each recipe as "high" or "low" in sugar. This made it easy to compute the TVD between the two groups for the observed data and for each permutation used to build the empirical distribution.
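A sketch of a median-split permutation test, under stated assumptions: "high" sugar means strictly above the median, the TVD is taken between the star-rating distributions of the two groups (the `rating_col` parameter defaults to the per-review 'rating' column, which may differ from what the original test used), and the p-value is the share of shuffled statistics at least as large as the observed one. Names and toy data are illustrative.

```python
import numpy as np
import pandas as pd

def group_tvd(ratings: np.ndarray, is_high: np.ndarray) -> float:
    """TVD between the rating distributions of high- and low-sugar rows."""
    groups = np.where(is_high, "high", "low")
    table = pd.crosstab(ratings, groups, normalize="columns")
    table = table.reindex(columns=["high", "low"], fill_value=0)
    return 0.5 * float((table["high"] - table["low"]).abs().sum())

def sugar_permutation_test(df: pd.DataFrame, rating_col: str = "rating",
                           n_reps: int = 1000, seed: int = 0):
    data = df[df[rating_col].notna()]
    ratings = data[rating_col].to_numpy()
    sugar = data["sugar (PDV)"]
    # Assumption: "high" sugar means strictly above the median
    is_high = (sugar > sugar.median()).to_numpy()
    observed = group_tvd(ratings, is_high)
    rng = np.random.default_rng(seed)
    # Under the null the high/low labels are exchangeable, so shuffle them
    simulated = np.array([group_tvd(ratings, rng.permutation(is_high)) for _ in range(n_reps)])
    p_value = float(np.mean(simulated >= observed))
    return observed, p_value

toy = pd.DataFrame({"sugar (PDV)": list(range(1, 41)), "rating": [3] * 20 + [5] * 20})
observed, p_value = sugar_permutation_test(toy, n_reps=200)
print(observed, p_value)
```

A simulated p-value of 0.0 means none of the `n_reps` permuted statistics reached the observed one, so the true p-value is below roughly 1 / `n_reps` rather than exactly zero.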

<iframe src="assets/hypothesis.html" width=800 height=600 frameBorder=0></iframe>

My p-value of 0.0 led me to reject the null hypothesis at the 0.05 significance level in favor of the alternative: there is a relationship between sugar and recipe rating.
