Start
We all need to eat to live, but how many of us live to eat?
"Overweight and obesity are defined as abnormal or excessive fat accumulation that may impair health."
      - World Health Organization (WHO)
A Body Mass Index (BMI) over 25 is defined as overweight and a BMI over 30 as obese. In 2016, 39% of adults worldwide were overweight and 13% were obese. The WHO considers obesity an epidemic and calculated in 2017 that 2.8 million people die every year as a result of being overweight or obese. The WHO also considers the epidemic easily preventable, since in most cases obesity can be prevented through a healthy diet and physical exercise.
In this analysis, packaged food products from two countries will be analysed by investigating their nutritional information. Can one observe a relation between the nutrients in a country’s food and its current health status according to the health indices given by the WHO? That is what we’re going to see today.
Our Contestants
Our first contestant brought us the Eiffel Tower and the Riviera. With an average BMI of 25.6, one of the best in Europe, France can be considered the favourite of this competition, but can it stand the pressure?
With 67.9% of the population overweight and an average BMI of 29.1, our second contestant is the big burger-loving country on the other side of the Atlantic Ocean: the United States of America.
Statistic | France | USA |
---|---|---|
Overweight among adults | 59.5% | 67.9% |
Obesity among adults | 23.2% | 37.3% |
Overweight among children and adolescents 5-19 years | 28.9% | 41.2% |
Obesity among children and adolescents 5-19 years | 8.1% | 21.4% |
Mean Body Mass Index adults | 25.6 | 29.1 |
Mean Body Mass Index 5-19 years | 19.5 | 21.5 |
Blood glucose ≥126 mg/dl (7.0 mmol/l) | 5.9% | 7.3% |
Data
The dataset used to investigate the nutritional content of the products was the Open Food Facts database. It is a non-profit open database, which means that anyone in the world can use the data and contribute to it. Within the database, 459 371 products were labelled with France and 175 661 products with the US. The data was extracted by these labels into two subsets. Since the French dataset was significantly larger, it was randomly downsampled to contain the same number of entries as the US dataset.
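The downsampling step can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the `downsample` helper, the fixed seed and the toy records are all assumptions, and it only shows random sampling without replacement down to the size of the smaller subset.

```python
import random


def downsample(rows, target_size, seed=0):
    """Return `target_size` randomly chosen rows, sampled without replacement."""
    if target_size >= len(rows):
        return list(rows)  # nothing to cut, keep every row
    rng = random.Random(seed)  # fixed seed (assumption) so the subset is reproducible
    return rng.sample(rows, target_size)


# Toy stand-ins for the two country subsets (hypothetical records).
french_products = [{"product_name": f"produit {i}", "country": "France"} for i in range(10)]
us_products = [{"product_name": f"product {i}", "country": "US"} for i in range(4)]

# Cut the larger French subset down to the size of the US subset.
french_sample = downsample(french_products, len(us_products))
```

Fixing the seed is a design choice made here for illustration: it means the same French subset is drawn on every run, so later comparisons do not shift between runs.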
During the analysis, features regarding nutriments, such as energy per 100g and sugars per 100g, as well as product names and food categories were investigated. Names and food categories were used to categorize the data to compare nutritional values between the two countries with products in the same category. This was to make sure the comparisons would make sense: comparing the sugar content of a lollipop from the US with the sugar content of a soup from France would not give a true image of the data.
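To make the idea of comparing like with like concrete, here is a small sketch that groups products by category and country before summarising a nutriment. The `median_by_category` helper, the `category` and `country` field names, the toy values and the choice of the median as summary are assumptions made for illustration; the analysis itself may have summarised the data differently.

```python
from collections import defaultdict
from statistics import median


def median_by_category(rows, nutrient):
    """Median of `nutrient` for every (category, country) pair."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(nutrient)
        if value is not None:  # skip products with a missing value
            groups[(row["category"], row["country"])].append(value)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical toy products, not taken from the real dataset.
products = [
    {"category": "Sugary snacks", "country": "US", "sugars_100g": 60.0},
    {"category": "Sugary snacks", "country": "US", "sugars_100g": 50.0},
    {"category": "Sugary snacks", "country": "France", "sugars_100g": 45.0},
    {"category": "Soups", "country": "France", "sugars_100g": 2.0},
]

medians = median_by_category(products, "sugars_100g")
# Only pairs that share a category are compared with each other,
# e.g. ("Sugary snacks", "US") against ("Sugary snacks", "France").
```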
While the open nature of the database generates a lot of data, the contributions are, as of yet, not automatically checked. This leads to many missing values in the dataset, as well as errors and inconsistencies. To give both our contestants the best and equal opportunities, the data was therefore cleaned as follows:
- Rows were removed where no product name or product category was defined
- Nutriment values above 100g per 100g were removed
The last point was important because no nutriment can make up more than 100% of a product. Since each nutriment is defined per 100g, we required its value to lie between 0 and 100. The only nutriment feature we did not do this for was energy_100g, as energy, when given in kJ, can be more than 100 per 100g.
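The two cleaning rules could look roughly like this in code. It is a sketch under several assumptions: the `clean` helper and the `category` field name are hypothetical, a row is dropped when either the name or the category is missing, and an out-of-range value is blanked rather than the whole row being dropped. The text above does not settle those last two points.

```python
def clean(rows, nutrient_columns, unbounded=("energy_100g",)):
    """Drop rows without a name or category and blank impossible per-100g values."""
    cleaned = []
    for row in rows:
        # Rule 1 (assumed reading): both a product name and a category are required.
        if not row.get("product_name") or not row.get("category"):
            continue
        new_row = dict(row)  # copy so the input rows are left untouched
        for column in nutrient_columns:
            if column in unbounded:
                continue  # energy in kJ may legitimately exceed 100 per 100g
            value = new_row.get(column)
            # Rule 2: a nutriment must lie between 0 and 100 g per 100 g.
            if value is not None and not (0 <= value <= 100):
                new_row[column] = None  # assumption: blank the value, keep the row
        cleaned.append(new_row)
    return cleaned


# Hypothetical toy rows to show the effect of both rules.
raw_rows = [
    {"product_name": "Butter", "category": "Fats", "fat_100g": 82.0, "energy_100g": 3000.0},
    {"product_name": "", "category": "Fats", "fat_100g": 10.0, "energy_100g": 500.0},
    {"product_name": "Mystery oil", "category": "Fats", "fat_100g": 920.0, "energy_100g": 3700.0},
]
cleaned_rows = clean(raw_rows, ["fat_100g", "energy_100g"])
```

In this toy run the unnamed product is dropped, the impossible fat value of 920 is blanked, and both energy values are kept even though they are far above 100.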
Food categories
This competition will consist of five rounds, each investigating one food category. The five categories are:
- Fats
- Meat, poultry, fish
- Dairy
- Bread
- Snacks
Some of these categories are divided into subcategories as well, for example the category Snacks is divided into the subcategories Sugary snacks and Salty snacks. For each category, we have chosen different nutriments to focus on, such as sugars per 100g for Sugary snacks.
The products were sorted into the categories by their label, product name and ingredient list. Let's look at the category Fats as an example. First, words were defined that are associated with fats, such as butter and oil. Both English and French words were used to make sure that as many matches as possible were found. Then, both datasets were queried to see the results of the filtering. If the results contained unwanted products, a list of words that were not part of the category was defined. This list was then used to filter out the wrongly classified data. As an example, “butter popcorn” was initially sorted into the fats category since it contains the word “butter”. However, “butter popcorn” is not a type of fat and should not be part of the fats category. Therefore the word “popcorn” was added to the “not a fat word” list.
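The include/exclude word lists described above can be sketched like this. The names `FAT_WORDS`, `NOT_FAT_WORDS` and `is_in_category` are hypothetical, the French entries are illustrative guesses rather than the lists actually used, and for brevity the sketch looks only at the product name, whereas the analysis also used labels and ingredient lists.

```python
import re

# Illustrative word lists (assumption: the real lists were longer).
FAT_WORDS = {"butter", "oil", "beurre", "huile"}
NOT_FAT_WORDS = {"popcorn"}


def is_in_category(product_name, include_words, exclude_words):
    """True if the name contains an include word and no exclude word."""
    # Split the lowercased name into whole words made of letters only,
    # so that "boiled" does not match "oil".
    words = set(re.findall(r"[^\W\d_]+", product_name.lower()))
    return bool(words & include_words) and not (words & exclude_words)


names = ["Olive oil", "Butter popcorn", "Huile d'olive", "Tomato soup"]
fats = [name for name in names if is_in_category(name, FAT_WORDS, NOT_FAT_WORDS)]
```

Matching on whole words is a choice made for this sketch. Plain substring matching would catch more spelling variants but also more false positives, which would then need more entries in the exclusion list.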
But enough with details, let’s start the competition. May the best food country win!