Zack Scholl

zack.scholl@gmail.com

Consensus cookery


Sometimes when I want a recipe to cook something new I will find several recipes for the same thing and try to use them as a guide to generate an average or “consensus” recipe. This code should make it easy to generate consensus recipes (useful!) and also show variation between recipes (interesting!).

Finding a consensus recipe requires first clustering many recipes. This is because a single dish (e.g. brownies) might have several significant variations (e.g. brownies can have just cocoa, just chocolate, or both). This code first clusters the recipes and then uses each cluster to deliver a consensus recipe.

Example

The quick-and-dirty implementation goes like this:

  1. Choose a recipe (e.g. brownies, crepes, pancakes).
  2. Search using duckduckgo.com to find hundreds of corresponding recipes (fetch_urls.js).
  3. Download all the recipes and use pandoc to convert to text for processing.
  4. Use a really simple (read: bad) context-extractor to grab ingredients.
  5. Cluster the recipes based on the presence of ingredients.
  6. Take the median values for ingredients in a given cluster to create an average recipe.
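
As a rough illustration of step 5, here is a minimal sketch of clustering on ingredient presence. The post does not say which clustering algorithm the real code uses, so the greedy single-pass grouping, the Jaccard similarity measure, the `threshold` value and the function names below are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of ingredient names."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_ingredients(recipes, threshold=0.6):
    """Greedy single-pass clustering (an assumed stand-in algorithm).

    Each recipe joins the first cluster whose seed recipe is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of sets; cluster[0] is its seed
    for ingredients in recipes:
        for cluster in clusters:
            if jaccard(ingredients, cluster[0]) >= threshold:
                cluster.append(ingredients)
                break
        else:
            clusters.append([ingredients])
    # Biggest cluster first, as in the examples in this post.
    return sorted(clusters, key=len, reverse=True)


# Hypothetical parsed recipes: only the presence of an ingredient matters here.
recipes = [
    {"butter", "chocolate", "eggs", "flour", "sugar"},
    {"butter", "chocolate", "eggs", "flour", "sugar", "vanilla"},
    {"cocoa", "baking powder", "eggs", "flour", "sugar"},
    {"cocoa", "baking powder", "flour", "sugar", "salt"},
    {"butter", "chocolate", "eggs", "sugar"},
]
clusters = cluster_by_ingredients(recipes)
for i, cluster in enumerate(clusters):
    print(f"cluster {i} (n={len(cluster)})")
```

With this toy data the chocolate recipes and the cocoa recipes end up in separate clusters, which is the kind of split shown in the brownie example.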

The context-extractor works by finding the most likely “ingredient” section in the web page and then trying to parse those ingredients using a greedy search from a list of likely ingredients (top_5k.txt). It's not a great implementation. However, the errors in it are pretty random, which means you can get okay results as long as you have hundreds of recipes.
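
One way to picture the greedy search is a longest-match-first lookup against the ingredient list. This is only a sketch of that idea, not the actual extractor: the function name `find_ingredient` and the short `known` list (standing in for top_5k.txt) are hypothetical.

```python
import re


def find_ingredient(line, known):
    """Return the longest known ingredient name found in the line, or None.

    Trying longer names first keeps "brown sugar" from being read as "sugar".
    """
    text = line.lower()
    for name in sorted(known, key=len, reverse=True):
        # Word boundaries avoid matching inside other words.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            return name
    return None


# A tiny stand-in for the real ingredient list.
known = ["sugar", "brown sugar", "flour", "baking powder", "butter"]

print(find_ingredient("1/2 cup packed Brown Sugar", known))  # brown sugar
print(find_ingredient("2 cups flour, sifted", known))        # flour
print(find_ingredient("a pinch of love", known))             # None
```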

The median is used rather than the mean so that the result is less susceptible to badly parsed quantities. Again, as long as the parser is okay, it should be accurate enough.
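
A small made-up example shows why the median helps. The amounts below are hypothetical, with one value standing in for a parsing error.

```python
from statistics import mean, median

# Hypothetical flour amounts (in cups) parsed from five recipes.
# The last value stands in for a parsing error, e.g. "125 g" read as 125 cups.
flour_cups = [1.0, 1.25, 1.0, 1.5, 125.0]

print(mean(flour_cups))    # 25.95
print(median(flour_cups))  # 1.25
```

A single bad parse drags the mean far away from any real recipe, while the median stays at a sensible amount.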

Here are some examples of running the code (check out the code on GitHub).

Brownies

As mentioned, brownies are sometimes made with cocoa, sometimes with chocolate, and sometimes with both. Interestingly, the machine learning clustering detects this automatically.

Here’s the biggest “brownie” cluster which shows ingredients for a consensus recipe made with chocolate (made up of 44 recipes). The Rel. Freq. corresponds to the percentage of recipes that contain that ingredient.

cluster 0 (n=44)
+------------+-------------+------------+
| Ingredient |    Amount   | Rel. Freq. |
+------------+-------------+------------+
|   butter   |  4 1/2 tbsp |     98     |
| chocolate  |  4 1/2 tbsp |     93     |
|    eggs    | 1 5/8 whole |     93     |
|   flour    |  6 3/4 tbsp |     80     |
|    salt    |   1/4 tsp   |     50     |
|   sugar    |   3/4 cup   |     91     |
|  vanilla   |   1/2 tsp   |     70     |
+------------+-------------+------------+

The next biggest cluster shows ingredients for a brownie recipe that is made with cocoa powder. (It also uses baking powder, unlike the previous recipe.)

cluster 11 (n=28)
+---------------+------------+------------+
|   Ingredient  |   Amount   | Rel. Freq. |
+---------------+------------+------------+
| baking powder |  1/4 tsp   |     86     |
|     cocoa     | 3 1/4 tbsp |     71     |
|      eggs     | 1.0 whole  |     57     |
|     flour     |  5.0 tbsp  |     93     |
|      salt     |  1/4 tsp   |     79     |
|     sugar     | 7 1/4 tbsp |     93     |
|    vanilla    |  1/4 tsp   |     68     |
+---------------+------------+------------+

The third biggest cluster shows ingredients for a brownie recipe that uses both chocolate and cocoa.

cluster 4 (n=28)
+-------------+-------------+------------+
|  Ingredient |    Amount   | Rel. Freq. |
+-------------+-------------+------------+
| brown sugar |   6.0 tbsp  |    100     |
|    butter   |  6 3/4 tbsp |    100     |
|  chocolate  |  6 1/2 tbsp |     89     |
|    cocoa    |  5 3/4 tbsp |     54     |
|     eggs    | 1 7/8 whole |    104     |
|    flour    |   1/2 cup   |     89     |
|     salt    |   3/8 tsp   |     86     |
|    sugar    |   1/2 cup   |    100     |
|   vanilla   |   3/8 tsp   |    100     |
+-------------+-------------+------------+

You may notice that the proportions are odd (1 7/8 eggs!). This is because the program normalizes each recipe to a specified volume and then converts the result back to the median volume of the recipes in the cluster.
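
Here is a minimal sketch of that normalize-then-rescale idea, under my own reading of it: every recipe is scaled to the same total volume, the per-ingredient median is taken, and the result is scaled back up by the median total volume. The function name `consensus`, the `target` parameter and the amounts (in ml) are hypothetical.

```python
from statistics import median


def consensus(recipes, target=1.0):
    """Median recipe after scaling every recipe to the same total volume."""
    totals = [sum(r.values()) for r in recipes]
    # Scale each recipe so that its ingredients add up to `target`.
    scaled = [
        {name: amount * target / total for name, amount in r.items()}
        for r, total in zip(recipes, totals)
    ]
    names = sorted({name for r in recipes for name in r})
    # Convert back using the median total volume of the cluster.
    median_total = median(totals)
    return {
        name: median(r[name] for r in scaled if name in r) * median_total / target
        for name in names
    }


# Hypothetical cluster of three recipes, amounts in ml.
cluster = [
    {"flour": 200, "milk": 200},
    {"flour": 300, "milk": 300},  # same proportions, bigger batch
    {"flour": 100, "milk": 300},
]
print(consensus(cluster))
```

Because the amounts are rescaled twice, the final quantities rarely land on round numbers, which is how something like 1 7/8 eggs can appear.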

Pancakes

The machine learning clustering highlights the major difference between pancake recipes - whether they use buttermilk or not. These are the two biggest clusters: the first one uses milk and the second uses buttermilk.

cluster 15 (n=33)
+---------------+-------------+------------+
|   Ingredient  |    Amount   | Rel. Freq. |
+---------------+-------------+------------+
| baking powder |   1/8 tsp   |    100     |
|     butter    |   1/2 tsp   |    103     |
|      eggs     | 1 1/4 whole |     97     |
|     flour     |  1 1/4 cup  |    100     |
|      milk     |  1 1/8 cup  |     94     |
|      salt     |   1/2 tsp   |     94     |
|     sugar     |  1 1/2 tsp  |    100     |
+---------------+-------------+------------+
cluster 14 (n=29)
+---------------+-------------+------------+
|   Ingredient  |    Amount   | Rel. Freq. |
+---------------+-------------+------------+
| baking powder |   5/8 tsp   |    100     |
|  baking soda  |   1/2 tsp   |     97     |
|     butter    |   1/2 tsp   |    100     |
|   buttermilk  |  1 1/4 cup  |     97     |
|      eggs     | 1 1/8 whole |     97     |
|     flour     |  1 1/8 cup  |    100     |
|      salt     |   3/8 tsp   |     90     |
|     sugar     |  1 1/4 tsp  |    103     |
|    vanilla    |   5/8 tsp   |     41     |
+---------------+-------------+------------+

Homemade noodles

The machine learning clustering picks up on an important distinction within noodle making - whether to use semolina or flour.

cluster 18 (n=24)
+------------+-------------+------------+
| Ingredient |    Amount   | Rel. Freq. |
+------------+-------------+------------+
|    eggs    | 2 1/2 whole |     83     |
|   flour    |  2 3/8 cup  |    129     |
|    salt    |   5/8 tsp   |     75     |
|   water    |  6 3/8 tbsp |    100     |
+------------+-------------+------------+
cluster 14 (n=16)
+------------+------------+------------+
| Ingredient |   Amount   | Rel. Freq. |
+------------+------------+------------+
|    eggs    | 2.0 whole  |    112     |
|   flour    | 1 3/8 cup  |    119     |
| olive oil  | 2 7/8 tsp  |     94     |
|    salt    |  5/8 tsp   |     75     |
|  semolina  |  1.0 cup   |     31     |
|   water    | 3 7/8 tbsp |     75     |
+------------+------------+------------+

Hamburger

Here’s a funny thing. If you are not too specific about the recipe you want, you might get clusters of truly different recipes. Consider the hamburger.

The biggest cluster for hamburger is obviously a list of ingredients for a hamburger recipe, although the proportions are off (you can just multiply the amounts by some factor).

+----------------------+------------+------------+
|      Ingredient      |   Amount   | Rel. Freq. |
+----------------------+------------+------------+
|         beef         | 3 5/8 tbsp |     87     |
|         eggs         | 3/8 whole  |     33     |
|        garlic        | 6 1/2 tbsp |     77     |
|        onion         | 4 1/8 tbsp |     50     |
|         salt         |  1/4 tsp   |     40     |
| worcestershire sauce |  1.0 tsp   |     47     |
+----------------------+------------+------------+

Interestingly, one of the next biggest clusters is not a hamburger - it has no meat in it! On closer inspection, though, it is obviously a hamburger bun recipe, which the machine learning clustering automatically detected. Lol.

+------------+-----------+------------+
| Ingredient |   Amount  | Rel. Freq. |
+------------+-----------+------------+
|   butter   |  1.0 tsp  |     53     |
|    eggs    | 7/8 whole |     79     |
|   flour    | 2 3/8 cup |     95     |
|    milk    |  3/4 cup  |     37     |
|    salt    |  7/8 tsp  |     74     |
|   sugar    | 3 1/8 tsp |     95     |
|   water    |  5/8 cup  |     79     |
|   yeast    |  5/8 tsp  |     79     |
+------------+-----------+------------+

Try it

I decided to try out my software in the real world. What would one of these average recipes taste like? To see, I computed the average recipes for “chocolate chip cookies” and took the second largest cluster because it had both baking powder and baking soda.

The computed average chocolate chip cookie recipe:

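+---------------+-----------+-----------+------------+
|   Ingredient  |   Amount  | Variation | Rel. Freq. |
+---------------+-----------+-----------+------------+
| baking powder |   1 tsp   |  ± 1 3/8  |     95     |
|  baking soda  |  3/4 tsp  |   ± 3/8   |     75     |
|  brown sugar  |  7/8 cup  |   ± 3/8   |     99     |
|     butter    |  5/8 cup  |   ± 1/2   |     97     |
|   chocolate   |   1 cup   |   ± 5/8   |    109     |
|      eggs     |  2 whole  |   ± 3/4   |    105     |
|     flour     | 1 1/4 cup |   ± 7/8   |    116     |
|      salt     |  5/8 tsp  |   ± 1/2   |     86     |
|     sugar     |  3/8 cup  |   ± 1/4   |    100     |
|    vanilla    | 1 5/8 tsp |  ± 3 1/8  |    100     |
+---------------+-----------+-----------+------------+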

I used my standard baking technique: mix the wet ingredients first, add the dry ingredients, and then bake for 10-15 minutes at 350°F. The result was much more like cake than cookies. Apparently there was too much baking powder and the ratio of liquid to dry ingredients was too high. They also tasted too sugary. They weren’t bad, but they weren’t great, so I think they would qualify as average cookies.

Average cookies I made from my code results

I think part of the problem was that I had trouble converting ingredients to volumes for normalization. Some recipes specify their ingredients in “grams” or “ounces”, which need to be converted to volume using the density. In this version I used a constant density for everything (0.9 g / ml), which is somewhere between the densities of flour and water. However, the density of flour (0.6 g / ml) is much lower than the density of water (1 g / ml) and butter (0.9 g / ml).
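
To make the effect of the density concrete, here is a small sketch of the weight-to-volume conversion using the densities quoted above. The function name, the table layout, the US cup size of 236.588 ml and the 250 g example are assumptions for illustration, not taken from the actual code.

```python
# Densities in g / ml, as quoted in the text.
DENSITY_G_PER_ML = {"flour": 0.6, "water": 1.0, "butter": 0.9}
DEFAULT_DENSITY = 0.9   # the constant density used in the first version
ML_PER_CUP = 236.588    # US customary cup (assumed unit)


def grams_to_cups(ingredient, grams, per_ingredient=True):
    """Convert a weight in grams to a volume in cups."""
    if per_ingredient:
        density = DENSITY_G_PER_ML.get(ingredient, DEFAULT_DENSITY)
    else:
        density = DEFAULT_DENSITY
    return grams / density / ML_PER_CUP


# A hypothetical 250 g of flour under both assumptions.
constant = grams_to_cups("flour", 250, per_ingredient=False)
specific = grams_to_cups("flour", 250)
print(f"constant density: {constant:.2f} cups")  # about 1.17 cups
print(f"flour density:    {specific:.2f} cups")  # about 1.76 cups
```

With a single constant density, any flour given by weight is counted as roughly a third less volume than it should be.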

When I modified the densities, it indeed changed the flour to 1 3/4 cup instead of 1 1/4 cup, and reduced the variation from 7/8 cup to 1/2 cup. Next time I think I’d like to make the biggest cluster - i.e. the most popular recipe, which doesn’t use baking powder. Here’s that recipe:

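+-------------+-----------+-----------+------------+
|  Ingredient |   Amount  | Variation | Rel. Freq. |
+-------------+-----------+-----------+------------+
| baking soda |  7/8 tsp  |   ± 3/8   |     97     |
| brown sugar |  3/4 cup  |   ± 1/4   |     91     |
|    butter   |  3/4 cup  |   ± 3/8   |     99     |
|  chocolate  | 1 3/8 cup |   ± 5/8   |    105     |
|     eggs    |  2 whole  |   ± 1/2   |    103     |
|    flour    |   2 cup   |   ± 1/2   |     96     |
|     salt    |  5/8 tsp  |   ± 3/8   |     89     |
|    sugar    |  1/2 cup  |   ± 1/4   |     94     |
|   vanilla   | 1 1/4 tsp |  ± 2 1/8  |     98     |
+-------------+-----------+-----------+------------+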

In this case the amount of flour seems a lot more reasonable too (2 cups). I’d be interested in trying this recipe instead.

If you’d like to generate your own average recipes, check out the source on GitHub.