Wednesday, February 10, 2010

Collating (not reducing) with CouchDB List Functions


My ingredient index page is a sorry excuse for a CouchDB map-reduce.

Even when I first implemented it, I received feedback from concerned CouchDB aficionados about my approach. At the time, I let it slide because, well, it worked. After upgrading to CouchDB 0.10, it no longer worked, but I found a workaround through a configuration setting. That spurred feedback from another horrified CouchDB aficionado, offering an alternative approach—using CouchDB list functions.

List functions are a mechanism for iterating over rows in a view to produce output. CouchDB list functions are typically used to generate alternate formats for output (Atom, XML, HTML, etc.). I still want to generate JSON for consumption by my Sinatra application. Hopefully, that will not prove difficult.

First up, I remove the old reduce by removing it from my app and reloading the app's design docs:
cstrom@whitefall:~/repos/eee-code$ rm couch/_design/recipes/views/by_ingredients/reduce.js
cstrom@whitefall:~/repos/eee-code$ rake couchdb:load_design_docs
(in /home/cstrom/repos/eee-code)
The rake task is a thin wrapper around the couch_docs gem, which assembles the JavaScript files under couch/_design/recipes into a recipes design document.

With the reduce gone from the design document, I can now try to replace it with a list function (which also goes in the design document, albeit in a different section). Since this is my first attempt at such a beast, I prototype to learn. I create couch/_design/recipes/lists/index-ingredients.js. The couch_docs gem will assemble the contents of that file into the recipes design document like this:
{
  "_id": "_design/recipes",
  "_rev": "1-7e06eea8045779c50e28a658cfc8b639",
  "lists": {
    "index-ingredients": "function(head, req){// list function code here}"
  }
  // other design document stuff here
}
My first attempt at "list function code here" looks like this:
function(head, req){
  var row;
  while(row = getRow()) {
    log(row.key);
    send(toJSON(row.value));
  }
}
This is very similar to the examples given for list functions in the CouchDB book. Mostly, I am just trying to get my bearings. For each row in the view against which this list function will be applied, I will log the row's key and send the row's value as JSON to the requester.

After loading this list function, I can apply it to the by_ingredients map (key is an ingredient, value is a hash of the document's ID and title) via curl:
cstrom@whitefall:~/repos/eee-code$ curl http://localhost:5984/eee/_design/recipes/_list/index-ingredients/by_ingredients?limit=2
{"id": "2002-02-12-steak","title": "Chipotle Mushroom Steak"}{"id": "2003-03-21-fish","title": "Grilled Jerk Swordfish"}
That is not valid JSON being returned to curl, but it does suffice as proof of concept.
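If I did want the prototype to return something parseable, a minimal sketch might wrap the values in brackets and separate them with commas. So that it can run under plain Node.js, the getRow(), send() and toJSON() definitions below are my own stand-ins for the helpers that CouchDB supplies to list functions, and listValues is just an illustrative name:

```javascript
// Stand-ins for the helpers CouchDB supplies to list functions, so that
// this sketch can run under plain Node.js. These are NOT CouchDB's code.
var pendingRows = [];
var output = '';
function getRow() { return pendingRows.shift(); }  // undefined once exhausted
function send(chunk) { output += chunk; }
function toJSON(value) { return JSON.stringify(value); }

// The prototype list function, changed only to wrap the values in [ ]
// and to put a comma between them so that the response parses as JSON.
var listValues = function (head, req) {
  var row, first = true;
  send('[');
  while ((row = getRow())) {
    if (!first) send(',');
    send(toJSON(row.value));
    first = false;
  }
  send(']');
};

// Two rows shaped like the by_ingredients view output.
pendingRows = [
  { key: 'adobo sauce',
    value: { id: '2002-02-12-steak', title: 'Chipotle Mushroom Steak' } },
  { key: 'allspice',
    value: { id: '2003-03-21-fish', title: 'Grilled Jerk Swordfish' } }
];
listValues({}, {});
console.log(JSON.parse(output).length); // prints 2
```

Inside CouchDB, only the body of listValues would go into the design document; the stand-ins exist purely to exercise it outside the database.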

To replace my poor reduce, I need to collect all recipes by ingredient. No problem: the map view returns all ingredients in all recipes, ordered by ingredient:
cstrom@whitefall:~/repos/eee-code$ curl http://localhost:5984/eee/_design/recipes/_view/by_ingredients?limit=11
{"total_rows":4892,"offset":0,"rows":[
{"id":"2002-02-12-steak","key":"adobo sauce","value":{"id":"2002-02-12-steak","title":"Chipotle Mushroom Steak"}},
{"id":"2003-03-21-fish","key":"allspice","value":{"id":"2003-03-21-fish","title":"Grilled Jerk Swordfish"}},
{"id":"2004-06-01-salad","key":"almonds","value":{"id":"2004-06-01-salad","title":"Strawberry and Orange Salad"}},
{"id":"2006-06-14-granola","key":"almonds","value":{"id":"2006-06-14-granola","title":"Daughter's Granola"}},
{"id":"2001-11-27-quesadillas","key":"ancho chile pepper","value":{"id":"2001-11-27-quesadillas","title":"Baked Quesadillas"}},
{"id":"2001-10-30-caesar_salad","key":"anchovies","value":{"id":"2001-10-30-caesar_salad","title":"Caesar Salad"}},
{"id":"2003-02-23-sauce","key":"anchovies","value":{"id":"2003-02-23-sauce","title":"Spicy Anchovy, Garlic, and Oil Sauce"}},
{"id":"2003-08-26-salad","key":"anchovies","value":{"id":"2003-08-26-salad","title":"Composed Tuna Salad"}},
{"id":"2003-11-25-caesar_salad","key":"anchovies","value":{"id":"2003-11-25-caesar_salad","title":"Caesar Salad"}},
{"id":"2005-03-19-pasta","key":"anchovies","value":{"id":"2005-03-19-pasta","title":"Pasta Puttanesca"}},
{"id":"2002-09-03-annatto_shrimp","key":"annatto seeds","value":{"id":"2002-09-03-annatto_shrimp","title":"Annatto Grilled Shrimp and Vegetables"}}
]}
I just need my list function to collate them so that there is only one row for almonds (not two), one row for anchovies (not five). After some mucking, I end up with this SAX-y (SAJ-y?) looking thing:
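function(head, req){
  var row, last_key, ingredient_list;
  send('[');
  while(row = getRow()) {
    // A new key means that the previous ingredient is complete
    if (last_key != row.key) {
      if (typeof(last_key) != 'undefined') {
        // Skip ingredients that appear in 100 or more recipes
        if (ingredient_list.length < 100) {
          send(toJSON({key:last_key, value:ingredient_list}));
          send(',');
        }
      }
      last_key = row.key;
      ingredient_list = [];
    }
    ingredient_list.push(row.value);
  }
  // Flush the final ingredient (ingredient_list is unset if the view was empty)
  if (ingredient_list && ingredient_list.length < 100) {
    send(toJSON({key:last_key, value:ingredient_list}));
  }
  else {
    // Placeholder so that the array never ends with a dangling comma
    send('{"key":"","value":[]}');
  }
  send(']');
}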
I send back a bare array rather than the object with a rows attribute that CouchDB views normally return, which is fine because it will only be consumed by my Sinatra app. I could build the entire data structure in memory and perform one big send at the end, but this solution feels more in the spirit of list functions. Specifically, I output data as soon as it is ready: as soon as a new ingredient is being processed (anchovies instead of almonds).
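For comparison, the build-it-all-in-memory alternative might look something like the rough sketch below. It is not what I deployed. The getRow(), send() and toJSON() definitions are again my own stand-ins for the CouchDB helpers so that the sketch runs under plain Node.js, and collateInMemory is an illustrative name. It relies on the view returning rows already sorted by key:

```javascript
// Stand-ins for the helpers CouchDB supplies to list functions, so that
// this sketch can run under plain Node.js. These are NOT CouchDB's code.
var pendingRows = [];
var output = '';
function getRow() { return pendingRows.shift(); }  // undefined once exhausted
function send(chunk) { output += chunk; }
function toJSON(value) { return JSON.stringify(value); }

// Collect every ingredient group first, then send the whole array once.
var collateInMemory = function (head, req) {
  var row, current, groups = [];
  while ((row = getRow())) {
    // Rows arrive sorted by key, so a changed key starts a new group
    if (!current || current.key !== row.key) {
      current = { key: row.key, value: [] };
      groups.push(current);
    }
    current.value.push(row.value);
  }
  // Same cutoff as the streaming version: drop 100+ recipe ingredients
  var kept = groups.filter(function (g) { return g.value.length < 100; });
  send(toJSON(kept));
};

// Three rows shaped like the by_ingredients view output.
pendingRows = [
  { key: 'almonds',
    value: { id: '2004-06-01-salad', title: 'Strawberry and Orange Salad' } },
  { key: 'almonds',
    value: { id: '2006-06-14-granola', title: "Daughter's Granola" } },
  { key: 'ancho chile pepper',
    value: { id: '2001-11-27-quesadillas', title: 'Baked Quesadillas' } }
];
collateInMemory({}, {});
console.log(JSON.parse(output).length); // prints 2
```

This version needs no comma bookkeeping and no placeholder entry, but it holds every row in memory until the view is exhausted, which is the trade-off that pushed me toward sending each ingredient as soon as it is complete.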

Testing this out with curl, I find:
cstrom@whitefall:~/repos/eee-code$ curl http://localhost:5984/eee/_design/recipes/_list/index-ingredients/by_ingredients?limit=11
[{"key": "adobo sauce","value": [{"id": "2002-02-12-steak","title": "Chipotle Mushroom Steak"}]},
{"key": "allspice","value": [{"id": "2003-03-21-fish","title": "Grilled Jerk Swordfish"}]},
{"key": "almonds","value": [{"id": "2004-06-01-salad","title": "Strawberry and Orange Salad"},{"id": "2006-06-14-granola","title": "Daughter's Granola"}]},
{"key": "ancho chile pepper","value": [{"id": "2001-11-27-quesadillas","title": "Baked Quesadillas"}]},
{"key": "anchovies","value": [{"id": "2001-10-30-caesar_salad","title": "Caesar Salad"},{"id": "2003-02-23-sauce","title": "Spicy Anchovy, Garlic, and Oil Sauce"},{"id": "2003-08-26-salad","title": "Composed Tuna Salad"},{"id": "2003-11-25-caesar_salad","title": "Caesar Salad"},{"id": "2005-03-19-pasta","title": "Pasta Puttanesca"}]},
{"key": "annatto seeds","value": [{"id": "2002-09-03-annatto_shrimp","title": "Annatto Grilled Shrimp and Vegetables"}]}]
Indeed, my almond and anchovy recipes are now properly collated.

More importantly, after switching my Sinatra app to consume this list function rather than the now-deleted reduce, I still have my by-ingredient Cucumber scenarios passing:
cstrom@whitefall:~/repos/eee-code$ cucumber features/ingredient_index.feature
Feature: Ingredient index for recipes

As a user curious about ingredients or recipes
I want to see a list of ingredients
So that I can see a sample of recipes in the cookbook using a particular ingredient

Scenario: A couple of recipes sharing an ingredient # features/ingredient_index.feature:7
Given a "Cookie" recipe with "butter" and "chocolate chips" # features/step_definitions/ingredient_index.rb:1
And a "Pancake" recipe with "flour" and "chocolate chips" # features/step_definitions/ingredient_index.rb:1
When I visit the ingredients page # features/step_definitions/ingredient_index.rb:43
Then I should see the "chocolate chips" ingredient # features/step_definitions/ingredient_index.rb:47
And "chocolate chips" recipes should include "Cookie" and "Pancake" # features/step_definitions/ingredient_index.rb:52
And I should see the "flour" ingredient # features/step_definitions/ingredient_index.rb:47
And "flour" recipes should include only "Pancake" # features/step_definitions/ingredient_index.rb:59

Scenario: Scores of recipes sharing an ingredient # features/ingredient_index.feature:17
Given 120 recipes with "butter" # features/step_definitions/ingredient_index.rb:22
When I visit the ingredients page # features/step_definitions/ingredient_index.rb:43
Then I should not see the "butter" ingredient # features/step_definitions/ingredient_index.rb:64

2 scenarios (2 passed)
10 steps (10 passed)
0m4.478s
That is a good stopping point for tonight. Up tomorrow: Sinatra 1.0.

Day #10
