Trying out MongoDB's map/reduce.

I'll use map/reduce to count, by "type", the documents in the collection I created earlier (http://d.hatena.ne.jp/stog/20100531/1275317576).

The collection looks like this.

$ mongo mytest
MongoDB shell version: 1.4.4
url: mytest
connecting to: mytest
type "help" for help
>
> db.members2.find()
{ "_id" : ObjectId("4c03c632b4742d2998000000"), "birthday" : 342057600, "type" : "human", "name" : "おがわ", "sex" : "M" }
{ "_id" : ObjectId("4c03c632b4742d2998000001"), "birthday" : 130550400, "type" : "human", "name" : "たかはし", "sex" : "F" }
{ "_id" : ObjectId("4c03c632b4742d2998000002"), "birthday" : 1042588800, "type" : "human", "name" : "たなか", "sex" : "M" }
{ "_id" : ObjectId("4c03c632b4742d2998000003"), "birthday" : -291600000, "type" : "human", "name" : "さとう", "sex" : "F" }
{ "_id" : ObjectId("4c03c632b4742d2998000004"), "birthday" : 1118102400, "type" : "dog", "name" : "ポチ", "sex" : "F" }
{ "_id" : ObjectId("4c03c632b4742d2998000005"), "birthday" : 807840000, "type" : "dog", "name" : "タロ", "sex" : "M" }
{ "_id" : ObjectId("4c03c632b4742d2998000006"), "birthday" : 1230076800, "type" : "cat", "name" : "タマ", "sex" : "F" }
{ "_id" : ObjectId("4c03c633b4742d2998000007"), "birthday" : 914544000, "type" : "cat", "name" : "ミケ", "sex" : "M" }
{ "_id" : ObjectId("4c03c633b4742d2998000008"), "birthday" : 0, "type" : "human", "name" : "John", "sex" : "M" }
{ "_id" : ObjectId("4c03c633b4742d2998000009"), "birthday" : -927676800, "type" : "human", "name" : "Michael", "sex" : "M" }
{ "_id" : ObjectId("4c03c633b4742d299800000a"), "birthday" : 927158400, "type" : "human", "name" : "Robert", "sex" : "M" }
{ "_id" : ObjectId("4c03c633b4742d299800000b"), "birthday" : 1259971200, "type" : "human", "name" : "David", "sex" : "M" }
{ "_id" : ObjectId("4c03c633b4742d299800000c"), "birthday" : -86400, "type" : "human", "name" : "James", "sex" : "M" }
{ "_id" : ObjectId("4c03c633b4742d299800000d"), "birthday" : 481939200, "type" : "human", "name" : "Mary", "sex" : "F" }
{ "_id" : ObjectId("4c03c633b4742d299800000e"), "birthday" : -448588800, "type" : "human", "name" : "Barbara", "sex" : "F" }
{ "_id" : ObjectId("4c03c633b4742d299800000f"), "birthday" : 979948800, "type" : "human", "name" : "Anne", "sex" : "F" }
{ "_id" : ObjectId("4c03c633b4742d2998000010"), "birthday" : -765849600, "type" : "human", "name" : "Maria", "sex" : "F" }
{ "_id" : ObjectId("4c03c633b4742d2998000011"), "birthday" : 249696000, "type" : "human", "name" : "Susan", "sex" : "F" }
>


First, let's try it in the MongoDB interactive shell.

$ mongo mytest
MongoDB shell version: 1.4.4
url: mytest
connecting to: mytest
type "help" for help
>
> var m = function(){ emit(this.type, 1); };
> var r = function(k, vals){ var sum=0; for(var i in vals) sum += vals[i]; return sum; };
> res = db.members2.mapReduce(m, r);
{
        "result" : "tmp.mr.mapreduce_1278193998_9",
        "timeMillis" : 39,
        "counts" : {
                "input" : 18,
                "emit" : 18,
                "output" : 3
        },
        "ok" : 1,
}
>
> db[res.result].find();
{ "_id" : "cat", "value" : 2 }
{ "_id" : "dog", "value" : 2 }
{ "_id" : "human", "value" : 14 }
>
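
To make it clearer what the map and reduce functions are doing, here is a minimal in-memory sketch of the same counting logic in plain Python. It is only an illustration and not how MongoDB implements map/reduce internally; the names map_reduce, map_type and reduce_sum are hypothetical helpers made up for this example, and the input list keeps only the "type" field of the 18 documents shown earlier.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Tiny in-memory imitation of map/reduce (not MongoDB's implementation)."""
    groups = defaultdict(list)
    for doc in docs:
        # The map step emits (key, value) pairs; group the values by key.
        for key, value in map_fn(doc):
            groups[key].append(value)
    # The reduce step collapses each key's list of values into one value.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_type(doc):
    # Counterpart of: function(){ emit(this.type, 1); }
    yield doc["type"], 1


def reduce_sum(key, values):
    # Counterpart of the JavaScript reduce: add up the emitted values.
    return sum(values)


# Only the "type" field matters here: 14 humans, 2 dogs, 2 cats.
docs = [{"type": "human"}] * 14 + [{"type": "dog"}] * 2 + [{"type": "cat"}] * 2

counts = map_reduce(docs, map_type, reduce_sum)
print(sorted(counts.items()))
```

The reduce function sums the values rather than counting the length of the list. That matters in MongoDB, because reduce can be called more than once for the same key on partial results, so its output has to be usable as its own input.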


Next, let's do the same aggregation from Python.
test_mapReduce.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import pymongo

conn = pymongo.Connection()
db = conn["mytest"]
coll = db["members2"]

# Define the map and reduce functions that count by type. (The JavaScript code is passed as strings.)
m = pymongo.code.Code("function(){ emit(this.type, 1); }")
r = pymongo.code.Code("""
    function(k, vals){
        var sum = 0;
        for(var i in vals) sum += vals[i];
        return sum;
    }
""")

result = coll.map_reduce(m, r)
for doc in result.find():
    print doc

conn.disconnect()


Here is the output.

$ python test_mapReduce.py
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'human', u'value': 14.0}
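
Note that the counts come back as floats (2.0, 14.0), presumably because numbers in the JavaScript functions are doubles. If integer counts are wanted, the result documents can be converted on the Python side. The following is a small sketch of that; counts_by_type is a hypothetical helper name, and the sample list stands in for the documents returned by result.find() so that it runs without a database.

```python
def counts_by_type(result_docs):
    """Turn map/reduce output documents into a {type: int count} dict."""
    return {doc["_id"]: int(doc["value"]) for doc in result_docs}


# Stand-in for result.find(): the three documents printed above.
sample = [
    {"_id": "cat", "value": 2.0},
    {"_id": "dog", "value": 2.0},
    {"_id": "human", "value": 14.0},
]

print(counts_by_type(sample))
```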


References:
http://www.mongodb.org/display/DOCSJP/MapReduce
http://api.mongodb.org/python/1.7%2B/examples/map_reduce.html