tropo
Misc fun from chencer.com, formerly tropo.com
Source path: svn/trunk/Python/tr_mapreduce/mr_simple.py
#!/usr/local/bin/python2.5
import itertools
from operator import itemgetter


def MrSimple(producer,
             mapper,
             reducer,
             consumer):
    """Perform a simple, single-threaded, in-memory map-reduce.

    Parameters are:
      producer: Function called with no args that generates a series of
        (name, value) pairs to feed to the mapper.
      mapper: Function passed a name and a value; it generates a sequence of
        (name2, value2) pairs, which will usually differ from its input.
        Normally the names are strings and the values can be any reasonable
        object.
      reducer: Function passed a name and an iterable of every value mapped to
        that name; it returns the one value to associate with that name.
        It is called once per distinct name, in sorted name order.
      consumer: Function passed each (name, reduced value) pair - the intent is
        that it can output the data, persist it, etc.

    There is no return value.

    The producer and mapper can be generator functions; the reducer returns
    its value directly. The consumer is not told when the run is done - if
    your implementation has to know when the map-reduce is finished, do that
    work after MrSimple returns.
    """
    stage1 = []
    for n, v in producer():
        for n2, v2 in mapper(n, v):
            stage1.append((n2, v2))

    # groupby only merges adjacent items, so sort first. Sorting on the key
    # alone means the values never have to be comparable; the sort is stable,
    # so each key's values keep the order in which the mapper emitted them.
    first = itemgetter(0)
    for n2, vals in itertools.groupby(sorted(stage1, key=first), first):
        # 'vals' is a seq of (key, val) pairs that all share the same key,
        # so we pass along just the second part of each pair. The generator
        # is lazy: the reducer must consume it before it returns.
        seconds = (second[1] for second in vals)
        consumer(n2, reducer(n2, seconds))
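A usage sketch may help show how the four callbacks fit together. The word-count example below is hypothetical and is not the project's mr_simple_demo.py; the names DOCS, produce, map_words, reduce_count, consume and results are all made up for illustration. MrSimple is restated compactly at the top only so the snippet runs on its own.

```python
import itertools
from operator import itemgetter


def MrSimple(producer, mapper, reducer, consumer):
    # Compact restatement of MrSimple so this sketch is self-contained.
    stage1 = []
    for n, v in producer():
        stage1.extend(mapper(n, v))
    first = itemgetter(0)
    for n2, vals in itertools.groupby(sorted(stage1, key=first), first):
        consumer(n2, reducer(n2, (pair[1] for pair in vals)))


# Hypothetical input: (document name, text) pairs.
DOCS = [("doc1", "the quick brown fox"),
        ("doc2", "the lazy dog and the fox")]


def produce():
    # Producer: takes no args and yields (name, value) pairs.
    for name, text in DOCS:
        yield name, text


def map_words(name, text):
    # Mapper: emits one (word, 1) pair per word in the text.
    for word in text.split():
        yield word, 1


def reduce_count(word, counts):
    # Reducer: collapses all the values for one word into a single total.
    return sum(counts)


results = {}


def consume(word, total):
    # Consumer: here it just stores each pair; it could print or persist it.
    results[word] = total


MrSimple(produce, map_words, reduce_count, consume)

# MrSimple has returned, so the run is finished and 'results' is complete.
for word in sorted(results):
    print("%s %d" % (word, results[word]))
```

Because the consumer is never told that the run is over, any end-of-run work (here, printing the totals) goes after the MrSimple call returns.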
Change log
r42 by david.spencerian on Jul 11, 2008: simplify and cleanup
Older revisions
r36 by david.spencerian on Jul 9, 2008: touchup
r34 by david.spencerian on Jul 9, 2008: prelim simple mr
File info
Size: 1505 bytes, 35 lines