I created a mapreduce job with 2048 shards (I needed it for a very large update job). I got no warning or error that the shard count was too high. The code tried to create the mapper, but it failed with the error below.
After this error, the mapreduce is stuck in an error state: the /mapreduce/status page lists it as "running", but I can't "Abort" it or clean it up.
0.1.0.2 - - [27/Aug/2015:23:35:40 -0700] "POST /mapreduce/kickoffjob_callback/1573912547002E1E3DD63 HTTP/1.1" 500 4094 "http://live.symphonytools.appspot.com/mapreduce/pipeline/run" "AppEngine-Google; (+http://code.google.com/appengine)" "live.symphonytools.appspot.com" ms=1062 cpu_ms=1063 cpm_usd=0.000458 queue_name=default task_name=59300224872921797641 instance=00c61b117cc0391b13d22845bf6ae422d8f6c9ca app_engine_release=1.9.25
I 23:35:39.012 Processing kickoff for job 1573912547002E1E3DD63
E 23:35:40.067 The request to API call datastore_v3.Put() was too large.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/base_handler.py", line 135, in post
self.handle()
File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/handlers.py", line 1385, in handle
result = self._save_states(state, serialized_readers_entity)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2732, in inner_wrapper
return RunInTransactionOptions(options, func, *args, **kwds)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2630, in RunInTransactionOptions
ok, result = _DoOneTry(function, args, kwargs)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2650, in _DoOneTry
result = function(*args, **kwargs)
File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/handlers.py", line 1493, in _save_states
db.put([state, serialized_readers_entity], config=config)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1576, in put
return put_async(models, **kwargs).get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 929, in get_result
result = rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1881, in __put_hook
self.check_rpc_success(rpc)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1371, in check_rpc_success
rpc.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
raise self.exception
RequestTooLargeError: The request to API call datastore_v3.Put() was too large.
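The root cause is the datastore's ~1 MB limit on a single entity write: `_save_states` serializes every shard's input-reader state into one `serialized_readers_entity`, so the payload grows roughly linearly with the shard count. A back-of-the-envelope sketch of why 1024 shards fits but 2048 does not (the 700-byte per-reader state is an illustrative assumption, not the library's actual serialization format):

```python
import pickle

# App Engine datastore limit on a single Put payload (~1 MB).
DATASTORE_PUT_LIMIT = 1024 * 1024

def serialized_readers_size(shard_count, per_reader_state_bytes=700):
    """Rough estimate of the serialized-readers payload: one pickled
    blob of state per shard. The per-reader size is an assumption
    for illustration, not the mapreduce library's real format."""
    readers = [b"x" * per_reader_state_bytes for _ in range(shard_count)]
    return len(pickle.dumps(readers))
```

With these assumed numbers, 1024 shards stays under the limit while 2048 overshoots it, which matches the observed failure mode.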
We could cap the number of shards to prevent this sort of error. I have run 1024 shards successfully. In truth, though, beyond that point adding more shards ceases to provide a performance boost because of the added overhead of managing them.
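One way to implement that cap would be to validate the shard count up front, before the kickoff handler runs, so a bad value fails fast (or clamps with a warning) instead of dying mid-transaction and leaving the job stuck. A minimal sketch, assuming a cap of 1024 per the comment above (the function name and clamping behavior are hypothetical, not part of the mapreduce API):

```python
import warnings

# Assumed cap, based on the successful 1024-shard run reported above.
# The real safe limit depends on how large each serialized reader
# state is, since all of them must fit in one ~1 MB datastore Put.
MAX_SHARD_COUNT = 1024

def clamp_shard_count(requested):
    """Clamp an over-large shard count and warn, rather than letting
    the kickoff callback fail later with RequestTooLargeError."""
    if requested < 1:
        raise ValueError("shard count must be >= 1")
    if requested > MAX_SHARD_COUNT:
        warnings.warn(
            "shard count %d exceeds cap %d; clamping"
            % (requested, MAX_SHARD_COUNT))
        return MAX_SHARD_COUNT
    return requested
```

A hard error instead of a clamp would also work; the important part is that the check happens before any mapreduce state is persisted.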