RequestTooLargeError when using a lot of shards #73

Open

waleedka opened this issue Aug 30, 2015 · 3 comments

@waleedka
I created a mapreduce job with 2048 shards (I needed it for a very large update job). I didn't get any warning or error that the number of shards was too high. The code tried to create the mapper, but it failed with the error below.

After this error, the mapreduce is stuck in an error state: it's listed on the /mapreduce/status page as "running", but I can't "Abort" it or clean it up.

E 2015-08-27 23:35:40.070  500      4 KB  1.06 s I 23:35:39.012 E 23:35:40.067 /mapreduce/kickoffjob_callback/1573912547002E1E3DD63
  0.1.0.2 - - [27/Aug/2015:23:35:40 -0700] "POST /mapreduce/kickoffjob_callback/1573912547002E1E3DD63 HTTP/1.1" 500 4094 "http://live.symphonytools.appspot.com/mapreduce/pipeline/run" "AppEngine-Google; (+http://code.google.com/appengine)" "live.symphonytools.appspot.com" ms=1062 cpu_ms=1063 cpm_usd=0.000458 queue_name=default task_name=59300224872921797641 instance=00c61b117cc0391b13d22845bf6ae422d8f6c9ca app_engine_release=1.9.25
    I 23:35:39.012 Processing kickoff for job 1573912547002E1E3DD63
    E 23:35:40.067 The request to API call datastore_v3.Put() was too large.
      Traceback (most recent call last):
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
          rv = self.handle_exception(request, response, e)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
          rv = self.router.dispatch(request, response)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
          return route.handler_adapter(request, response)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
          return handler.dispatch()
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
          return self.handle_exception(e, self.app.debug)
        File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
          return method(*args, **kwargs)
        File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/base_handler.py", line 135, in post
          self.handle()
        File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/handlers.py", line 1385, in handle
          result = self._save_states(state, serialized_readers_entity)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2732, in inner_wrapper
          return RunInTransactionOptions(options, func, *args, **kwds)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2630, in RunInTransactionOptions
          ok, result = _DoOneTry(function, args, kwargs)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2650, in _DoOneTry
          result = function(*args, **kwargs)
        File "/base/data/home/apps/s~symphonytools/live.386746686635332317/mapreduce/handlers.py", line 1493, in _save_states
          db.put([state, serialized_readers_entity], config=config)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1576, in put
          return put_async(models, **kwargs).get_result()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 929, in get_result
          result = rpc.get_result()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
          return self.__get_result_hook(self)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1881, in __put_hook
          self.check_rpc_success(rpc)
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1371, in check_rpc_success
          rpc.check_success()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
          self.__rpc.CheckSuccess()
        File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
          raise self.exception
      RequestTooLargeError: The request to API call datastore_v3.Put() was too large.
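
(Context for the traceback: the kickoff handler fails in _save_states() when it db.put()s the job state together with the serialized input-reader states for all 2048 shards in a single call, which exceeds the datastore API's per-request size limit. A minimal sketch of the kind of call that starts such a job, assuming the control.start_map API of appengine-mapreduce; the handler and entity-kind names are hypothetical, not the reporter's actual code:)

```python
# Sketch only: starting a datastore mapper with a very large shard_count.
# "my_app.jobs.update" and "my_app.models.UpdateEntity" are hypothetical names.
from mapreduce import control

mapreduce_id = control.start_map(
    name="Bulk update",
    handler_spec="my_app.jobs.update",  # hypothetical map handler
    reader_spec="mapreduce.input_readers.DatastoreInputReader",
    mapper_parameters={"entity_kind": "my_app.models.UpdateEntity"},
    shard_count=2048,  # large counts inflate the serialized reader state that
                       # the kickoff handler writes in one datastore_v3.Put()
)
```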
@aozarov
Contributor

aozarov commented Sep 15, 2015

Short of clearing the task queue that was used for this MR job (or deleting the specific tasks), I am not sure I can give you better advice.
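
(A sketch of that cleanup with the App Engine taskqueue API, assuming the stuck job's tasks are on the default push queue, as the log's queue_name=default suggests. Purging drops every task on the queue, so deleting the specific task by the name shown in the log is the narrower option:)

```python
from google.appengine.api import taskqueue

queue = taskqueue.Queue("default")

# Option 1: purge the whole queue. This also drops unrelated tasks, so only
# do this if the queue is dedicated to the mapreduce job.
queue.purge()

# Option 2: delete just the known task(s) of the stuck job by name,
# e.g. the task_name shown in the request log above.
queue.delete_tasks(taskqueue.Task(name="59300224872921797641"))
```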

@tkaitchuck
Contributor

We could cap the number of shards to prevent this sort of error. I have run 1024 successfully. In truth, though, adding more shards once there are already that many ceases to provide a performance boost due to the added overhead of managing them.
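
(A minimal sketch of such a cap applied on the caller's side rather than inside the library; the 1024 ceiling is just the count reported to work in this thread, not a documented limit:)

```python
# Clamp the requested shard count so the serialized per-shard reader state
# stays small enough to save in a single Put(). 1024 is the value reported
# to work above, not a hard limit of the library.
MAX_SHARDS = 1024

def capped_shard_count(requested):
    return min(requested, MAX_SHARDS)

# e.g. a request for 2048 shards would be reduced to 1024:
shard_count = capped_shard_count(2048)
```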

@soundofjw
Contributor

@tkaitchuck That statement seems entirely dependent on the amount of work being performed for each datum.

I can confirm large 1,000+ shard jobs running, but @tkaitchuck is right that the added overhead will slow you down for most jobs.
