route-altitude-profile - issue #10

More App Engine storage


Posted on Jul 28, 2008 by Helpful Lion

At the moment, I can only squeeze 3 tiles into the data store. The world is a bit bigger than that.

Task, either:
* convince App Engine team to give you more space
* pay for more space once it is available

I will post my correspondence in the comments.

Comment #1

Posted on Jul 28, 2008 by Helpful Lion

From: sjors@sprovoost.nl
Subject: AppEngine quota request
Date: Tue, 15 Jul 2008

I have a very large data set: the NASA SRTM global elevation data. I estimate it will take about 2.1 TB in the data store. It consists of 21 billion records, each made up of a key (representing the coordinate) and an integer (representing the altitude).

With the current bulk uploader it would take more than a decade to upload, but any extra GB you can spare would help so I can test on larger areas.

I would also like to know if my current usage (500 MB) includes the keys of my entities, since you plan on not charging for these in the future.

More info: an explanation of my application and some of the issues I have run into: http://sprovoost.nl/2008/07/13/restful/

Ticket to request a better uploading tool: http://code.google.com/p/googleappengine/issues/detail?id=549

Discussion about storage space per entity: http://groups.google.com/group/google-appengine/browse_thread/thread/5256b6bb481d029b/edac915b899b3ab6?lnk=gst&q=cost+of+storage#edac915b899b3ab6 (I am considering more efficient ways to store the data, but they all add complexity and might cause other issues which I hope to avoid)

Comment #2

Posted on Jul 28, 2008 by Helpful Lion

Hi Sjors,

Thanks for sending in your request for higher Google App Engine resource quotas. Before we can evaluate your request, we need a little more info, and I apologize if this overlaps with the material on your site. Please respond to the following questions:

  1. Please describe the application you plan on building. Please convey the user experience, either via a set of interface mock-ups, or better yet, actual screen shots if you've already made progress.

  2. Please describe briefly the team you're working with. (If you're working alone, don't worry--this is ok!)

  3. Please describe as accurately as you can the amount of traffic you expect to receive. You should include the following metrics:

    • Expected maximum hits per day
    • Expected daily bandwidth requirements
    • Expected datastore storage requirements
    • Any other important notes about your resource needs

In the meantime I'm investigating whether keys are indeed currently counted against quota, and I'll update you as soon as I have an absolute answer.

Thanks very much!

Comment #3

Posted on Jul 28, 2008 by Helpful Lion

Thanks for getting back to me. It is no problem for me to provide you with some more information; that frees up more of your time to release the next SDK.

  1. Please describe the application you plan on building. Please convey the user experience, either via a set of interface mock-ups, or better yet, actual screen shots if you've already made progress.

I am building a service that receives a route as an input and then returns the altitude profile corresponding to that route. The altitude at each point of the route is calculated using the NASA SRTM data set. My summer of code mentor (Artem Dudarev) created a simple demonstration site around my service: http://dudarev.com/webmaps/profiledemo/ I have attached a screen shot.

The service is not designed to provide a 'user experience'; although I will support a simple GET interface, the service should be used by other websites, not directly by the end user.

To that end I will support three types of input and four types of output, in a RESTful way:

Input:
* XML through HTTP/POST (OpenLS route standard, expensive, cumbersome, but standard... )
* Protocol buffers through HTTP/POST (easy, fast, this will be the preferred form of communication once the App Engine supports it)
* HTTP/GET: easier to explain; just there to ease the learning curve for other developers

Output:
* XML (same reason as above)
* Protocol buffers: this means a website can render the profile any way they like
* Google Chart image: easy to use, probably good enough for most purposes
* URL pointing to Google Chart image: less work for my service than fetching the image and sending it
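The last output type, a URL pointing to a Google Chart image, could look roughly like the sketch below. It is only an illustration, not the service's actual code: the function name `chart_url` is hypothetical, and the endpoint and the `cht`, `chs` and `chd` parameters are assumed from the Google Chart API's line-chart format with text encoding (values scaled to 0-100).

```python
from urllib.parse import urlencode

# Assumed Google Chart API endpoint; not taken from this thread.
CHART_ENDPOINT = "http://chart.apis.google.com/chart"


def chart_url(altitudes, width=400, height=150):
    """Build a line-chart URL for a list of altitudes (hypothetical helper)."""
    lo, hi = min(altitudes), max(altitudes)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat route
    # Text encoding expects values between 0 and 100, so rescale the profile.
    scaled = [round(100.0 * (a - lo) / span, 1) for a in altitudes]
    params = {
        "cht": "lc",                                # line chart
        "chs": "%dx%d" % (width, height),           # image size in pixels
        "chd": "t:" + ",".join(str(v) for v in scaled),
    }
    # Keep ':' and ',' readable instead of percent-encoding them.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,")


print(chart_url([12, 40, 95, 130, 80]))
```

Returning only the URL leaves the image request to the calling website, which is the "less work for my service" point made above.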

  2. Please describe briefly the team you're working with. (If you're working alone, don't worry--this is ok!)

I am a student in the Summer of Code program, so I am working alone, but with the support of my mentor (Artem Dudarev) and the OpenStreetMap community. I am also talking to the people behind the Melbourne Wireless project as the altitude profile service could be useful for them; it shows which network nodes have a direct line of sight to each other (or at least which nodes definitely do not).

I should probably also mention the code will be completely open source, of course.

  3. Please describe as accurately as you can the amount of traffic you expect to receive. You should include the following metrics:

    • Expected maximum hits per day
    • Expected daily bandwidth requirements

I do not expect these things to break the free quota any time soon.

    • Expected datastore storage requirements

This is the main issue. Lots of storage, not too much traffic (for now).

There are 21 billion altitudes in the data set, covering most of the planet at a 100m resolution. I need to put these in the data store.

There are two ways I can do this:
1: store every altitude in a separate record
2: store altitudes in groups of 100 (or so) per record, zipped

The only reason I consider the second option is because of the key overhead that comes with each record. I expect that no matter how I group the altitudes, I will end up throwing away 99 of these 100 points with each fetch. See also: http://groups.google.com/group/google-appengine/msg/927d647ef8a47eb0
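A minimal sketch of the second option, under stated assumptions: altitudes are taken to fit in signed 16-bit integers, a plain dict stands in for the data store, and a point is addressed by a flat index. The names `pack_block`, `unpack_block` and `lookup` are hypothetical; the thread does not show the real key scheme.

```python
import struct
import zlib

GROUP_SIZE = 100  # altitudes per record, as in option 2


def pack_block(altitudes):
    """Pack a group of altitudes as 16-bit integers and compress them."""
    raw = struct.pack("<%dh" % len(altitudes), *altitudes)
    return zlib.compress(raw)


def unpack_block(blob):
    """Reverse pack_block: decompress and return the list of altitudes."""
    raw = zlib.decompress(blob)
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def lookup(store, point_index):
    """Fetch one altitude; the whole block is read and the rest discarded."""
    block_id, offset = divmod(point_index, GROUP_SIZE)
    return unpack_block(store[block_id])[offset]


# Fill a dict (standing in for the data store) with 250 made-up altitudes.
altitudes = [(i * 7) % 3000 for i in range(250)]
store = {}
for start in range(0, len(altitudes), GROUP_SIZE):
    store[start // GROUP_SIZE] = pack_block(altitudes[start:start + GROUP_SIZE])

print(lookup(store, 123) == altitudes[123])
```

The `lookup` function makes the trade-off visible: one key per 100 points instead of one per point, at the cost of decompressing a whole block to use a single value.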

Option 1: 21 billion altitudes, 1 per record. 8 bytes per altitude because it is stored as a long int. About 82 bytes overhead for the key, based on current experience. So that makes:
8 * 21 * 10^9 / 2^30 = 156 GB without keys
90 * 21 * 10^9 / 2^30 = 1760 GB (1.7 TB) with keys

Option 2: 21 billion altitudes, 100 per record, zipped. Based on the size of the original NASA zip files and assuming that zipping smaller blocks is a bit less efficient:
Between 13 and 21 GB without keys
Add another 82 * 21 * 10^9 / 2^30 / 100 = 16 GB for the keys.
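The arithmetic for both options can be reproduced with a few lines of Python. The constants come straight from the figures above; only the variable names are made up here, and "GB" follows the text in meaning 2^30 bytes.

```python
GIB = 2 ** 30               # the text's "GB" divides by 2^30

ALTITUDES = 21 * 10 ** 9    # points in the data set
VALUE_BYTES = 8             # altitude stored as a long int
KEY_BYTES = 82              # observed key overhead per record
GROUP_SIZE = 100            # altitudes per record in option 2

# Option 1: one altitude per record.
option1_values = VALUE_BYTES * ALTITUDES / GIB
option1_total = (VALUE_BYTES + KEY_BYTES) * ALTITUDES / GIB

# Option 2: only the key overhead is computed; the 13 to 21 GB for the
# zipped values is an estimate from the NASA files, not a formula.
option2_keys = KEY_BYTES * ALTITUDES / GROUP_SIZE / GIB

print("Option 1 without keys: %.0f GB" % option1_values)
print("Option 1 with keys:    %.0f GB" % option1_total)
print("Option 2 keys only:    %.0f GB" % option2_keys)
```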

    • Any other important notes about your resource needs

Uploading the data is very CPU intensive. I've opened a ticket about this issue:

http://code.google.com/p/googleappengine/issues/detail?id=549

I have only tried option 1 so far. I can not upload more than about 100 - 200 altitudes per POST request. I can not send more than about 5000 of these requests a day. At my current CPU quota it will take at least two decades to upload the planet this way, perhaps less if I use option two.
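As a rough check of the upload time, the sketch below takes the two limits above at face value (100 to 200 altitudes per request, 5000 requests a day) and ignores everything else. The function name `upload_years` is hypothetical.

```python
ALTITUDES = 21 * 10 ** 9     # points in the data set
REQUESTS_PER_DAY = 5000      # approximate daily request limit


def upload_years(records, records_per_request,
                 requests_per_day=REQUESTS_PER_DAY):
    """Years needed to upload all records under the given limits."""
    requests = -(-records // records_per_request)  # ceiling division
    return requests / requests_per_day / 365.0


for per_request in (100, 200):
    years = upload_years(ALTITUDES, per_request)
    print("%d altitudes per request: about %.0f years" % (per_request, years))
```

Under these assumptions the result is well beyond the "at least two decades" lower bound, which only strengthens the case for a better upload tool or a higher quota.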

I see two possible solutions here:
Solution 1: wait for the next generation upload tool (which I would be happy to test for you)
Solution 2: if you temporarily increase my CPU quota by at least a factor of 100, I should be able to upload a lot faster; i.e. use lots of concurrent upload threads. That might also require you to calm down certain security systems for my IP addresses, as I can imagine it would look like an attack.

In the meantime I'm investigating whether keys are indeed currently counted against quota, and I'll update you as soon as I have an absolute answer.

Thanks. In the above I assumed that currently the keys are counted against the quota.

Hopefully I have provided you with enough information, yet not too much. Please let me know if you need more information, or if you want me to come up with a completely different approach.

Comment #4

Posted on Jul 28, 2008 by Helpful Lion

Hi Sjors,

Thanks for the additional details! I've passed them along to the product management and engineering teams, and you'll be contacted as they provide feedback.

Let me know if you have any other questions, or if there's anything else I can help with in the meantime.

Comment #5

Posted on Aug 12, 2008 by Helpful Lion

Hey Sjors,

I saw your quota request for App Engine and wanted to follow up. At this point 1TB is too large for us during our preview period. In the not too distant future you'll be able to upload this data but for now we'll have to ask you to store it someplace else.

Sorry we can't accommodate you now, check back with us soon.

Comment #6

Posted on Aug 12, 2008 by Helpful Lion

[my response, 12-08-2008]

Thanks for looking into it.

I already suspected that 1 TB would be a bit overkill, but that is why I also suggested a different storage method that would only require about 21 GB (plus 16 for the keys).

Also, even with about 5 to 10 gigabyte I could make the application significantly more useful than it is now; i.e. it could be used in production mode.

Would that request be easier to accommodate?

Comment #7

Posted on Aug 13, 2008 by Helpful Lion

Looks like I will get 2 GB soon and they are working on ways to increase that even further in the near future. So that is good news!

I will leave the ticket open.

Status: Accepted

Labels:
Type-Task Priority-Medium Usability