Shapefile writer performance improvements? #154

Closed
xivk opened this issue Feb 28, 2017 · 5 comments

@xivk
Contributor

xivk commented Feb 28, 2017

I'm working on a tool for a client of mine that converts OSM data into a shapefile that can easily be consumed for routing and analytics.

Basically I have 2 steps:

  1. Building a routerdb with Itinero.
  2. Writing the result as a shapefile.

Itinero can handle the planet OSM file and build a database for the entire world. I don't think it's even possible to attempt writing that to a shapefile, but I am already having performance issues at a smaller scale: for a country like Germany it takes about 10 hours (!) to write a shapefile with all German roads. Doing this with Itinero alone takes about 30 mins.

I have checked and the bottleneck clearly is the shapefile writer.

So I'm wondering: is it possible to improve the performance of writing a shapefile? Can anyone give me some pointers on where to get started? For example, why does it iterate twice over the source? Any way to compress the result? Germany now gives me a shapefile of 70GB (!).
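On the "iterate twice" question: the thread does not say why the writer does this. One plausible reason is that the 100-byte `.shp` header stores the total file length and the bounding box, both of which are only known after every record has been seen. A minimal sketch of a single-pass alternative, in Python rather than C# for brevity, writes a placeholder header, streams the records while accumulating length and bounds, then seeks back to patch the header. The function name `write_shp_single_pass` and the `(content, box)` record tuples are assumptions for illustration, not NTS API, and the `.shx` index is omitted.

```python
import io
import struct

HEADER_SIZE = 100  # fixed size of the .shp main file header, in bytes


def write_shp_single_pass(stream, shape_type, records):
    """Write a .shp main file in one pass over `records`.

    records: iterable of (content_bytes, (xmin, ymin, xmax, ymax)),
             where content_bytes is an already-encoded record body.
    stream:  a seekable binary stream (hypothetical interface).
    """
    stream.write(b"\x00" * HEADER_SIZE)  # placeholder, patched at the end
    xmin = ymin = float("inf")
    xmax = ymax = float("-inf")
    length = HEADER_SIZE
    for number, (content, box) in enumerate(records, start=1):
        # Record header: record number and content length (in 16-bit
        # words), both big-endian 32-bit integers.
        stream.write(struct.pack(">ii", number, len(content) // 2))
        stream.write(content)
        length += 8 + len(content)
        xmin, ymin = min(xmin, box[0]), min(ymin, box[1])
        xmax, ymax = max(xmax, box[2]), max(ymax, box[3])
    if length == HEADER_SIZE:  # no records: use an all-zero bounding box
        xmin = ymin = xmax = ymax = 0.0
    # File code 9994, five unused ints, file length in 16-bit words
    # (big-endian), then version 1000, shape type and the bounding box
    # (little-endian). Z and M ranges are left at zero.
    header = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, length // 2)
    header += struct.pack("<2i8d", 1000, shape_type,
                          xmin, ymin, xmax, ymax, 0.0, 0.0, 0.0, 0.0)
    stream.seek(0)
    stream.write(header)
    stream.seek(0, io.SEEK_END)
```

This only works when the output stream is seekable, which is a real constraint for some targets (for example, a compressed stream).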

I'm more than willing to contribute the improvements to NTS as usual, but any help getting started would be appreciated.

xivk added the question label Feb 28, 2017

@DGuidi
Contributor

DGuidi commented Feb 28, 2017

> Any way to compress the result? Germany now gives me a shapefile of 70GB (!).

My two cents: shapefile is a binary format with a well-defined standard, so the file size is directly related to the size of the data. Maybe you can (I think):

  1. simplify your geometries
  2. create your own "ShapefileZipWriter" that directly generates a zipped archive of a shapefile (if possible)
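The second suggestion can be sketched roughly as follows. This is not an existing NTS class: the function name `write_zipped_members` and its `members` argument are hypothetical, and the example is in Python rather than C# to keep it short. It assumes the `.shp`, `.shx` and `.dbf` contents are available as iterables of already-encoded byte chunks, and streams each one into a single compressed archive without holding a whole member in memory.

```python
import zipfile


def write_zipped_members(target, members):
    """Stream each member's chunks into one zip archive.

    target:  a path or a writable binary file object.
    members: dict mapping archive member names (e.g. "roads.shp") to
             iterables of bytes chunks (hypothetical interface).
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, chunks in members.items():
            # force_zip64 allows members larger than 2 GiB even though
            # their final size is not known when writing starts.
            with zf.open(name, "w", force_zip64=True) as member:
                for chunk in chunks:
                    member.write(chunk)
```

With Python's `zipfile`, only one member can be open for writing at a time, so a writer that produces the `.shp` and `.shx` files together would have to buffer one of them (the index is the smaller of the two) and add it afterwards.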

@airbreather
Member

Oh... does BigEndianBinaryWriter need the same performance improvement I did in c6d2ccd? I see some calls to that class's naive WriteIntBE method. Maybe the corresponding reader too...
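Neither the body of `WriteIntBE` nor the change in c6d2ccd is shown in this thread, so the following is only an illustration of the general concern, in Python and with hypothetical function names. A "naive" big-endian write typically converts the value to bytes in native order and reverses them, creating temporary objects on every call; a cheaper variant packs the value directly into a preallocated, reusable buffer.

```python
import struct

# Precompiled format: one big-endian signed 32-bit integer.
_BE_INT32 = struct.Struct(">i")


def write_int_be_naive(out: bytearray, value: int) -> None:
    """Little-endian bytes, then reversed: two temporaries per call."""
    out += struct.pack("<i", value)[::-1]


def write_int_be(buf: bytearray, offset: int, value: int) -> int:
    """Pack straight into a preallocated buffer; return the next offset."""
    _BE_INT32.pack_into(buf, offset, value)
    return offset + 4
```

Both functions produce the same four bytes; the difference is only in how much per-call allocation and copying they do, which matters when the method is called once or more per record across millions of records.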

@airbreather
Member

If that doesn't help, could you please provide sample code and maybe a sample serialized RouterDb file so we can look at the same thing? I've got an old serialized routerdb file, but it dates back to when Itinero was part of OsmSharp, and I'm guessing you've changed things since then. I really don't want to dedicate CPU time to rerunning odp, assuming it's still as slow as it was back then.

If I can just get that little bit of help, I'd love to spend time agonizing over this one.

@xivk
Contributor Author

xivk commented Feb 28, 2017

I'll try and build a sample application and do some profiling too, stay tuned. :-)

@airbreather
Member

This issue was moved to NetTopologySuite/NetTopologySuite.IO.ShapeFile#2
