Investigate usage of alternative ways to process PBFs #80

Open
nilsnolde opened this issue Feb 27, 2020 · 3 comments
Labels: infrastructure, waiting for release (Fixed in another branch than `master`, waiting to be merged)

Comments

@nilsnolde
Contributor

Since imposm has been deprecated for a long while and we really don't wanna start maintaining it, we need to investigate alternative ways to process PBFs:

  • pyosmium could be an OK candidate. Very limited API, but does accept callbacks for OSM types
  • https://pypi.org/project/esy-osm-pbf/ seems to be a relatively new package, doing what we'd need (couldn't find it on any VCS platform though)
  • use osmium/osmosis or other command-line utilities, or even Pelias' pbf2json utility. All at the expense of adding more non-Python dependencies.

So, these will have to be evaluated a little in terms of performance. The first two options are the clear favorites, as they only need the protobuf lib as a non-Python dependency.

@TimMcCauley
Contributor

If you are going to change this one day, just be careful not to fall into the same trap I did back then: the amount of memory used when parsing larger PBF files and holding data in memory for later stages, e.g. https://github.com/GIScience/openpoiservice/blob/master/openpoiservice/server/db_import/parse_osm.py#L270
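
One common way to avoid holding everything in memory is to hand parsed records off in fixed-size batches instead of collecting them for a later stage. The following is only a minimal sketch of that idea, not what openpoiservice does; `BatchedWriter`, `flush_fn` and `batch_size` are hypothetical names chosen for illustration.

```python
from typing import Callable, List


class BatchedWriter:
    """Buffer parsed records and hand them off in fixed-size batches.

    Hypothetical helper: `flush_fn` stands in for whatever persists a batch
    (for example a bulk database insert).
    """

    def __init__(self, flush_fn: Callable[[List[dict]], None], batch_size: int = 1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._buffer: List[dict] = []

    def add(self, record: dict) -> None:
        """Queue one record; flush as soon as the buffer is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records to flush_fn and start a fresh buffer."""
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []
```

A parser callback would call `add()` for every object it keeps and `flush()` once at the end of the file, so peak memory is bounded by `batch_size` rather than by the size of the PBF.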

@nilsnolde
Contributor Author

Yep, I'll have a look at how others do that, e.g. the Pelias OSM importer; it should be a fairly similar problem for them.

@nilsnolde
Contributor Author

Some update on this:

I'll use pyosmium. It's way more sophisticated than I thought. With it, we can use good strategies to handle the memory issue; the strategy could be derived from the size of the PBF and the available RAM:
https://osmcode.org/osmium-concepts/#list-of-map-index-classes
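
As a rough sketch of how such a strategy could be derived: the function below picks one of the index class names listed on that page from the PBF size and the free RAM. The function name, the 10 GiB `dense_threshold` and the `ram_factor` of 2 are placeholder assumptions for illustration, not measured values or anything pyosmium prescribes.

```python
GIB = 1024 ** 3


def choose_index_type(
    pbf_bytes: int,
    free_ram_bytes: int,
    dense_threshold: int = 10 * GIB,  # assumption: above this, treat the extract as "large"
    ram_factor: float = 2.0,          # assumption: crude estimate of index size vs. file size
) -> str:
    """Return an osmium node location index name for the given sizes.

    Sparse layouts suit small and medium extracts, dense layouts suit very
    large ones; the index is kept in memory only if the (crudely estimated)
    footprint fits into the free RAM, otherwise a file-backed index is used.
    """
    layout = "dense" if pbf_bytes >= dense_threshold else "sparse"
    fits_in_ram = pbf_bytes * ram_factor <= free_ram_bytes
    storage = "mem_array" if fits_in_ram else "file_array"
    return f"{layout}_{storage}"
```

The resulting string could then be passed as the `idx` argument of a pyosmium handler's `apply_file()` call. The file-backed variants should additionally need a file name appended after a comma; check the pyosmium documentation for the exact form.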

@nilsnolde added the `waiting for release` label (Fixed in another branch than `master`, waiting to be merged) on Mar 16, 2020