Refine high-level overview in README.md #18

Merged
merged 2 commits into from
Aug 21, 2024
32 changes: 15 additions & 17 deletions README.md
@@ -1,22 +1,28 @@
# osm-adiff-service

Read the minutely replication files published by OpenStreetMap planet, and query changesets on Overpass to create full representations of changesets. It also posts the tag changes summary to the OSMCha API.
This service reads the minutely replication files published by OpenStreetMap, and builds JSON documents which describe each changeset in detail (including information which is not included in the replication file). It publishes these JSON files to S3, and also POSTs a summary of tag changes to the [OSMCha API](https://osmcha.org/api-docs/).

# Real OpenStreetMap Changesets

When a changeset is pushed to OSM, this stack builds a representation of the exact change that happened:
Each changeset JSON contains complete information about the changeset:

* Changeset metadata - username, id, timestamp, comment etc.
* Elements - each feature that was added, modified, or deleted in the changeset.
* For each element, the current and previous version including geometry and metadata.
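
Because each element carries both its current and previous version, a tag-change summary can be derived by comparing the two tag sets. The following is a minimal sketch of that idea, not the service's actual code. The element shape used here (`tags` for the current version, `old.tags` for the previous one) and the function name `diffTags` are assumptions made for illustration.

```javascript
// Sketch only: the element shape { tags, old: { tags } } is an assumption,
// not a documented schema of the changeset JSON.
function hasOwn(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Compare the previous and current tag sets of one element.
function diffTags(element) {
  const before = (element.old && element.old.tags) || {};
  const after = element.tags || {};
  const added = {};
  const removed = {};
  const modified = {};

  for (const [key, value] of Object.entries(after)) {
    if (!hasOwn(before, key)) {
      added[key] = value; // tag exists only in the current version
    } else if (before[key] !== value) {
      modified[key] = { from: before[key], to: value }; // value changed
    }
  }
  for (const [key, value] of Object.entries(before)) {
    if (!hasOwn(after, key)) {
      removed[key] = value; // tag exists only in the previous version
    }
  }
  return { added, removed, modified };
}
```

An element that was newly created would have no previous version, so under these assumptions every one of its tags ends up in `added`.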

#### Details
## What is this, and why?

[OSMCha](https://osmcha.org)'s purpose is to let users view a changeset in its entirety, including metadata about the changeset and the "before" and "after" versions of every OSM element that was changed.

The OSM API publishes minutely [replication files](https://wiki.openstreetmap.org/wiki/Planet.osm/diffs) in [`.osc` format](https://wiki.openstreetmap.org/wiki/OsmChange) that contain some information about each edit that is made to OSM, but these files are optimized for small size and don't contain all of the details required by OSMCha. Specifically:

- they do not include old ("before") versions of elements that changed
- they don't include way geometries at all unless the geometry itself was edited (not just the tags)
- they don't include bounding boxes
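
For context, minutely replication files are addressed by a sequence number that is zero-padded to nine digits and split into three path segments. The sketch below builds such a URL. The function name `replicationUrl` and the default base URL are assumptions for illustration, and the service itself may locate files differently.

```javascript
// Sketch only: builds the URL of a minutely replication file from its
// sequence number. The default base URL is an assumption.
function replicationUrl(
  sequenceNumber,
  baseUrl = "https://planet.openstreetmap.org/replication/minute"
) {
  if (
    !Number.isInteger(sequenceNumber) ||
    sequenceNumber < 0 ||
    sequenceNumber > 999999999
  ) {
    throw new RangeError("sequenceNumber must be an integer in 0..999999999");
  }
  // Pad to 9 digits, then split into AAA/BBB/CCC.
  const padded = String(sequenceNumber).padStart(9, "0");
  const segments = [padded.slice(0, 3), padded.slice(3, 6), padded.slice(6, 9)];
  return `${baseUrl}/${segments.join("/")}.osc.gz`;
}
```

For example, `replicationUrl(6123456)` yields a path ending in `006/123/456.osc.gz`.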

A richer diff format called [augmented diff](https://wiki.openstreetmap.org/wiki/Overpass_API/Augmented_Diffs) addresses these limitations. [Overpass](https://wiki.openstreetmap.org/wiki/Overpass_API) is capable of producing this type of diff. The `osm-adiff-service` can be used to process a replication file from the OSM API, retrieve additional data about each change by getting an augmented diff from Overpass, and republish the resulting info as JSON.
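
To make the Overpass step concrete, here is a hedged sketch of how an augmented diff for a one-minute window could be requested using the Overpass QL `adiff` setting. This is not taken from the service's source: the function names, the choice of query statements, and the public `overpass-api.de` endpoint are all assumptions for illustration.

```javascript
// Sketch only: names, query shape, and endpoint are assumptions,
// not the query that osm-adiff-service actually sends.

// Overpass expects timestamps like 2017-03-09T13:11:00Z (no milliseconds).
function toOverpassDate(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Build an Overpass QL query that asks for everything changed between
// two instants, returned as an augmented diff.
function buildAdiffQuery(start, end) {
  const from = toOverpassDate(start);
  const to = toOverpassDate(end);
  const changed = `(changed:"${from}","${to}")`;
  return (
    `[adiff:"${from}","${to}"];` +
    `(node${changed};way${changed};relation${changed};);` +
    `out meta geom;`
  );
}

// Wrap the query in a GET URL for an Overpass interpreter endpoint.
function buildAdiffUrl(
  start,
  end,
  endpoint = "https://overpass-api.de/api/interpreter"
) {
  return `${endpoint}?data=${encodeURIComponent(buildAdiffQuery(start, end))}`;
}
```

The response to such a request would still need to be parsed and grouped by changeset before it could be republished as JSON; that part is omitted here.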

* New changesets are pushed to `https://s3.amazonaws.com/mapbox/real-changesets/production/<changeset-id>.json`
* Augmented Diffs produced by Overpass are pushed to `https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/<state-id.osc>`.
* The latest state id is published here `https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/latest`
These JSON artifacts are called [real-changesets](https://github.com/osmus/osmcha-charter-project/blob/main/real-changesets-docs.md), and OSMCha's data pipeline currently publishes them to an [AWS Open Data S3 bucket](https://registry.opendata.aws/real-changesets/). OSMCha uses the `real-changesets` to visualize changesets for users. The component that renders them in the browser is [changeset-map](https://github.com/osmlab/changeset-map).

#### Example
#### Example JSON changeset output

```json
// 20170309131154
@@ -102,14 +108,6 @@ When a changeset is pushed to OSM, this stack builds a representation of the exa
}
```

## What is this, and why?

Many processes for inspecting and searching for potentially bad edits on OpenStreetMap depend on being able to view a "changeset" in its entirety. This helps in gauging the context of an edit, seeing similar edits by the same user, and seeing edits in their "finished" state (i.e. not partway through a changeset).

Our primary tool for visualizing changesets has been [changeset-map](http://osmlab.github.io/changeset-map/). We depend on [augmented diffs](http://wiki.openstreetmap.org/wiki/Overpass_API/Augmented_Diffs) generated by Overpass to generate these changeset representations and visualizations.

Augmented Diffs contain complete representations of changes in OSM for every minute. One can also query for a custom time range, and filter by bounding box or other attributes. These queries can be extremely slow, especially for large changesets, and were a major bottleneck in scaling up changeset reviewing processes.

### How to run

#### JS library