Merge pull request #11 from gzt5142/gt-010-fetch-data
Fetch data source
Showing 15 changed files with 4,266 additions and 41 deletions.
@@ -4,6 +4,7 @@
.nox
.pytest_cache
.ipynb_checkpoints
*.geojson
__pycache__
poetry.lock
docs/_build
@@ -0,0 +1,15 @@
"""
Configuration for the Sphinx documentation generator
"""
project = "NLDI Crawler"
author = "USGS"
copyright = f"2022, {author}"
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx_autodoc_typehints",
    "myst_parser",
    "sphinx_rtd_theme",
    "sphinxcontrib.mermaid",
]
html_theme = "sphinx_rtd_theme"
@@ -0,0 +1,8 @@
SS-Delineate Documentation
==========================

.. toctree::
   :hidden:

   source_table
   workflow
@@ -0,0 +1,51 @@
# Sources Table

Annotations for the `crawler_source` table, which holds the information needed to find and process each feature source:

| Type | Column Name | Description |
|------|-------------|-------------|
| integer | crawler_source_id | Unique identifier that distinguishes sources in this table. |
| string | source_name | Human-readable, friendly descriptor for the data source. |
| string | source_suffix | Used internally to build table names. It should be unique and contain no special characters. |
| string | source_uri | Web address from which feature data is retrieved. |
| string | feature_id | Name of the property, in the GeoJSON returned from `source_uri`, that uniquely identifies each feature within the feature collection. This property is treated as the `KEY` of the collection. |
| string | feature_name | Name of the property in the returned GeoJSON that holds the feature's name. |
| string | feature_uri | Name of the property in the returned GeoJSON that holds the URL at which the feature can be accessed directly. |
| string | feature_reach | Name of the property in the returned GeoJSON that holds the reach identifier. |
| string | feature_measure | Name of the property in the returned GeoJSON that holds the measure (M-value) along `feature_reach` at which the feature is located. |
| string | ingest_type | Type of feature to be parsed. One of `reach` or `point`. |
| string | feature_type | Purpose unknown. One of `hydrolocation`, `point`, or `varies`. |

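One way to hold a `crawler_source` row in Python, and to enforce the two constraints stated in the table, is sketched below. It assumes a plain dataclass. The class name `CrawlerSource` and the `validate` helper are hypothetical and are not taken from the crawler. "No special characters" is also an assumption, read here as ASCII letters, digits, and underscores. The values come from the source `10` example later in this document.

```python
import re
from dataclasses import dataclass


@dataclass
class CrawlerSource:
    """Hypothetical in-memory model of one row of nldi_data.crawler_source."""

    crawler_source_id: int
    source_name: str
    source_suffix: str
    source_uri: str
    feature_id: str
    feature_name: str
    feature_uri: str
    feature_reach: str
    feature_measure: str
    ingest_type: str
    feature_type: str

    def validate(self) -> None:
        """Raise ValueError if the row breaks a documented constraint."""
        # Assumption: "no special characters" means ASCII letters, digits and "_".
        if not re.fullmatch(r"[A-Za-z0-9_]+", self.source_suffix):
            raise ValueError(f"invalid source_suffix: {self.source_suffix!r}")
        if self.ingest_type not in ("reach", "point"):
            raise ValueError(f"invalid ingest_type: {self.ingest_type!r}")


# Values of source 10 (Vigil Network Data) from the example in this document.
vigil = CrawlerSource(
    crawler_source_id=10,
    source_name="Vigil Network Data",
    source_suffix="vigil",
    source_uri="https://www.sciencebase.gov/catalog/file/get/60c7b895d34e86b9389b2a6c?name=vigil.geojson",
    feature_id="SBID",
    feature_name="Site Name",
    feature_uri="SBURL",
    feature_reach="REACHCODE",
    feature_measure="REACH_measure",
    ingest_type="reach",
    feature_type="hydrolocation",
)
vigil.validate()  # no exception: "vigil" and "reach" are both acceptable
```

The `feature_type` column is left unchecked because its meaning is not documented.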
## Example | ||

```sql
SELECT * FROM nldi_data.crawler_source WHERE crawler_source_id = 10;
```
Source number `10` contains the following data:

| Column | Value |
|--------|-------|
| crawler_source_id | 10 |
| source_name | Vigil Network Data |
| source_suffix | vigil |
| source_uri | https://www.sciencebase.gov/catalog/file/get/60c7b895d34e86b9389b2a6c?name=vigil.geojson |
| feature_id | SBID |
| feature_name | Site Name |
| feature_uri | SBURL |
| feature_reach | REACHCODE |
| feature_measure | REACH_measure |
| ingest_type | reach |
| feature_type | hydrolocation |

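The lookup above can also be issued from Python. A bound parameter is safer there than string formatting. The sketch below works against a DB-API 2.0 connection. The placeholder style depends on the driver: `?` for the standard library's `sqlite3`, which is used here, and `%s` for psycopg2. The helper name `get_source` is hypothetical.

```python
import sqlite3  # stand-in driver for this sketch; the NLDI database itself is not SQLite
from typing import Any, Optional


def get_source(conn: Any, source_id: int) -> Optional[dict[str, Any]]:
    """Return one crawler_source row as a {column: value} dict, or None if absent."""
    cur = conn.cursor()
    # "?" is sqlite3's placeholder; other drivers (e.g. psycopg2) expect "%s".
    cur.execute(
        "SELECT * FROM nldi_data.crawler_source WHERE crawler_source_id = ?",
        (source_id,),
    )
    row = cur.fetchone()
    if row is None:
        return None
    columns = [d[0] for d in cur.description]  # column names from the cursor
    return dict(zip(columns, row))
```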
If we fetch the GeoJSON for this source, we see that the feature table looks like this: | ||

| SBID | Site Name | SBURL | REACHCODE | REACH_measure | Location | geometry | ... |
|------|-----------|-------|-----------|---------------|----------|----------|-----|
| 5fe395bbd34ea5387deb4950 | Aching Shoulder Slope, New Mexico, USA | https://www.sciencebase.gov/catalog/item/5fe395bbd34ea5387deb4950 | null | null | Mitten Rock, New Mexico USA | Point() | ... |
| 5fe39807d34ea5387deb4970 | Armells Creek, Montana, USA | https://www.sciencebase.gov/catalog/item/5fe39807d34ea5387deb4970 | 10100001000709 | 90.193048735368549 | Yellowstone River Basin, Southeastern Montana, USA | Point() | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |

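The `feature_*` columns of `crawler_source` hold property *names*, not values. Each one is looked up in a feature's `properties` object. The sketch below assumes the GeoJSON feature has been parsed into a Python `dict`. The function name `extract_feature_fields` and its output keys are hypothetical. The JSON types of the properties are also assumed, because the table above does not show them. For example, it is not shown whether `REACHCODE` is a string or a number.

```python
from typing import Any

# crawler_source columns whose values name a property in the GeoJSON.
PROPERTY_COLUMNS = ("feature_id", "feature_name", "feature_uri",
                    "feature_reach", "feature_measure")


def extract_feature_fields(feature: dict[str, Any], source: dict[str, Any]) -> dict[str, Any]:
    """Read the properties named by a crawler_source row; missing ones become None."""
    props = feature.get("properties") or {}
    return {col: props.get(source[col]) for col in PROPERTY_COLUMNS}


# Source 10 (Vigil Network Data), reduced to the columns used here.
vigil_source = {
    "feature_id": "SBID",
    "feature_name": "Site Name",
    "feature_uri": "SBURL",
    "feature_reach": "REACHCODE",
    "feature_measure": "REACH_measure",
}

# The Armells Creek row from the table above, written as a GeoJSON Feature.
armells = {
    "type": "Feature",
    "geometry": None,  # coordinates are not given in the table
    "properties": {
        "SBID": "5fe39807d34ea5387deb4970",
        "Site Name": "Armells Creek, Montana, USA",
        "SBURL": "https://www.sciencebase.gov/catalog/item/5fe39807d34ea5387deb4970",
        "REACHCODE": "10100001000709",
        "REACH_measure": 90.193048735368549,
        "Location": "Yellowstone River Basin, Southeastern Montana, USA",
    },
}

print(extract_feature_fields(armells, vigil_source)["feature_reach"])  # 10100001000709
```

A feature like Aching Shoulder Slope, whose `REACHCODE` and `REACH_measure` are null, yields `None` for both fields.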
@@ -0,0 +1,47 @@
# Workflow

The crawler CLI bulk-downloads feature data from predefined sources. The sequence is as follows:

## Sequence Diagram | ||

```mermaid
%%{init: {
    "theme": "default",
    "min-width": 2500,
    "max-width": 5000,
    "sequence": {"showSequenceNumbers": true}
  }
}%%
sequenceDiagram
    actor CLI
    CLI->>Crawler: launch
    Crawler->>+NLDI-DB: Get Source Information
    Note left of NLDI-DB: SELECT * FROM nldi_data.crawler_source
    NLDI-DB-->>-Crawler: Sources table
    Crawler->>+FeatureSource: Request Features
    Note left of FeatureSource: HTTP GET ${crawler_source.source_uri}
    FeatureSource-->>-Crawler: GeoJSON FeatureCollection
    loop foreach feature in Collection
        Crawler-->>+Crawler: ORM
        Note right of Crawler: Parses and maps feature to SQL
        Crawler->>-NLDI-DB: Add to feature table
        Note left of NLDI-DB: INSERT INTO nldi_data.features
    end
    Crawler->>NLDI-DB: Relate Features
    %% NLDI-DB-->>-Crawler: Success
```

## Annotations

1) Launch the CLI tool.
2) Connect to the NLDI master database and request the list of configured feature sources.
3) The database returns the list of feature sources. The crawler can then either:
   * list all sources and exit, or
   * proceed to "crawl" one of the sources in the table.
4) For the selected feature source, make an HTTP GET request. The URI is taken from the `source_uri` column of the `crawler_source` table.
5) The feature source returns GeoJSON. Among the returned data is a list of "features".
6) **[Per-Feature]** Use the ORM to map the feature data to the schema reflected from the `features` table.
7) **[Per-Feature]** Insert the new feature into the master NLDI database.
8) "Relate" features: build the relationships that match features to their adjacent features in the NLDI topology.
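The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the crawler's implementation. Database access and the per-feature map/insert are passed in as plain callables. The names `crawl`, `fetch_features`, `load_sources`, and `insert_feature` are all hypothetical. HTTP uses only the standard library. The "relate" step (8) is omitted because its logic is not described here.

```python
import json
import urllib.request
from typing import Any, Callable, Iterable

Feature = dict[str, Any]
Source = dict[str, Any]


def fetch_features(uri: str, timeout: float = 60.0) -> list[Feature]:
    """Steps 4-5: HTTP GET the source URI and return the GeoJSON 'features' list."""
    with urllib.request.urlopen(uri, timeout=timeout) as resp:
        collection = json.load(resp)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return collection.get("features", [])


def crawl(
    source_id: int,
    load_sources: Callable[[], Iterable[Source]],       # step 2: SELECT * FROM nldi_data.crawler_source
    insert_feature: Callable[[Source, Feature], None],  # steps 6-7: map and INSERT one feature
    fetch: Callable[[str], list[Feature]] = fetch_features,
) -> int:
    """Crawl one configured source and return the number of features inserted."""
    # Step 3: index the returned sources by id and pick the one to crawl.
    sources = {s["crawler_source_id"]: s for s in load_sources()}
    source = sources[source_id]
    count = 0
    for feature in fetch(source["source_uri"]):  # steps 4-5
        insert_feature(source, feature)          # steps 6-7, once per feature
        count += 1
    return count
```

Injecting `fetch` and the database callables keeps the loop testable without a network or database connection. The "list all sources and exit" option from step 3 would only need `load_sources()`.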