This repository builds a Docker image that is used to:
- Run all NYC Space/Time Directory ETL modules;
- Publish the results of those ETL modules to S3;
- Index the results of the spacetime-graph module into Elasticsearch to power the NYC Space/Time Directory API and Atlas.
For more information about the NYC Space/Time Directory project, see spacetime.nypl.org.
Environment variables:
- DIGITAL_COLLECTIONS_TOKEN
- SPACETIME_AWS_ACCESS_KEY_ID
- SPACETIME_AWS_SECRET_ACCESS_KEY
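As a minimal sketch, the variables listed above could be checked before starting a run. The `require_env` helper below is hypothetical and not part of this repository. It assumes the variables are exported in the calling shell and that `printenv` is available.

```shell
# Hypothetical helper (not part of this repository): succeed only if every
# named variable is present in the environment and non-empty.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv prints the value of an exported variable, or nothing if unset.
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible use is `require_env DIGITAL_COLLECTIONS_TOKEN SPACETIME_AWS_ACCESS_KEY_ID SPACETIME_AWS_SECRET_ACCESS_KEY && ./run-etl`, which only starts the ETL run when all three variables are set.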
See dist/datasets.json.
Every dataset corresponds to a NYC Space/Time Directory ETL module, e.g. nyc-streets corresponds to etl-nyc-streets.
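The naming convention can be sketched as a small shell function. `etl_module_name` is a hypothetical helper for illustration only. It assumes every dataset ID maps to its module by adding the `etl-` prefix, as in the nyc-streets example. The structure of dist/datasets.json is not assumed here.

```shell
# Hypothetical helper: derive the ETL module name from a dataset ID,
# assuming the "etl-" prefix convention holds for every dataset.
etl_module_name() {
  printf 'etl-%s\n' "$1"
}
```

For example, `etl_module_name nyc-streets` prints `etl-nyc-streets`.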
To build the Docker image, run:
./build.sh
To build without the cache, run:
./build.sh --no-cache
To run the latest image:
./run-bash
To run the image and execute all ETL steps, run:
./run-etl
First, build the image, then get an authorization key:
aws ecr get-login --region us-east-1 --profile spacetime --no-include-email
On macOS, you can also copy the output of the command above directly to your clipboard:
aws ecr get-login --region us-east-1 --profile spacetime --no-include-email | pbcopy
Paste and run the output of that command in bash to log in, then tag and push the image to ECR:
docker tag spacetime/etl:latest 843376026590.dkr.ecr.us-east-1.amazonaws.com/spacetime/etl:latest
docker push 843376026590.dkr.ecr.us-east-1.amazonaws.com/spacetime/etl:latest
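The login, tag, and push steps above could also be combined into one function. This is a sketch rather than the repository's own script: `push_image` is a hypothetical helper, and it assumes AWS CLI version 2, where `aws ecr get-login` is no longer available and `aws ecr get-login-password` is piped into `docker login` instead. The registry, region, profile, and image name are taken from the commands above.

```shell
# Values taken from the commands above.
REGISTRY=843376026590.dkr.ecr.us-east-1.amazonaws.com
IMAGE=spacetime/etl:latest

# Hypothetical helper (not part of this repository): log in to ECR,
# then tag and push the locally built image.
# Assumes AWS CLI v2 and a configured "spacetime" profile.
push_image() {
  aws ecr get-login-password --region us-east-1 --profile spacetime |
    docker login --username AWS --password-stdin "$REGISTRY" &&
    docker tag "$IMAGE" "$REGISTRY/$IMAGE" &&
    docker push "$REGISTRY/$IMAGE"
}
```

Defining the function has no side effects. Calling `push_image` after a successful build performs all three steps and stops at the first one that fails.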