This document details the procedure to run the CKAN Next Generation harvesters.
This setup includes:
- a CKAN instance to harvest into;
- an Airflow service to schedule and run harvest jobs periodically.
The Docker image for the harvester + Airflow is defined in this GitHub repo.
From the repo folder, run the following script and follow all of its steps:

```
./create_secrets.py
```
Start the Docker Compose environment with all of its components:

```
docker-compose \
  -f docker-compose.yaml \
  -f .docker-compose-db.yaml \
  -f .docker-compose.datagov-theme.yaml \
  -f .docker-compose-harvester_ng.yaml \
  up -d --build nginx harvester
```
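Bringing all the services up can take a while. As a sanity check, a small helper like the one below can poll CKAN's standard `status_show` API action until the instance answers. This is a sketch: the `wait_for_ckan` function name and the default URL are assumptions for this setup; adjust the host/port if your compose files map a different one.

```shell
# Poll CKAN's status_show endpoint until it responds.
# The default URL matches this compose setup; pass another to override.
wait_for_ckan() {
  local url="${1:-http://nginx:8080}"
  until curl -sf "$url/api/3/action/status_show" > /dev/null; do
    echo "waiting for CKAN..."
    sleep 5
  done
  echo "CKAN is up"
}
```

Call it as `wait_for_ckan` after `docker-compose up` returns (this assumes the hosts entries described below are already in place).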
Add hosts entries mapping the `nginx` and `ckan` domains to 127.0.0.1:

```
127.0.0.1 nginx
127.0.0.1 ckan
```
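On Linux or macOS the entries can be appended in one go. The `add_hosts_entries` helper below is an ad-hoc sketch (not part of this repo); note that writing to /etc/hosts normally requires root.

```shell
# Append 127.0.0.1 mappings for the given names to a hosts file,
# skipping names that are already present. HOSTS_FILE defaults to
# /etc/hosts, which usually needs root to modify.
add_hosts_entries() {
  local file="${HOSTS_FILE:-/etc/hosts}"
  local name
  for name in "$@"; do
    grep -q "127.0.0.1 $name" "$file" 2>/dev/null \
      || printf '127.0.0.1 %s\n' "$name" >> "$file"
  done
}
```

Run it as root: `add_hosts_entries nginx ckan`. It is idempotent, so re-running it will not duplicate entries.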
Create a CKAN admin user:

```
docker-compose \
  exec ckan ckan-paster \
  --plugin=ckan \
  sysadmin add \
  -c /etc/ckan/production.ini \
  admin password=12345678 \
  email=admin@localhost
```
Now you can log in to CKAN at http://nginx:8080 with username `admin`
and password `12345678`.
Your harvest source list will be empty at first.
After it starts, Airflow reads all the harvest sources (at the moment just data.json and CSW sources) from the CKAN instance.
The first time this list will be empty: a clean CKAN instance has no harvest sources defined.
You can check the Airflow status at http://nginx:8080/airflow/.
To fill the CKAN instance with harvest sources, you can add them manually at http://nginx:8080/harvest/new,
or import all the harvest sources from another CKAN instance with the Harvester NG.
For the import, you will need to clone this repo, install the requirements, and run the import script.
To point it at the destination CKAN instance (http://nginx:8080), copy the settings.py file to local_settings.py and set the API key and the other required values.
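As a sketch, the copy-and-override step could be scripted like this. The `make_local_settings` helper and the `CKAN_API_KEY` / `CKAN_BASE_URL` variable names are illustrative assumptions; check settings.py for the names it actually defines.

```shell
# Create local_settings.py from the shipped settings.py and append
# the destination instance details. The variable names below are
# assumptions; use the ones settings.py really defines.
make_local_settings() {
  cp settings.py local_settings.py
  cat >> local_settings.py <<EOF
CKAN_API_KEY = "$1"
CKAN_BASE_URL = "http://nginx:8080"
EOF
}
```

Usage: `make_local_settings <your-api-key>`, where the API key comes from your CKAN admin user's profile page.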
```
# Data.json type
python3 import_harvest_sources.py --import_from_url https://catalog.data.gov --source_type datajson --method GET
# CSW type
python3 import_harvest_sources.py --import_from_url https://catalog.data.gov --source_type csw --method GET
```
As the harvest sources are added, Airflow will read them and create DAGs.
To clean out all the data (containers and volumes):

```
docker-compose \
  -f docker-compose.yaml \
  -f .docker-compose-db.yaml \
  -f .docker-compose.datagov-theme.yaml \
  -f .docker-compose-harvester_ng.yaml \
  down -v
```
Check the logs:

```
docker-compose \
  -f docker-compose.yaml \
  -f .docker-compose-db.yaml \
  -f .docker-compose.datagov-theme.yaml \
  logs -f
```