This extension generates a table schema for a resource based on its content. Only tabular data is supported. You must define a field on datasets and on resources to store the table schema data; see the config options below.
This extension was written to work with Python 2 and CKAN 2.9.5. It relies on the datastore.
Compatibility with core CKAN versions:
CKAN version | Compatible? |
---|---|
2.6 and earlier | not tested |
2.7 | not tested |
2.8 | not tested |
2.9.5+ | yes |
To install ckanext-validation-schema-generator:
- Activate your CKAN virtual environment, for example:

      . /usr/lib/ckan/default/bin/activate

- Clone the source and install it in the virtualenv:

      git clone https://github.com//ckanext-validation-schema-generator.git
      cd ckanext-validation-schema-generator
      pip install -e .
      pip install -r requirements.txt

- Add `validation-schema-generator` to the `ckan.plugins` setting in your CKAN config file (by default the config file is located at `/etc/ckan/default/ckan.ini`).

- Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu:

      sudo service apache2 reload
# The maximum time for the schema generation before it is aborted.
# Give an amount in seconds. Default is 60 minutes
# (optional, default: 3600).
ckanext.validation_schema_generator.job_timeout = 3600
# If the resource is remote or private, an API key can be sent in the request headers.
# This option controls whether the API key is passed.
# (optional, default: True).
ckanext.validation_schema_generator.pass_api_key = True
# API key that is passed in the `Authorization` header
ckanext.validation_schema_generator.api_key =
# Field name for resource schema field
# (optional, default: schema)
ckanext.validation_schema_generator.resource_schema_field_name = schema
# Field name for dataset schema field
# (optional, default: schema)
ckanext.validation_schema_generator.package_schema_field_name = default_data_schema
# Allow editing the generated schema before applying it
# (optional, default: False)
ckanext.validation_schema_generator.allow_edit_generated_schema = False
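
To illustrate how the documented defaults behave, here is a minimal sketch that reads these options from a plain dict of config values. It is not the extension's own code: the helpers `as_bool` and `get_vsg_settings` are hypothetical, the empty-string fallback for `api_key` is an assumption, and in a real CKAN plugin the values would normally come from CKAN's config object rather than a dict.

```python
PREFIX = "ckanext.validation_schema_generator."


def as_bool(value):
    """Interpret common ini-style truthy strings; real booleans pass through."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "yes", "on", "1")


def get_vsg_settings(config):
    """Return the extension options, falling back to the documented defaults.

    `config` is any dict-like object holding the raw ini values.
    """
    return {
        "job_timeout": int(config.get(PREFIX + "job_timeout", 3600)),
        "pass_api_key": as_bool(config.get(PREFIX + "pass_api_key", True)),
        # No default is documented for api_key; "" is an assumption.
        "api_key": config.get(PREFIX + "api_key", ""),
        "resource_schema_field_name": config.get(
            PREFIX + "resource_schema_field_name", "schema"
        ),
        "allow_edit_generated_schema": as_bool(
            config.get(PREFIX + "allow_edit_generated_schema", False)
        ),
    }
```

Calling `get_vsg_settings({})` yields the defaults listed above, while values read from an ini file (which arrive as strings) are converted to integers and booleans.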
To install ckanext-validation-schema-generator for development, activate your CKAN virtualenv and do:
git clone https://github.com//ckanext-validation-schema-generator.git
cd ckanext-validation-schema-generator
python setup.py develop
pip install -r dev-requirements.txt
The extension has a few tests, which you can run with the command below. Make sure you have installed CKAN's dev-requirements first.
pytest --ckan-ini=test.ini
The extension provides the following API actions to control the schema generation process.
- `vsg_generate` - starts the schema generation process by creating the appropriate task and queuing a background job to be executed by `ckan jobs worker`.

  Params:
  - `id` (required) - ID of the resource. The resource must be stored in the datastore.

  Returns:

      {
        "help": ".../api/3/action/help_show?name=vsg_generate",
        "success": true,
        "result": {
          "entity_id": "<RESOURCE_ID>",
          "task_type": "generate",
          "last_updated": "2022-09-02 14:21:14.543511",
          "entity_type": "resource",
          "value": { "job_id": "<JOB_ID>" },
          "state": "Pending",
          "key": "validation_schema_generator",
          "error": "{}",
          "id": "<TASK_ID>"
        }
      }

- `vsg_status` - returns the status of schema generation for a specific resource.

  Params:
  - `id` (required) - ID of the resource.

  Returns:

      {
        "help": ".../api/3/action/help_show?name=vsg_status",
        "success": true,
        "result": {
          "entity_id": "<RESOURCE_ID>",
          "task_type": "generate",
          "last_updated": "2022-09-02T14:21:18.289917",
          "entity_type": "resource",
          "value": {
            "job_id": "<JOB_ID>",
            "schema": {
              "fields": [
                { "type": "string", "name": "Name", "format": "default" }
                ...
              ]
            }
          },
          "state": "Finished",
          "key": "validation_schema_generator",
          "error": {},
          "id": "<TASK_ID>"
        }
      }

- `vsg_update` - updates the data of a schema generation task. The background job uses this action to update the task after the schema is generated. It can also be used for testing purposes.

  Params:
  - `id` (required) - ID of the resource. The generation process must be in progress; otherwise a validation error is returned.
  - `error` (required) - a dict of errors, e.g. `{"format": "couldn't generate a schema for XXX format"}`.
  - `status` (required) - status of the task, must be one of `["Pending", "Finished", "Failed"]`.
  - `schema` (optional) - a table schema; see the Frictionless Data Table Schema specification for the format.

  Returns: the updated task data, same as `vsg_status`.

- `vsg_apply` - applies a generated schema to a resource or dataset. The schema can be applied only if the generation process has completed successfully.

  Params:
  - `id` (required) - ID of the resource.
  - `apply_for` (required) - the entity to apply the schema to, must be one of `["dataset", "resource"]`.
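
The actions above are naturally chained: start generation, poll the status, then apply the result. The following is a minimal sketch of that workflow, written as a standalone Python 3 client script. It assumes the actions are reachable through CKAN's standard action API at `/api/3/action/<name>` with a JSON body, and that `state` and `value.schema` appear in the result as in the sample responses. The helper names `call_action` and `generate_and_apply`, the site URL, and the polling settings are illustrative assumptions, not part of the extension.

```python
import json
import time
import urllib.request


def call_action(site_url, action, payload, api_key=None):
    """POST a JSON payload to a CKAN action and return its "result" value."""
    request = urllib.request.Request(
        "{}/api/3/action/{}".format(site_url.rstrip("/"), action),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    if api_key:
        request.add_header("Authorization", api_key)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["result"]


def generate_and_apply(resource_id, apply_for="resource", call=call_action,
                       site_url="http://localhost:5000", api_key=None,
                       poll_seconds=2, max_polls=30):
    """Start generation, poll until the task finishes, then apply the schema."""
    call(site_url, "vsg_generate", {"id": resource_id}, api_key)
    for _ in range(max_polls):
        task = call(site_url, "vsg_status", {"id": resource_id}, api_key)
        if task["state"] == "Finished":
            call(site_url, "vsg_apply",
                 {"id": resource_id, "apply_for": apply_for}, api_key)
            return task["value"]["schema"]
        if task["state"] == "Failed":
            raise RuntimeError(
                "Schema generation failed: {}".format(task["error"])
            )
        # Still "Pending": wait before asking again.
        time.sleep(poll_seconds)
    raise RuntimeError("Schema generation did not finish in time")
```

The `call` parameter lets you swap in a different transport (for example, a stub in tests) without touching the polling logic. If `allow_edit_generated_schema` is enabled, you may prefer to stop after polling and review the schema before calling `vsg_apply` yourself.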