ODD Collector is a lightweight service that gathers metadata from all your data sources.
To learn more about collector types and ODD Platform's architecture, read the documentation.
Service | Config example |
---|---|
Cassandra | config |
ClickHouse | config |
Dbt | config |
Elasticsearch | config |
Feast | config |
Hive | config |
Kafka | config |
Kubeflow | config |
MariaDB | config, supported via MySql adapter |
MongoDB | config |
MSSql | config |
MySql | config |
Neo4j | config |
PostgreSQL | config |
Presto | config |
Redash | config |
Redshift | config |
Snowflake | config |
Superset | config |
Tableau | config |
Tarantool | config |
Trino | config |
Vertica | config |
ODBC | config, README.md |
Cube | config |
ODD Adapter | config |
Apache Druid | config |
Oracle | config |
Airbyte | config |
SingleStore | config |
CockroachDB | config |
SQLite | config |
The class diagram above may help you understand which fields each adapter needs in collector_config.yaml. It may also be helpful for developers of new adapters.
PlantUML source for the diagram above: domain_classes.plantuml
To regenerate the image, you have two options:
- With PlantUML installed locally, run `java -jar plantuml.jar domain_classes.plantuml`
- Use the PlantUML plugin for PyCharm or another IDE
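To build the Docker image, run the following (the current directory is used as the build context):
```shell
docker build .
```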
The pyodbc, confluent-kafka, and grpcio libraries may fail to install or build on Apple Silicon (M1) Macs.
Possible solutions:
```shell
# NOTE: be aware of versions: the Homebrew paths below contain specific package versions that may differ on your machine
# NOTE: the easiest way is to add all export statements to your .bashrc/.zshrc file

# pyodbc dependencies
brew install unixodbc freetds openssl
export LDFLAGS="-L/opt/homebrew/lib -L/opt/homebrew/Cellar/unixodbc/2.3.11/lib -L/opt/homebrew/Cellar/freetds/1.3.17/lib -L/opt/homebrew/Cellar/openssl@1.1/1.1.1t/lib"
export CFLAGS="-I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/freetds/include"
export CPPFLAGS="-I/opt/homebrew/include -I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/openssl@3/include"

# confluent-kafka
brew install librdkafka
export C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/include
export LIBRARY_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/lib
export PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"

# grpcio
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1

# pymssql
brew install freetds
# NOTE: this replaces the LDFLAGS exported for pyodbc above; merge the flags if you need both
export LDFLAGS="-L/opt/homebrew/Cellar/freetds/1.3.17/lib -L/opt/homebrew/Cellar/openssl@1.1/1.1.1t/lib"
```
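After exporting the variables and reinstalling, one quick way to confirm that the native extensions actually load is to import them. The sketch below is a hypothetical helper, not part of the collector. It assumes the import names listed in the comment. Those differ from the PyPI names, and pymssql is assumed to be the library the MSSQL section targets.

```python
import importlib
from typing import Dict, Iterable, Optional

# Assumed import names of the libraries above (PyPI name -> import name):
# pyodbc -> pyodbc, confluent-kafka -> confluent_kafka, grpcio -> grpc, pymssql -> pymssql
NATIVE_DEPS = ("pyodbc", "confluent_kafka", "grpc", "pymssql")


def check_imports(modules: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to import each module; map its name to None on success or to the error text."""
    results: Dict[str, Optional[str]] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except (ImportError, OSError) as exc:
            # A compiled extension that cannot load its shared library raises ImportError;
            # OSError is caught too in case a library loads native code another way.
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, error in check_imports(NATIVE_DEPS).items():
        print(f"{name}: {'OK' if error is None else error}")
```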
To start the collector, you can simply run the start.sh script. However, if you want to run or debug the __main__.py file from an IDE, you may run into confusing errors.
These errors occur because the constant COLLECTOR_PACKAGE = __package__ receives no package name when the file is executed as a script. You therefore need to configure the IDE to invoke it as a module.
For example, in PyCharm:
- Click 'Edit Configurations' (run configurations)
- Click 'Add New Configuration'
- Change 'Script path' to 'Module name' and set its value to 'odd_collector'
- Set 'Working directory' to the odd-collectors/odd-collector directory (the path must be absolute)
You should now be able to run it by pressing the green triangle (Run) button.
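The following self-contained sketch illustrates why `__package__` ends up empty. It creates a throwaway package named demo_pkg, which is hypothetical and not part of the collector, whose `__main__.py` prints `__package__`. It then runs that file both as a script and as a module. When run as a script, the value is None, so a constant such as COLLECTOR_PACKAGE gets no package name. When run with `-m`, it is the package name.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def package_seen(run_as_module: bool) -> str:
    """Run a throwaway package's __main__.py and return the __package__ value it prints."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package used only for this demo
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "__main__.py").write_text("print(__package__)\n")
        if run_as_module:
            cmd = [sys.executable, "-m", "demo_pkg"]  # like PyCharm's 'Module name' option
        else:
            cmd = [sys.executable, str(pkg / "__main__.py")]  # like the default 'Script path'
        env = {**os.environ, "PYTHONPATH": tmp}  # make demo_pkg importable for -m
        result = subprocess.run(cmd, cwd=tmp, env=env, capture_output=True, text=True, check=True)
        return result.stdout.strip()


if __name__ == "__main__":
    print("as script:", package_seen(run_as_module=False))
    print("as module:", package_seen(run_as_module=True))
```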
If you are also modifying odd_collector_sdk and want to debug it from the collector, a few extra steps are needed:
- Delete the odd_collector_sdk package from the virtual environment, so that the module is not imported from the PyPI package
- Mark the odd-collector-sdk directory as 'Sources Root'
- Set a new environment variable in the run configuration:
  PYTHONPATH=${PYTHONPATH}:/absolute/path/to/repo/odd-collectors/odd-collector-sdk
The IDE should now resolve odd_collector_sdk imports.
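To check which copy of odd_collector_sdk will actually be imported, you can ask Python's import system where it would load the module from. The following is a minimal sketch. The helper name module_origin is hypothetical, and it only locates the module without importing it.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # With the setup above, this should point into .../odd-collectors/odd-collector-sdk
    # rather than into the virtual environment's site-packages.
    print(module_origin("odd_collector_sdk"))
```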
Custom .env file for docker-compose.yaml:
```
LOGLEVEL=DEBUG
PLATFORM_HOST_URL=http://odd-platform:8080
POSTGRES_PASSWORD=postgres_password_secret
```
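Docker Compose reads the .env file in the project directory and uses its values to substitute `${...}` references in docker-compose.yaml. The sketch below parses the simple KEY=VALUE format shown above. It is a hypothetical helper, not how Compose itself is implemented, and it deliberately ignores quoting, `export` prefixes, and multi-line values.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as URLs keep any later characters intact
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw_line!r}")
        values[key.strip()] = value.strip()
    return values
```

For example, parsing the .env content above yields a mapping whose PLATFORM_HOST_URL entry is `http://odd-platform:8080`.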
There are three ways to pass a config field value:
- Set it explicitly in the collector_config.yaml file, e.g. `database: odd-platform-db`
- Use a .env file or environment variables
- When several plugins use the same field names, explicitly reference an environment variable in collector_config.yaml, e.g. `password: !ENV ${POSTGRES_PASSWORD}`
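The `!ENV ${POSTGRES_PASSWORD}` form means the value is read from the environment when the config is loaded. The stdlib-only sketch below shows that substitution step for a single string. The name expand_env is hypothetical, and so is the choice to raise KeyError for unset variables. This is not the collector SDK's actual loader, where the step would run as part of handling the YAML `!ENV` tag.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${VAR_NAME} placeholders such as ${POSTGRES_PASSWORD}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${VAR} in value with the matching environment variable (hypothetical helper)."""
    source = os.environ if env is None else env

    def _lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            # Assumption: fail loudly instead of silently substituting an empty string
            raise KeyError(f"environment variable {name!r} is not set")
        return source[name]

    return _PLACEHOLDER.sub(_lookup, value)
```

For example, `expand_env("${POSTGRES_PASSWORD}")` returns whatever POSTGRES_PASSWORD is set to in the current environment.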
Configuration settings can also be stored in a secrets backend (currently only AWS SSM Parameter Store is supported). In this case, create the secrets in Parameter Store using the names configured in the secrets_backend section of the config.
Custom collector_config.yaml:
```yaml
secrets_backend:
  provider: "AWSSystemsManagerParameterStore"
  # key-value arguments required by the provider
  region_name: "eu-central-1"
  collector_settings_parameter_name: "/odd/collector_config/collector_settings"
  collector_plugins_prefix: "/odd/collector_config/plugins"
platform_host_url: http://localhost:8080
default_pulling_interval: 10
token: ""
plugins:
  - type: postgresql
    name: test_postgresql_adapter
    host: "localhost"
    port: 5432
    database: "some_database_name"
    user: "some_user_name"
    password: !ENV ${POSTGRES_PASSWORD}
  - type: mysql
    name: test_mysql_adapter
    host: "localhost"
    port: 3306
    database: "some_database_name"
    user: "some_user_name"
    password: "some_password"
```
docker-compose.yaml:
```yaml
version: "3.8"
services:
  # --- ODD Platform ---
  database:
    ...
  odd-platform:
    ...
  odd-collector:
    image: ghcr.io/opendatadiscovery/odd-collector:latest
    restart: always
    volumes:
      # the leading ./ makes Compose treat this as a bind mount of a host file, not a named volume
      - ./collector_config.yaml:/app/collector_config.yaml
    environment:
      - PLATFORM_HOST_URL=${PLATFORM_HOST_URL}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    depends_on:
      - odd-platform
```