Code style: black

odd-collector

ODD Collector is a lightweight service that gathers metadata from all your data sources.

To learn more about collector types and ODD Platform's architecture, read the documentation.

Preview:

Implemented adapters

Service Config example
Cassandra config
ClickHouse config
Dbt config
Elasticsearch config
Feast config
Hive config
Kafka config
Kubeflow config
MariaDB config, supported via MySql adapter
MongoDB config
MSSql config
MySql config
Neo4j config
PostgreSQL config
Presto config
Redash config
Redshift config
Snowflake config
Superset config
Tableau config
Tarantool config
Trino config
Vertica config
ODBC config, README.md
Cube config
ODD Adapter config
Apache Druid config
Oracle config
Airbyte config
SingleStore config
CockroachDB config
SQLite config

Class diagram of adapter class hierarchy

This diagram may help you understand which fields each adapter requires in collector_config.yaml, and may also be useful when developing a new adapter.

Adapter domain class hierarchy

PlantUML code for above diagram: domain_classes.plantuml

To regenerate the picture, you have two options:

  1. With PlantUML installed locally, run
java -jar plantuml.jar domain_classes.plantuml
  2. Use the PlantUML plugin for PyCharm or another IDE

Building

docker build .

M1 building issue

The pyodbc, confluent-kafka and grpcio libraries can fail to install or build on Apple M1 Macs.

Possible solutions

# NOTE: be aware of versions
# NOTE: easiest way is to add all export statements to your .bashrc/.zshrc file

# pyodbc dependencies
brew install unixodbc freetds openssl

export LDFLAGS="-L/opt/homebrew/lib -L/opt/homebrew/Cellar/unixodbc/2.3.11/lib -L/opt/homebrew/Cellar/freetds/1.3.17/lib -L/opt/homebrew/Cellar/openssl@1.1/1.1.1t/lib"
export CFLAGS="-I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/freetds/include"
export CPPFLAGS="-I/opt/homebrew/include -I/opt/homebrew/Cellar/unixodbc/2.3.11/include -I/opt/homebrew/opt/openssl@3/include"

# confluent-kafka
brew install librdkafka

export C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/include
export LIBRARY_PATH=/opt/homebrew/Cellar/librdkafka/1.9.0/lib
export PATH="/opt/homebrew/opt/openssl@3/bin:$PATH"

# grpcio
export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1

# pymssql
brew install freetds
export LDFLAGS="-L/opt/homebrew/Cellar/freetds/1.3.17/lib -L/opt/homebrew/Cellar/openssl@1.1/1.1.1t/lib"

Local development setup for odd-collector

To start a collector you can simply run the start.sh script. However, starting or debugging the __main__.py file directly from an IDE can produce confusing errors. These errors appear because the COLLECTOR_PACKAGE = __package__ constant does not receive a package name when the file is run as a script. You must therefore set up the IDE run configuration so that the code is invoked as a module.
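The difference between script and module invocation can be reproduced outside the collector. The sketch below builds a throwaway package (the name demo_collector is hypothetical and merely stands in for odd_collector) whose __main__.py prints __package__, then runs it both ways. It only illustrates general Python behaviour and is not taken from the collector's code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def package_seen_by_main(as_module: bool) -> str:
    """Run a throwaway package's __main__.py and return repr(__package__) as it saw it."""
    with tempfile.TemporaryDirectory() as tmp:
        # 'demo_collector' is a hypothetical stand-in for the odd_collector package.
        pkg = Path(tmp) / "demo_collector"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "__main__.py").write_text("print(repr(__package__))\n")

        if as_module:
            # Equivalent to: python -m demo_collector
            cmd = [sys.executable, "-m", "demo_collector"]
        else:
            # Equivalent to: python demo_collector/__main__.py
            cmd = [sys.executable, str(pkg / "__main__.py")]

        result = subprocess.run(
            cmd, cwd=tmp, capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print("as script:", package_seen_by_main(as_module=False))
    print("as module:", package_seen_by_main(as_module=True))
```

Run as a script, __package__ is None; run with -m, it is the package name, which is what a constant such as COLLECTOR_PACKAGE relies on.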

For example, in PyCharm:

  • click 'Edit configurations' (script run configurations)
  • click 'Add new configuration'
  • change 'Script path' to 'Module name' and set its value to 'odd_collector'
  • set 'Working directory' to the 'odd-collectors/odd-collector' level (the path must be absolute)

Now you should be able to run it by pressing the green triangle.

If you are also working on odd_collector_sdk, modifying its logic, and want to debug it from the collector, a few extra steps are needed:

  • delete the odd_collector_sdk package from the venv (so that the module is imported from your local checkout rather than from the PyPI package)
  • mark the odd-collector-sdk directory as 'Source root'
  • set a new environment variable in the 'Run configuration': PYTHONPATH=${PYTHONPATH}:/absolute/path/to/repo/odd-collectors/odd-collector-sdk

Now the IDE should resolve odd_collector_sdk imports.
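To check which copy of the SDK the interpreter would actually pick up, a small helper like the one below can be run with the same interpreter and PYTHONPATH as the run configuration. It is a minimal sketch using only the standard library; the helper name module_origin is an assumption made for illustration and is not part of the SDK.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Expect a path inside your local odd-collector-sdk checkout,
    # not one inside the venv's site-packages directory.
    print(module_origin("odd_collector_sdk"))
```

If the printed path still points into site-packages, the PyPI package has most likely not been removed from the venv.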

Docker compose example

Custom .env file for docker-compose.yaml

LOGLEVEL=DEBUG
PLATFORM_HOST_URL=http://odd-platform:8080
POSTGRES_PASSWORD=postgres_password_secret

There are three ways to pass config field values:

  1. Set the value explicitly in the collector_config.yaml file, e.g. database: odd-platform-db
  2. Use a .env file or ENV variables
  3. When plugins share the same field names, reference an ENV variable explicitly in collector_config.yaml, e.g. password: !ENV ${POSTGRES_PASSWORD}
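The ${VAR} substitution used by the third option can be pictured with the sketch below. It assumes the placeholder is simply replaced by the value of the named environment variable; the function resolve_env_placeholders is hypothetical and written with the standard library only, so it is not the collector's actual YAML loader.

```python
import os
import re
from typing import Mapping, Optional

# Matches placeholders of the form ${NAME}.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_placeholders(
    value: str, env: Optional[Mapping[str, str]] = None
) -> str:
    """Replace every ${NAME} in value with env[NAME]; raise KeyError if NAME is unset."""
    env = os.environ if env is None else env

    def _lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_PATTERN.sub(_lookup, value)


if __name__ == "__main__":
    demo_env = {"POSTGRES_PASSWORD": "postgres_password_secret"}
    print(resolve_env_placeholders("${POSTGRES_PASSWORD}", demo_env))
```

Failing loudly on an unset variable is a design choice of this sketch; it makes a missing .env entry visible early rather than passing an empty password to a plugin.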

Configuration settings can also be stored in a Secrets Backend (only AWS SSM Parameter Store is supported for now). In this case, create the secrets in Parameter Store following the names set in the secrets_backend section of the config.

Custom collector_config.yaml

secrets_backend:
  provider: "AWSSystemsManagerParameterStore"
  # the keys below are the key-value arguments the provider needs
  region_name: "eu-central-1"
  collector_settings_parameter_name: "/odd/collector_config/collector_settings"
  collector_plugins_prefix: "/odd/collector_config/plugins"

platform_host_url: http://localhost:8080
default_pulling_interval: 10
token: ""
plugins:
  - type: postgresql
    name: test_postgresql_adapter
    host: "localhost"
    port: 5432
    database: "some_database_name"
    user: "some_user_name"
    password: !ENV ${POSTGRES_PASSWORD}
  - type: mysql
    name: test_mysql_adapter
    host: "localhost"
    port: 3306
    database: "some_database_name"
    user: "some_user_name"
    password: "some_password"

docker-compose.yaml

version: "3.8"
services:
  # --- ODD Platform ---
  database:
    ...
  odd-platform:
    ...

  odd-collector:
    image: ghcr.io/opendatadiscovery/odd-collector:latest
    restart: always
    volumes:
      - ./collector_config.yaml:/app/collector_config.yaml
    environment:
      - PLATFORM_HOST_URL=${PLATFORM_HOST_URL}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    depends_on:
      - odd-platform