Merge pull request #2 from thehyve/first-version

First version of the CDM
thehyve · Apr 15, 2024 · d7cb7c6
2 parents 1f88639 + bcdbb1b

Showing 68 changed files with 11,499 additions and 0 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,4 @@
+<!---
+Thank you for your soon-to-be pull request. Before you submit this, please
+double check to make sure that you've added an entry to CHANGELOG.md.
+-->
diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
@@ -0,0 +1,15 @@
+name: Ruff
+on: [push, pull_request]
+jobs:
+  ruff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v4
+      - name: Install Poetry
+        uses: snok/install-poetry@v1
+      - name: Install nox
+        run: pip install nox
+      - name: Run ruff linter via nox session
+        run: nox -s lint
diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml
@@ -0,0 +1,32 @@
+# This workflow installs the package and its dependencies, then runs the tests with a range of Python versions
+
+name: tests
+
+on: [push, pull_request]
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+
+    strategy:
+      matrix:
+        python-version: [
+          '3.8',
+          '3.9',
+          '3.10',
+          '3.11',
+          '3.12',
+        ]
+
+    steps:
+    - uses: actions/checkout@v4
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install Poetry
+      uses: snok/install-poetry@v1
+    - name: Install package and dependencies
+      run: poetry install
+    - name: Test with pytest
+      run: poetry run pytest -vs
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,19 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.5.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-added-large-files
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.3.2
+    hooks:
+      # Run the linter.
+      - id: ruff
+        args: [ --fix ]
+      # Run the formatter.
+      - id: ruff-format
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,4 @@
+# Changelog
+
+## v0.1.0
+First release.
diff --git a/README.md b/README.md
@@ -0,0 +1,46 @@
+# omop-cdm
+
+omop-cdm is a Python package that contains SQLAlchemy declarative table definitions of several
+versions of the [OHDSI OMOP CDM](https://ohdsi.github.io/CommonDataModel/).
+
+## Installation
+
+omop-cdm requires Python >= 3.8.
+
+Install from PyPI:
+```shell
+pip install omop-cdm
+```
+
+## Usage
+
+See [User documentation](docs/README.md)
+
+## Supported databases
+The omop-cdm table definitions are tested to be compatible with PostgreSQL.
+
+Though not officially supported, omop-cdm doesn't use PostgreSQL-specific features
+of SQLAlchemy, so it can likely be used with other database backends as well.
+
+## CDM versions
+omop-cdm contains table definitions for the following CDM versions:
+- CDM 5.4
+- CDM 5.3.1
+- CDM 6.0.0 ([not recommended](https://ohdsi.github.io/CommonDataModel/cdm60.html#NOTE_ABOUT_CDM_v60))
+
+## Development
+
+### Setup steps
+
+- Make sure [Poetry](https://python-poetry.org/docs/#installation) is installed.
+- Install the project and dependencies via `poetry install`.
+- Set up the pre-commit hook scripts via `poetry run pre-commit install`.
+
+### Nox sessions
+
+Several developer actions (e.g. run tests, code format, lint) are available
+via [nox](https://nox.thea.codes/en/stable/) sessions.
+For a complete list, run:
+```shell
+nox --list
+```
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,201 @@
+# Usage
+
+omop-cdm provides SQLAlchemy declarative table definitions that can be used
+to interact with an OMOP CDM via Python. You can use them to create all CDM tables in
+your database, but also for data manipulation
+(see [SQLAlchemy data manipulation](https://docs.sqlalchemy.org/en/20/tutorial/orm_data_manipulation.html)).
+
+## Preparation
+
+Before creating the OMOP CDM tables, the following must be available:
+
+- Target database
+- Schema (or schemas) in which to create the tables
+
+## Schemas
+
+omop-cdm uses placeholders for the schema names, which are intended to be
+replaced by the desired schema names at runtime. This can be done via SQLAlchemy's
+`schema_translate_map` execution option. By default, the vocabulary tables use a different
+schema placeholder than the other tables, but both placeholders can be mapped to the same target
+schema, so that all tables end up in a single schema (recommended).
+
+This example shows how to set a single schema "cdm54" via the SQLAlchemy engine:
+```python
+from omop_cdm.constants import CDM_SCHEMA, VOCAB_SCHEMA
+from sqlalchemy import create_engine
+
+
+OMOP_SCHEMA = "cdm54"
+
+schema_map = {
+    CDM_SCHEMA: OMOP_SCHEMA,
+    VOCAB_SCHEMA: OMOP_SCHEMA,
+}
+
+engine = create_engine("postgresql://postgres@localhost/mydb")
+engine = engine.execution_options(schema_translate_map=schema_map)
+```
+
+## Regular vs Dynamic CDM
+
+The CDM table definitions of a particular CDM release can be imported in two different
+ways.
+
+### Regular
+To use a standard set of CDM tables, you can import the module containing the
+definitions as follows:
+```python
+from omop_cdm.regular import cdm54
+```
+With this regular approach, all tables are already bound to a SQLAlchemy
+`DeclarativeBase` class (`cdm54.Base`). This leaves fewer options for modifying the CDM,
+but is slightly simpler to use.
+
+E.g. once the engine is defined, creating all the tables in a database can be done
+by simply running the following:
+
+```python
+with engine.begin() as conn:
+    cdm54.Base.metadata.create_all(bind=conn)
+```
+
+### Dynamic
+
+Alternatively, you can use the dynamic CDM definitions. Here tables have not been bound to
+a SQLAlchemy `DeclarativeBase` class yet, but are defined as regular classes, which can
+then be used as [mixins](https://docs.sqlalchemy.org/en/20/orm/declarative_mixins.html).
+
+To use these, the classes must first be bound to a `Base`:
+```python
+from omop_cdm.dynamic import cdm54
+from sqlalchemy.orm import DeclarativeBase
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+class Person(cdm54.BasePersonCdm54, Base):
+    pass
+
+# Etc. for all other tables
+```
+This approach allows for the greatest customization possibilities.
+
+The dynamic version also includes two extra tables that can optionally
+be added to your CDM: the `StemTable` (an intermediate table for mapping purposes)
+and the `StcmVersion` table (for versioning STCM source vocabulary mappings).
+
+## Legacy tables
+
+Apart from the standard CDM tables, the following legacy table classes can be
+added to your CDM by binding them to your `DeclarativeBase`:
+
+- AttributeDefinition
+- Cohort
+- CohortAttribute
+- CohortDefinition
+
+These tables have been part of the standard CDM in previous releases, but have since been moved
+to the results schema or removed entirely.
+
+They can be imported from `omop_cdm.dynamic.legacy`.
+
+## Customizing the CDM
+
+> **_NOTE:_** When adding or replacing columns, you can determine the position of the column in the table by setting
+> a ``sort_order`` value inside ``mapped_column()``. The default CDM table columns have a ``sort_order``
+> value pre-assigned with increments of 100 (column1 = 100, column2 = 200, etc.). This only works for tables
+> imported from the dynamic module.
+
+### New columns
+If desired, you can add custom fields to existing CDM tables.
+For dynamic tables, define the table class as normal, but add additional `mapped_column` fields with the
+data types of choice. E.g. to add an integer column to the person table:
+
+```python
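+from typing import Optional
+from omop_cdm.dynamic import cdm54
+from sqlalchemy import Integer
+from sqlalchemy.orm import Mapped, mapped_column
+
+class Person(cdm54.BasePersonCdm54, Base):
+    favorite_number: Mapped[Optional[int]] = mapped_column(Integer)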
+```
+
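+For tables from the regular modules, assign a `Column` to the already-mapped class:
+```python
+from omop_cdm.regular import cdm54
+from sqlalchemy import Column, Integer
+
+cdm54.Person.favorite_number = Column(Integer, nullable=True)
+```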
+
+
+### Replace columns
+
+> **_NOTE:_** Replacing columns is only possible with dynamic table definitions.
+
+Although not recommended, you can replace existing CDM table columns with a different type of column.
+E.g. removing the character limit of the `ethnicity_source_value` column in the person table, by replacing
+it with a `Text` data type, can be done as follows:
+
+```python
+from typing import Optional
+
+from omop_cdm.dynamic import cdm54 as cdm
+from sqlalchemy import Text
+from sqlalchemy.orm import Mapped, mapped_column
+
+
+class Person(cdm.BasePersonCdm54, Base):
+    ethnicity_source_value: Mapped[Optional[str]] = mapped_column(Text)
+```
+
+Note that a column can only be replaced if the replacement
+doesn't break relationships with other tables.
+For example, replacing the `Integer` `person_id` column in the person table with `BigInteger`
+is possible. Replacing it with a column of type ``Text`` is not, as that breaks the FK relationships
+that other CDM tables have with this field.
+
+### Replace whole table
+
+> **_NOTE:_** Replacing tables is only possible with dynamic table definitions.
+
+Instead of adding or replacing individual columns, it's also possible to replace an entire table.
+To do that, don't inherit from the CDM table base class; instead, define all columns in your own class.
+E.g. for the person table:
+
+```python
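+from omop_cdm.constants import CDM_SCHEMA
+from sqlalchemy.orm import Mapped, mapped_column
+
+
+class Person(Base):
+    __tablename__ = "person"
+    __table_args__ = {"schema": CDM_SCHEMA}
+
+    person_id: Mapped[int] = mapped_column(primary_key=True)
+    # Define all remaining columns here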
+```
+
+As with modifying individual columns, this only works if the replacement doesn't break relationships
+with other CDM tables.
+
+## Add new tables
+In addition to the default CDM tables, you can also add your own custom tables to the model.
+
+When you add these tables to the same `DeclarativeBase` as the CDM tables, they become part
+of the same ORM model. For example:
+
+```python
+from omop_cdm.constants import CDM_SCHEMA
+from sqlalchemy import ForeignKey, Boolean
+from sqlalchemy.orm import Mapped, mapped_column, relationship
+
+
+class Narcissus(Base):
+    __tablename__ = 'narcissus'
+    __table_args__ = {'schema': CDM_SCHEMA}
+
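+    # FK target uses the schema placeholder; person_id also serves as the primary key
+    person_id: Mapped[int] = mapped_column(
+        ForeignKey(f'{CDM_SCHEMA}.person.person_id'), primary_key=True
+    )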
+    loved_by_narcissus: Mapped[bool] = mapped_column(Boolean, default=False)
+
+    person: Mapped["Person"] = relationship("Person")
+```
+
+### Schema name
+When adding your own tables, it's good practice to specify a schema name via ``__table_args__``
+(see example above). If the schema will always be the same, there is no harm in hard-coding the name.
+Otherwise, it's better to use the schema placeholder (e.g. `CDM_SCHEMA`) and let the runtime schema name be
+determined by your `schema_translate_map`.
diff --git a/noxfile.py b/noxfile.py
@@ -0,0 +1,40 @@
+import nox  # type: ignore
+
+nox.options.sessions = [
+    "tests",
+    "lint",
+]
+
+python = [
+    "3.8",
+    "3.9",
+    "3.10",
+    "3.11",
+    "3.12",
+]
+
+
+@nox.session(python=python)
+def tests(session: nox.Session):
+    """Run pytest + code coverage."""
+    session.run("poetry", "install", external=True)
+    session.run("pytest", "--cov-report", "term-missing", "--cov=src")
+
+
+@nox.session(reuse_venv=True, name="format")
+def format_all(session: nox.Session):
+    """Format codebase with ruff."""
+    session.run("poetry", "install", "--only", "dev", external=True)
+    session.run("ruff", "format")
+    # format imports according to isort via ruff check
+    session.run("ruff", "check", "--select", "I", "--fix")
+
+
+@nox.session(reuse_venv=True)
+def lint(session: nox.Session):
+    """Run ruff linter."""
+    session.run("poetry", "install", "--only", "dev", external=True)
+    # Run the ruff linter
+    session.run("ruff", "check")
+    # Check if any code requires formatting via ruff format
+    session.run("ruff", "format", "--diff")