Merge pull request #2 from thehyve/first-version
First version of the CDM
Spayralbe authored Apr 15, 2024
2 parents 1f88639 + bcdbb1b commit d7cb7c6
Showing 68 changed files with 11,499 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,4 @@
<!---
Thank you for your soon-to-be pull request. Before you submit this, please
double check to make sure that you've added an entry to CHANGELOG.md.
-->
15 changes: 15 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,15 @@
name: Ruff
on: [push, pull_request]
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
      - name: Install Poetry
        uses: snok/install-poetry@v1
      - name: Install nox
        run: pip install nox
      - name: Run ruff linter via nox session
        run: nox -s lint
32 changes: 32 additions & 0 deletions .github/workflows/python-package.yml
@@ -0,0 +1,32 @@
# This workflow installs Python dependencies and runs the tests with a range of Python versions

name: tests

on: [push, pull_request]

jobs:
  tests:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        python-version: [
          '3.8',
          '3.9',
          '3.10',
          '3.11',
          '3.12',
        ]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Poetry
        uses: snok/install-poetry@v1
      - name: Install package and dependencies
        run: poetry install
      - name: Test with pytest
        run: poetry run pytest -vs
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,19 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.3.2
    hooks:
      # Run the linter.
      - id: ruff
        args: [ --fix ]
      # Run the formatter.
      - id: ruff-format
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,4 @@
# Changelog

## v0.1.0
First release.
46 changes: 46 additions & 0 deletions README.md
@@ -0,0 +1,46 @@
# omop-cdm

omop-cdm is a Python package that contains SQLAlchemy declarative table definitions of several
versions of the [OHDSI OMOP CDM](https://ohdsi.github.io/CommonDataModel/).

## Installation

omop-cdm requires Python >= 3.8.

Install from PyPI:
```shell
pip install omop-cdm
```

## Usage

See the [user documentation](docs/README.md).

## Supported databases
The omop-cdm table definitions are tested against PostgreSQL.

Other database types are not officially supported, but omop-cdm doesn't use
PostgreSQL-specific features of SQLAlchemy, so it can likely be used with them as well.

## CDM versions
omop-cdm contains table definitions for the following CDM versions:
- CDM 5.4
- CDM 5.3.1
- CDM 6.0.0 ([not recommended](https://ohdsi.github.io/CommonDataModel/cdm60.html#NOTE_ABOUT_CDM_v60))

## Development

### Setup steps

- Make sure [Poetry](https://python-poetry.org/docs/#installation) is installed.
- Install the project and dependencies via `poetry install`.
- Set up the pre-commit hook scripts via `poetry run pre-commit install`.

### Nox sessions

Several developer actions (e.g. run tests, code format, lint) are available
via [nox](https://nox.thea.codes/en/stable/) sessions.
For a complete list, run:
```shell
nox --list
```
201 changes: 201 additions & 0 deletions docs/README.md
@@ -0,0 +1,201 @@
# Usage

omop-cdm provides SQLAlchemy declarative table definitions that can be used
to interact with an OMOP CDM via Python. You can use them to create all tables in
your database more easily, and also for data manipulation
(see [SQLAlchemy data manipulation](https://docs.sqlalchemy.org/en/20/tutorial/orm_data_manipulation.html)).

## Preparation

Before creating the OMOP CDM tables, the following must be available:

- Target database
- Schema (or schemas) in which to create the tables

## Schemas

omop-cdm uses placeholders for the schema names, which are intended to be
replaced by the desired schema names. This can be done via SQLAlchemy's
`schema_translate_map` execution option. By default, the vocabulary tables have a
different schema than the other tables, but both placeholders can be mapped to the
same target schema, so that all tables end up in a single schema (recommended).

This example shows how to set a single schema "cdm54" via the SQLAlchemy engine:
```python
from omop_cdm.constants import CDM_SCHEMA, VOCAB_SCHEMA
from sqlalchemy import create_engine


OMOP_SCHEMA = "cdm54"

schema_map = {
    CDM_SCHEMA: OMOP_SCHEMA,
    VOCAB_SCHEMA: OMOP_SCHEMA,
}

engine = create_engine("postgresql://postgres@localhost/mydb")
engine = engine.execution_options(schema_translate_map=schema_map)
```
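
To see the translation take effect without a PostgreSQL server, the following minimal sketch uses an in-memory SQLite database, where an attached database plays the role of the target schema. The table `demo_person` and the placeholder `placeholder_schema` are made up for illustration and are not part of omop-cdm.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, text

# Hypothetical placeholder name; omop-cdm ships its own in omop_cdm.constants.
PLACEHOLDER_SCHEMA = "placeholder_schema"

metadata = MetaData()
demo_table = Table(
    "demo_person",  # made-up table, not an OMOP CDM table
    metadata,
    Column("person_id", Integer, primary_key=True),
    schema=PLACEHOLDER_SCHEMA,
)

engine = create_engine("sqlite://")  # in-memory database

with engine.connect() as conn:
    # SQLite exposes attached databases as schemas.
    conn.exec_driver_sql("ATTACH DATABASE ':memory:' AS cdm54")
    translated = conn.execution_options(
        schema_translate_map={PLACEHOLDER_SCHEMA: "cdm54"}
    )
    # The placeholder is swapped for "cdm54" when the DDL is emitted.
    metadata.create_all(translated)
    created = conn.execute(
        text("SELECT name FROM cdm54.sqlite_master WHERE type = 'table'")
    ).scalars().all()

print(created)
```

The table definition itself never mentions `cdm54`; only the execution option decides where the table ends up, which is why the same definitions can be reused across differently named schemas.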

## Regular vs Dynamic CDM

The CDM table definitions of a particular CDM release can be imported in two different
ways.

### Regular
To use a standard set of CDM tables, you can import the module containing the
definitions as follows:
```python
from omop_cdm.regular import cdm54
```
In this regular fashion, all tables are already bound to a SQLAlchemy
`DeclarativeBase` class. This approach leaves fewer options for CDM modification,
but is slightly simpler to use.

E.g. once the engine is defined, creating all the tables in a database can be done
by simply running the following:

```python
with engine.begin() as conn:
    cdm54.Base.metadata.create_all(bind=conn)
```

### Dynamic

Alternatively, you can use the dynamic CDM definitions. Here tables have not been bound to
a SQLAlchemy `DeclarativeBase` class yet, but are defined as regular classes, which can
then be used as [mixins](https://docs.sqlalchemy.org/en/20/orm/declarative_mixins.html).

To use these, the classes must first be bound to a `Base`:
```python
from omop_cdm.dynamic import cdm54
from sqlalchemy.orm import DeclarativeBase


class Base(DeclarativeBase):
    pass


class Person(cdm54.BasePersonCdm54, Base):
    pass

# Etc. for all other tables
```
This approach allows for the greatest customization possibilities.

The dynamic version also includes two extra tables which can optionally
be added to your CDM. These are the `StemTable` (intermediate table for mapping purposes)
and the `StcmVersion` table (to allow versioning of STCM source vocabulary mappings).

## Legacy tables

Apart from the standard CDM tables, the following legacy table classes can be
added to your CDM by binding them to your `DeclarativeBase`:

- AttributeDefinition
- Cohort
- CohortAttribute
- CohortDefinition

These tables were part of the standard CDM in previous releases, but have since been moved
to the results schema or removed entirely.

They can be imported from `omop_cdm.dynamic.legacy`.

## Customizing the CDM

> **_NOTE:_** When adding or replacing columns, you can determine the position of the column in the table by setting
> a ``sort_order`` value inside ``mapped_column()``. The default CDM table columns have a ``sort_order``
> value pre-assigned with increments of 100 (column1 = 100, column2 = 200, etc.). This only works for tables
> imported from the dynamic module.

### New columns
If desired, you can add custom fields to existing CDM tables.
For dynamic tables, define the table class as normal, but add additional `mapped_column` fields with the
data types of choice. E.g. to add an integer column to the person table:

```python
class Person(BasePersonCdm54, Base):
    favorite_number: Mapped[Optional[int]] = mapped_column(Integer)
```

For tables from the regular modules, use the following syntax:
```python
Person.favorite_number = Column(Integer, nullable=True)
```


### Replace columns

> **_NOTE:_** Replacing columns is only possible with dynamic table definitions.

Although not recommended, you can replace existing CDM table columns with a different type of column.
E.g. removing the character limit of the `ethnicity_source_value` column in the person table, by replacing
it with a `Text` data type, can be done as follows:

```python
from typing import Optional

from omop_cdm.dynamic import cdm54 as cdm
from sqlalchemy import Text
from sqlalchemy.orm import Mapped, mapped_column


class Person(cdm.BasePersonCdm54, Base):
    ethnicity_source_value: Mapped[Optional[str]] = mapped_column(Text)
```

It's important to note that replacing columns can only be done if the replacement
doesn't break relationships with other tables.
For example, replacing the `Integer` `person_id` column with `BigInteger` in the person table
is possible. Replacing it with a column of type ``Text`` is not, as it breaks FK relationships
that other CDM tables have with this field.

### Replace whole table

> **_NOTE:_** Replacing tables is only possible with dynamic table definitions.

Instead of adding or replacing individual columns, it's also possible to replace an entire table.
To do that, remove the inherited table base class from your table class and define all fields yourself.
E.g. for the person table:

```python
from omop_cdm.constants import CDM_SCHEMA

class Person(Base):
    __tablename__ = "person"
    __table_args__ = {"schema": CDM_SCHEMA}
    # Define all columns here
```

Just like with modifying individual columns, this will only work if no violations occur in relationships
with other CDM tables.

## Add new tables
In addition to the default CDM tables, you can also add your own custom tables to the model.

By adding these tables to the same `DeclarativeBase` as the regular tables, they will become part
of the ORM. For example:

```python
from omop_cdm.constants import CDM_SCHEMA
from sqlalchemy import ForeignKey, Boolean
from sqlalchemy.orm import Mapped, mapped_column, relationship


class Narcissus(Base):
    __tablename__ = 'narcissus'
    __table_args__ = {'schema': CDM_SCHEMA}

    # The ORM needs a primary key on every mapped table; person_id doubles as one here.
    person_id: Mapped[int] = mapped_column(
        ForeignKey(f'{CDM_SCHEMA}.person.person_id'), primary_key=True
    )
    loved_by_narcissus: Mapped[bool] = mapped_column(Boolean, default=False)

    person: Mapped["Person"] = relationship("Person")
```

### Schema name
When adding your own tables, it's good practice to specify a schema name via ``__table_args__``
(see example above). If the schema will always be the same, there is no harm in hard-coding the name.
Otherwise, it's better to provide the schema placeholder name, and let the runtime schema name be
determined by your `schema_translate_map`.
40 changes: 40 additions & 0 deletions noxfile.py
@@ -0,0 +1,40 @@
import nox # type: ignore

nox.options.sessions = [
    "tests",
    "lint",
]

python = [
    "3.8",
    "3.9",
    "3.10",
    "3.11",
    "3.12",
]


@nox.session(python=python)
def tests(session: nox.Session):
    """Run pytest + code coverage."""
    session.run("poetry", "install", external=True)
    session.run("pytest", "--cov-report", "term-missing", "--cov=src")


@nox.session(reuse_venv=True, name="format")
def format_all(session: nox.Session):
    """Format codebase with ruff."""
    session.run("poetry", "install", "--only", "dev", external=True)
    session.run("ruff", "format")
    # format imports according to isort via ruff check
    session.run("ruff", "check", "--select", "I", "--fix")


@nox.session(reuse_venv=True)
def lint(session: nox.Session):
    """Run ruff linter."""
    session.run("poetry", "install", "--only", "dev", external=True)
    # Run the ruff linter
    session.run("ruff", "check")
    # Check if any code requires formatting via ruff format
    session.run("ruff", "format", "--diff")