Able to use either local or S3 remote vcf file (#378)
* Able to use either local or S3 remote vcf file

* Improved pytest report

* Rebranded from S3_VCF_FILE_URL to VCF_FILE
alanwilter authored Nov 9, 2021
1 parent 80eb096 commit 3262561
Showing 10 changed files with 39 additions and 13 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/python-app.yml
@@ -46,3 +46,9 @@ jobs:
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      run: docker-compose run -e APP_ENV=prod -e AWS_SECRET_ACCESS_KEY -e AWS_ACCESS_KEY_ID app pytest --color=yes

    - name: Test with PyTest for S3
      env:
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      run: docker-compose run -e APP_ENV=prod -e VCF_FILE="s3://phenopolis-vcf/August2019/merged2.vcf.gz" -e AWS_SECRET_ACCESS_KEY -e AWS_ACCESS_KEY_ID app pytest --color=yes -k test_variants
1 change: 0 additions & 1 deletion .gitignore
@@ -78,4 +78,3 @@ private.env

dc_dev.yml
/.mypy_cache/
*tbi
14 changes: 13 additions & 1 deletion README.md
@@ -13,7 +13,19 @@ A description of the code setup is available [here](code_setup.md).

## Setup using docker compose

Set the following environment variables in `private.env`:
Set the following environment variables:

* In `public.env`:

```bash
VCF_FILE=...
```

`VCF_FILE` can be either a local file (e.g. `path/file.vcf.gz`) or a remote S3 file (e.g. `s3://any_remote/file.vcf.gz`).

The VCF file must be accompanied by its tabix index (`.tbi`) in the same location, e.g. `path/file.vcf.gz.tbi` alongside `path/file.vcf.gz`.

* Create `private.env` and add:

```bash
AWS_SECRET_ACCESS_KEY=....
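The README change says `VCF_FILE` may be a local path or an `s3://` URL, and that the `.tbi` index must be present next to it. As a minimal sketch of a pre-flight check, assuming only the standard library, the hypothetical helper `check_vcf_source` (not part of the repository) classifies the value and, for local files, verifies that both the VCF and its conventionally named `<file>.tbi` index exist. S3 objects are not probed, since that would need network access and credentials.

```python
from pathlib import Path


def check_vcf_source(vcf_file):
    """Classify a VCF_FILE value and return (kind, expected_index_path).

    Hypothetical helper for illustration, not part of the Phenopolis code base.
    kind is "s3" for s3:// URLs and "local" for everything else.
    """
    if not vcf_file:
        raise ValueError("VCF_FILE is not set")
    # The tabix index is conventionally stored next to the data as <file>.tbi
    index = vcf_file + ".tbi"
    if vcf_file.startswith("s3://"):
        # Remote objects can only be checked by opening them (needs credentials),
        # so only report where the index is expected to be.
        return "s3", index
    if not Path(vcf_file).is_file():
        raise FileNotFoundError(f"VCF file not found: {vcf_file}")
    if not Path(index).is_file():
        raise FileNotFoundError(f"tabix index not found: {index}")
    return "local", index
```

Run from the repository root after this commit, `check_vcf_source("schema/small_demo.vcf.gz")` should return `("local", "schema/small_demo.vcf.gz.tbi")`, since both files are added under `schema/`.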
2 changes: 1 addition & 1 deletion public.env
@@ -6,7 +6,7 @@ PH_DB_USER=phenopolis_api
PH_DB_PASSWORD=phenopolis_api
PH_DB_PORT=5432

S3_VCF_FILE_URL="s3://phenopolis-vcf/August2019/merged2.vcf.gz"
VCF_FILE=schema/small_demo.vcf.gz

MAIL_USERNAME=[email protected]

2 changes: 1 addition & 1 deletion requirements.txt
@@ -13,7 +13,7 @@ ujson>=3.0,<3.1
python-dotenv>=0.14,<0.15
itsdangerous>=1.1,<1.2
bidict>=0.21,<0.22
cyvcf2>=0.20.9
cyvcf2>=0.30.12
boto3>=1.16.43

# for checks
Binary file added schema/small_demo.vcf.gz
Binary file not shown.
Binary file added schema/small_demo.vcf.gz.tbi
Binary file not shown.
3 changes: 2 additions & 1 deletion tests/conftest.py
@@ -1,4 +1,5 @@
import pytest
import os
from dotenv import load_dotenv
from views import application, APP_ENV, VERSION
from views.auth import ADMIN_USER, USER, DEMO_USER
@@ -8,7 +9,7 @@


def pytest_report_header(config):
    return f">>> Version: {VERSION}, APP_ENV: {APP_ENV}"
    return f">>>\tVersion: {VERSION}\n\tAPP_ENV: {APP_ENV}\n\tVCF_FILE: {os.getenv('VCF_FILE')}"


@pytest.fixture
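The new `pytest_report_header` builds a multi-line header by embedding `\n\t` in one string. pytest also accepts a list of strings from this hook and prints each item on its own line, which avoids manual escapes. A minimal sketch of that alternative follows; `VERSION` and `APP_ENV` are placeholder values standing in for the real `from views import APP_ENV, VERSION`.

```python
import os

# Placeholders for `from views import APP_ENV, VERSION` (assumed values for this sketch).
VERSION = "x.y.z"
APP_ENV = os.getenv("APP_ENV", "debug")


def pytest_report_header(config):
    """Return one header line per setting; pytest prints each list item on its own line."""
    return [
        f">>> Version: {VERSION}",
        f">>> APP_ENV: {APP_ENV}",
        f">>> VCF_FILE: {os.getenv('VCF_FILE')}",
    ]
```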
17 changes: 15 additions & 2 deletions tests/test_variants.py
@@ -5,7 +5,7 @@


def test_get_genotypes_exception():
    # if this happens, something is out of sync between S3 VCF file and variant table in DB
    # if this happens, something is out of sync between VCF file and variant table in DB
    redirected_error = sys.stderr = StringIO()
    exec('_get_genotypes("443", "10000")')
    err = redirected_error.getvalue()
@@ -14,7 +14,7 @@ def test_get_genotypes_exception():

def test_variant(_demo):
    """
    This tests S3 and VCF access via cyvcf2
    This tests VCF access via cyvcf2
    tests both for subset and entry not in DB, the real one is 14-76127655-C-T
    res -> str
    """
@@ -35,6 +35,19 @@ def test_variant_web(_admin_client):
    assert "[{'display': 'my:PH00008258'," in str(resp.json), "Check for 'my:..."


def test_variant_genotype_vcf(_admin_client):
    resp = _admin_client.get("/variant/14-76156575-A-G")
    assert resp.status_code == 200
    assert len(resp.json[0]["genotypes"]["data"]) == 4, "Critical, VCF access not working"


def test_cyvcf2_S3(_admin_client):
    from cyvcf2 import VCF

    vcf_S3 = VCF("s3://3kricegenome/test/test.vcf.gz")  # public VCF file
    assert len(vcf_S3.raw_header) == 559362, "Critical, S3 access not working"


def test_missing_variant(_demo):
    response = variant("chr45-1234567890112233-C-G")
    assert response.status_code == 404
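The new `test_variant_genotype_vcf` requests `/variant/14-76156575-A-G` and expects genotype rows back, which for a bgzipped VCF normally means an indexed region lookup. cyvcf2 supports this by calling a `VCF` object with a `chrom:start-end` region string. The sketch below shows one way a `CHROM-POS-REF-ALT` identifier could be turned into such a region; `parse_variant_id`, `VariantId` and `to_region` are hypothetical names, and this is not the app's actual `_get_genotypes` logic.

```python
from typing import NamedTuple


class VariantId(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> VariantId:
    """Parse 'CHROM-POS-REF-ALT', e.g. '14-76156575-A-G' (hypothetical helper)."""
    parts = variant_id.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected CHROM-POS-REF-ALT, got {variant_id!r}")
    chrom, pos, ref, alt = parts
    if not pos.isdigit():
        raise ValueError(f"position must be an integer: {pos!r}")
    return VariantId(chrom, int(pos), ref, alt)


def to_region(v: VariantId) -> str:
    """Single-position region string (1-based, inclusive), as used by tabix queries."""
    return f"{v.chrom}:{v.pos}-{v.pos}"


# With cyvcf2 installed and an indexed VCF, a lookup could look like:
#   import os
#   from cyvcf2 import VCF
#   vcf = VCF(os.getenv("VCF_FILE"))
#   v = parse_variant_id("14-76156575-A-G")
#   hits = [r for r in vcf(to_region(v)) if r.REF == v.ref and v.alt in r.ALT]
```

The region query is what depends on the `.tbi` file: without the index, the reader cannot seek to a position and region lookups fail.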
7 changes: 1 addition & 6 deletions views/__init__.py
@@ -16,7 +16,6 @@
from subprocess import Popen, STDOUT, PIPE
import psycopg2


# Options are: prod, dev, debug (default)
APP_ENV = os.getenv("APP_ENV", "debug")

@@ -34,11 +33,7 @@
if APP_ENV in ["prod"]:
    ENV_LOG_FLAG = False

# in GH Workflow tests, private.env is not available so skip variant tests
try:
    variant_file = VCF(os.getenv("S3_VCF_FILE_URL", "s3://phenopolis-vcf/August2019/merged2.vcf.gz"))
except OSError:
    variant_file = None
variant_file = VCF(os.getenv("VCF_FILE"))


def _configure_logs():
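The new line in `views/__init__.py` calls `VCF(os.getenv("VCF_FILE"))` with no default and no `try`, so if the variable is unset, `os.getenv` hands `None` to cyvcf2. A minimal sketch, assuming you would rather fail at import time with an explicit message: the hypothetical `open_variant_file` reads `VCF_FILE`, rejects empty values, and passes the path to an opener. The opener defaults to cyvcf2's `VCF`, imported lazily so the function can be exercised with a stub.

```python
import os
from typing import Any, Callable, Mapping, Optional


def open_variant_file(
    env: Mapping[str, str] = os.environ,
    opener: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Open the VCF named by VCF_FILE, failing early if it is unset (hypothetical helper)."""
    path = env.get("VCF_FILE")
    if not path:
        raise RuntimeError(
            "VCF_FILE is not set; point it at a local .vcf.gz or an s3:// URL "
            "with its .tbi index alongside"
        )
    if opener is None:
        from cyvcf2 import VCF  # imported lazily; only needed when opening real files

        opener = VCF
    return opener(path)
```

In the module this would read `variant_file = open_variant_file()`. Injecting `opener` is a design choice for testability; it is not something the original code does.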
