Merge pull request #79 from x-atlas-consortia/neo4jv5
Neo4jv5
yuanzhou authored Apr 15, 2024
2 parents c935d60 + 80bd722 commit f57ec77
Showing 11 changed files with 968 additions and 51 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -38,3 +38,7 @@ docker/ubkg-api/BUILD
BUILD

**/__pycache__

/tests/*/*.out
/src/cells_index/*.csv
/src/cells_index/*.tsv
18 changes: 11 additions & 7 deletions README.md
@@ -55,13 +55,17 @@ If you are modifying code only in hs-ontology-api, you will only need
to use the PyPI package version of ubkg-api. The package is included in the requirements.txt file of this repo.

If you need to modify both the hs-ontology-api and ubkg-api in concert, you will
need to work with a local instance of the ubkg-api. This is possible by doing the following:
1. Check out a branch of ubkg-api.
2. Configure the local branch of ubkg-api, similarly to the local instance of hs-ontology-api.
3. Start the local instance of ubkg-api.
4. In the virtual environment for hs-ontology-api, install the local instance of ubkg-api using pip with the **-e** flag. This will override the pointer to the ubkg-api package.

``pip install -e path/to/local/ubkg/repo``
need to work with a local or branch instance of the ubkg-api. This is possible by doing the following:
1. If your working ubkg-api instance has been committed to a branch, you can point to the branch instance in requirements.txt with an entry such as ``git+https://github.com/x-atlas-consortia/ubkg-api.git@<YOUR BRANCH>``
2. Check out a branch of ubkg-api.
3. Configure the app.cfg file of the local branch of ubkg-api to connect to the appropriate UBKG instance.
4. In the virtual environment for hs-ontology-api, install an editable local instance of ubkg-api. Two ways to do this:
a. ``pip install -e path/to/local/ubkg-api/repo``
b. If using PyCharm, in the **Python Packages** tab,
1) Click **Add Package**.
2) Navigate to the root of the ubkg-api repo.
3) Indicate that the package is editable.
5. Because ubkg-api has a PyPI TOML file, either of the aforementioned methods will compile a local package and override the pointer to the ubkg-api package.
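
To confirm which ubkg-api the hs-ontology-api virtual environment actually resolves after these steps, a check along the following lines may help. This is a minimal sketch, not part of the repo: `describe_install` is a hypothetical helper, and it assumes the distribution name is `ubkg-api` and that pip recorded PEP 610 metadata (`direct_url.json`) for branch or editable installs.

```python
import json
from importlib import metadata


def describe_install(dist_name: str = "ubkg-api") -> dict:
    """Report where an installed distribution came from (hypothetical helper)."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return {"installed": False}

    # pip writes direct_url.json for installs from a path or a VCS URL;
    # installs from the package index normally do not have this file.
    raw = dist.read_text("direct_url.json")
    direct = json.loads(raw) if raw else {}

    return {
        "installed": True,
        "version": dist.version,
        "source": direct.get("url", "package index"),
        "editable": bool(direct.get("dir_info", {}).get("editable", False)),
    }


if __name__ == "__main__":
    print(describe_install())
```

Run inside the hs-ontology-api virtual environment, an editable install would be expected to report `editable: True` with a `file://` source, while the pinned release would report the package index.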

## Connecting to the local instance of hs-ontology-api
For URLs that execute endpoints in your local instance, use the values indicated in the **main.py** script, in the section prefaced with the comment `For local development/testing`:
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.4.11
2.0.0
4 changes: 2 additions & 2 deletions hs-ontology-api-spec.yaml
@@ -2,7 +2,7 @@ openapi: 3.0.3
info:
title: HuBMAP/SenNet Ontology API (hs-ontology-api)
description: The HuBMAP/SenNet Ontology API contains endpoints for querying a [UBKG](https://ubkg.docs.xconsortia.org/) instance with content from the [HuBMAP/SenNet context](https://ubkg.docs.xconsortia.org/contexts/#hubmapsennet-context). The hs-ontology-api imports the [ubkg-api](https://smart-api.info/ui/96e5b5c0b0efeef5b93ea98ac2794837), which encapsulates both basic connectivity to a UBKG instance and generic endpoint code.
version: 1.4.11
version: 2.0.0
contact:
name: GitHub repository
url: https://github.com/x-atlas-consortia/hs-ontology-api
@@ -1894,4 +1894,4 @@ components:
schema:
type: string
description: name of schema
example: imc3d
example: imc3d
16 changes: 10 additions & 6 deletions src/hs_ontology_api/cypher/celltypedetail.cypher
@@ -12,8 +12,9 @@ CALL
// The calling function in neo4j_logic.py will replace $ids.
WITH [$ids] AS ids
OPTIONAL MATCH (pCL:Concept)-[:CODE]->(cCL:Code) WHERE cCL.SAB='CL' AND CASE WHEN ids[0]<>'' THEN ANY(id in ids WHERE cCL.CODE=id) ELSE 1=1 END RETURN DISTINCT pCL.CUI AS CLCUI
}
// APRIL 2024 Bug fix to use CodeID instead of CODE for cases of leading zeroes in strings.
OPTIONAL MATCH (pCL:Concept)-[:CODE]->(cCL:Code)
WHERE CASE WHEN ids[0]<>'' THEN ANY(id in ids WHERE cCL.CodeID='CL:'+id) ELSE 1=1 END RETURN DISTINCT pCL.CUI AS CLCUI}

CALL
{
@@ -54,13 +55,16 @@ ORDER BY CLID
UNION

//CL-HGNC mappings via HRA
// APRIL 2024 - HRA changed "has_marker_component" to "characterized_by"

//HGNC ID
WITH CLCUI
OPTIONAL MATCH (cCL:Code)<-[:CODE]-(pCL:Concept)-[:has_marker_component]->(pGene:Concept)-[:CODE]->(cGene:Code)-[r]->(tGene:Term)
OPTIONAL MATCH (cCL:Code)<-[:CODE]-(pCL:Concept)-[:characterized_by]->(pGene:Concept)-[:CODE]->(cGene:Code)-[r]->(tGene:Term)
WHERE pCL.CUI=CLCUI AND cGene.SAB='HGNC' AND r.CUI=pGene.CUI AND cCL.SAB='CL' AND type(r) IN ['ACR','PT']
RETURN distinct cCL.CodeID as CLID, 'cell_types_genes' as ret_key, cGene.CodeID + '|' + apoc.text.join(COLLECT(tGene.name),'|') AS ret_value
ORDER BY CLID, cGene.CodeID + '|' + apoc.text.join(COLLECT(tGene.name),'|')
WITH COLLECT(tGene.name) AS tgene_names, cGene.CodeID AS cgene_codeid, cCL.CodeID AS ccl_codeid
WITH distinct ccl_codeid AS CLID, 'cell_types_genes' AS ret_key, cgene_codeid+'|'+apoc.text.join(tgene_names,'|') AS ret_value
RETURN CLID, ret_key, ret_value
ORDER BY CLID, ret_value

UNION

@@ -110,4 +114,4 @@ map['cell_types_definition'] AS cell_types_definition,
map['cell_types_genes'] AS cell_types_genes,
map['cell_types_organ'] AS cell_types_organs

order by CLID
order by CLID
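
The `$ids` filter changed in this file can be read as the following predicate. This is a Python sketch of the Cypher `CASE` expression only, under the assumption that `$ids` is substituted as a list of CL code strings; `matches_cl_filter` is a hypothetical name and does not appear in neo4j_logic.py.

```python
from typing import List


def matches_cl_filter(code_id: str, ids: List[str]) -> bool:
    """Mirror of: CASE WHEN ids[0]<>'' THEN ANY(id IN ids WHERE CodeID='CL:'+id) ELSE 1=1 END."""
    # An empty first element means "no filter": every CL code passes.
    if not ids or ids[0] == "":
        return True
    # Compare full CodeID strings so leading zeroes in the code stay significant.
    return any(code_id == "CL:" + cl_id for cl_id in ids)


if __name__ == "__main__":
    print(matches_cl_filter("CL:0000236", ["0000236"]))  # exact string match
    print(matches_cl_filter("CL:236", ["0000236"]))      # zeroes are not dropped
```

Comparing the prefixed string keeps an identifier such as `0000236` distinct from `236`, which appears to be the case the April 2024 fix targets.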
36 changes: 24 additions & 12 deletions src/hs_ontology_api/cypher/genedetail.cypher
@@ -74,8 +74,9 @@ ORDER BY hgnc_id,ret_key
UNION
//Cell types - CL Codes
// APRIL 2024 - HRA changed "has_marker_component" to "characterized_by"
WITH GeneCUI
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_has_marker_component]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term) WHERE pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND rCL.CUI=pCL.CUI RETURN toInteger(cGene.CODE) AS hgnc_id, 'cell_types_code' AS ret_key, cCL.CodeID AS ret_value
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_characterized_by]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term) WHERE pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND rCL.CUI=pCL.CUI RETURN toInteger(cGene.CODE) AS hgnc_id, 'cell_types_code' AS ret_key, cCL.CodeID AS ret_value
ORDER BY hgnc_id,ret_key,ret_value
UNION
@@ -87,27 +88,34 @@ UNION
// The preferred term will be the term of type PT; if there is no PT, then any of the others of type PT_SAB will do.
// First, order the preferred terms by whether they are the PT or a PT_SAB.
// APRIL 2024 - HRA changed the label from "has_marker_component" to "characterized_by"
WITH GeneCUI
CALL{
WITH GeneCUI
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_has_marker_component]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term) WHERE pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND rCL.CUI=pCL.CUI AND type(rCL) STARTS WITH 'PT' RETURN toInteger(cGene.CODE) AS hgnc_id, cCL.CodeID AS CLID, MIN(CASE WHEN type(rCL)='PT' THEN 0 ELSE 1 END) AS mintype order by hgnc_id,CLID,mintype
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_characterized_by]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term) WHERE pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND rCL.CUI=pCL.CUI AND type(rCL) STARTS WITH 'PT' RETURN toInteger(cGene.CODE) AS hgnc_id, cCL.CodeID AS CLID, MIN(CASE WHEN type(rCL)='PT' THEN 0 ELSE 1 END) AS mintype order by hgnc_id,CLID,mintype
}

// Next, filter to either the PT or one of the PT_SABs.
// MARCH 2024 - WITH used in return to upgrade to v5 Cypher.
WITH hgnc_id, CLID, mintype
OPTIONAL MATCH (cCL:Code)-[rCL]->(tCL:Term)
where cCL.CodeID = CLID AND type(rCL) STARTS WITH 'PT'
AND CASE WHEN type(rCL)='PT' THEN 0 ELSE 1 END=mintype
return hgnc_id, 'cell_types_name' AS ret_key, CLID +'|'+ CASE WHEN tCL.name IS NULL THEN '' ELSE tCL.name END AS ret_value
WITH hgnc_id, 'cell_types_name' AS ret_key, CLID +'|'+ CASE WHEN tCL.name IS NULL THEN '' ELSE tCL.name END AS ret_value
RETURN hgnc_id, ret_key, ret_value

UNION

// Cell types - CL code|definition
// Definitions link to Concepts and multiple CL codes can match to the same concept; however, each CL code has a "preferred" CUI, identified by the CUI property of the relationship of any of the code's linked terms.

// MARCH 2024 - final WITH added to work with v5 Cypher
// APRIL 2024 - HRA changed "has_marker_component" to "characterized_by"
WITH GeneCUI
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_has_marker_component]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term),(pCL:Concept)-[:DEF]->(dCL:Definition) WHERE rCL.CUI=pCL.CUI AND pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND dCL.SAB='CL' RETURN DISTINCT toInteger(cGene.CODE) AS hgnc_id,'cell_types_definition' as ret_key, cCL.CodeID + '|'+ dCL.DEF as ret_value
ORDER BY hgnc_id,cCL.CodeID + '|'+ dCL.DEF
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_characterized_by]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term),(pCL:Concept)-[:DEF]->(dCL:Definition) WHERE rCL.CUI=pCL.CUI AND pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND dCL.SAB='CL'
WITH toInteger(cGene.CODE) AS hgnc_id,'cell_types_definition' as ret_key, cCL.CodeID + '|'+ dCL.DEF as ret_value
RETURN DISTINCT hgnc_id, ret_key, ret_value
ORDER BY hgnc_id, ret_value

UNION

@@ -118,32 +126,36 @@ UNION
// 3. Assigns UBERON codes as cross-references to AZ organ codes.
//
// To get organ information, map gene to cell type to organ location.
// APRIL 2024 - HRA changed "has_marker_component" to "characterized_by"
WITH GeneCUI
//First, get Azimuth Codes that are cross-referenced to CL codes. For the case of a CL code being cross-referenced to multiple AZ codes, only one AZ code gets the "preferred" cross-reference to the CL code; however, all AZ codes have a cross-reference to the CL code, so do not check on rAZ.CUI=pCL.CUI.
CALL
{WITH GeneCUI
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_has_marker_component]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term), (pCL:Concept)-[:CODE]->(cAZ:Code)-[rAZ]->(tAZ:Term) WHERE rCL.CUI=pCL.CUI AND pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND cAZ.SAB='AZ' RETURN DISTINCT toInteger(cGene.CODE) AS hgnc_id,cCL.CodeID as CLID,cAZ.CodeID AS AZID}
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_characterized_by]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term), (pCL:Concept)-[:CODE]->(cAZ:Code)-[rAZ]->(tAZ:Term) WHERE rCL.CUI=pCL.CUI AND pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' AND cAZ.SAB='AZ' RETURN DISTINCT toInteger(cGene.CODE) AS hgnc_id,cCL.CodeID as CLID,cAZ.CodeID AS AZID}
//Use the AZ codes to map to concepts that have located_in relationships with AZ organ codes. The AZ organ codes are cross-referenced to UBERON codes. Limit the located_in relationships to those from AZ.
CALL
{WITH AZID
OPTIONAL MATCH (cAZ:Code)<-[:CODE]-(pAZ:Concept)-[rAZUB:located_in]->(pUB:Concept)-[:CODE]->(cUB:Code)-[rUB:PT]->(tUB:Term) WHERE rAZUB.SAB='AZ' AND rUB.CUI=pUB.CUI AND cAZ.CodeID=AZID AND cUB.SAB='UBERON' RETURN cUB.CodeID+'*'+ tUB.name + '' as UBERONID
}

WITH hgnc_id, CLID,UBERONID
RETURN DISTINCT hgnc_id,'cell_types_organ' as ret_key, CLID+ '|' + apoc.text.join(COLLECT(DISTINCT UBERONID),",") AS ret_value
ORDER BY hgnc_id, CLID+ '|' + apoc.text.join(COLLECT(DISTINCT UBERONID),",")
WITH hgnc_id, 'cell_types_organ' as ret_key, CLID,UBERONID, CLID+ '|' + apoc.text.join(COLLECT(DISTINCT UBERONID),",") AS ret_value
RETURN DISTINCT hgnc_id, ret_key, ret_value
ORDER BY hgnc_id, ret_value

// Indicate the source of cell type information.
// APRIL 2024 - HRA changed "has_marker_component" to "characterized_by"
UNION
WITH GeneCUI
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_has_marker_component]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term) WHERE rCL.CUI=pCL.CUI AND pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' RETURN DISTINCT toInteger(cGene.CODE) AS hgnc_id,'cell_types_source' as ret_key, cCL.CodeID + '|Human Reference Atlas' as ret_value
OPTIONAL MATCH (cGene:Code)<-[:CODE]-(pGene:Concept)-[:inverse_characterized_by]->(pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term) WHERE rCL.CUI=pCL.CUI AND pGene.CUI=GeneCUI AND cGene.SAB='HGNC' AND cCL.SAB='CL' RETURN DISTINCT toInteger(cGene.CODE) AS hgnc_id,'cell_types_source' as ret_key, cCL.CodeID + '|Human Reference Atlas' as ret_value
ORDER BY hgnc_id,cCL.CodeID + '|Human Reference Atlas'

}

// APRIL 2024 - Bug fix: check for a null gene before calling apoc.map.fromLists.

WITH hgnc_id, ret_key, COLLECT(ret_value) AS values
WITH hgnc_id,apoc.map.fromLists(COLLECT(ret_key),COLLECT(values)) AS map
WHERE hgnc_id IS NOT NULL
WITH hgnc_id,apoc.map.fromLists(COLLECT(ret_key),COLLECT(values)) AS map
RETURN hgnc_id,
map['approved_symbol'] AS approved_symbol,
map['approved_name'] AS approved_name,
@@ -159,4 +171,4 @@ map['cell_types_definition'] AS cell_types_code_definition,
map['cell_types_organ'] AS cell_types_codes_organ,
map['cell_types_source'] AS cell_types_codes_source

order by hgnc_id
order by hgnc_id
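
The final aggregation in this query (collect values per key, drop rows with a null gene, then build one map per gene) can be sketched in Python as follows. This is an illustration of the logic only, with made-up sample rows; `build_gene_maps` is a hypothetical name, and the real work is done by `COLLECT` and `apoc.map.fromLists` in Cypher.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[Optional[int], str, Optional[str]]


def build_gene_maps(rows: Iterable[Row]) -> Dict[int, Dict[str, List[Optional[str]]]]:
    """Group (hgnc_id, ret_key, ret_value) rows into one key->values map per gene."""
    grouped: Dict[int, Dict[str, List[Optional[str]]]] = defaultdict(lambda: defaultdict(list))
    for hgnc_id, ret_key, ret_value in rows:
        # Counterpart of "WHERE hgnc_id IS NOT NULL": skip rows without a gene
        # before any map is built from them.
        if hgnc_id is None:
            continue
        grouped[hgnc_id][ret_key].append(ret_value)
    # Counterpart of "order by hgnc_id".
    return {gene: dict(keys) for gene, keys in sorted(grouped.items())}


if __name__ == "__main__":
    sample = [
        (7, "approved_symbol", "EXAMPLE"),
        (7, "cell_types_code", "CL:0000236"),
        (None, "cell_types_code", None),  # produced when an OPTIONAL MATCH finds nothing
    ]
    print(build_gene_maps(sample))
```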
19 changes: 2 additions & 17 deletions src/main.py
@@ -38,7 +38,8 @@ def make_flask_config():
return temp_flask_app.config


app = UbkgAPI(make_flask_config()).app
app = UbkgAPI(make_flask_config(), Path(__file__).absolute().parent.parent).app

app.register_blueprint(assaytype_blueprint)
app.register_blueprint(assayname_blueprint)
app.register_blueprint(datasets_blueprint)
@@ -68,22 +69,6 @@ def make_flask_config():
app.cells_client = OntologyCellsClient(cellsurl)


# Defining the /status endpoint in the ubkg_api package will cause 500 error
# Because the VERSION and BUILD files are not built into the package
@app.route('/status', methods=['GET'])
def api_status():
status_data = {
# Use strip() to remove leading and trailing spaces, newlines, and tabs
'version': (Path(__file__).absolute().parent.parent / 'VERSION').read_text().strip(),
'build': (Path(__file__).absolute().parent.parent / 'BUILD').read_text().strip(),
'neo4j_connection': False
}
is_connected = current_app.neo4jConnectionHelper.check_connection()
if is_connected:
status_data['neo4j_connection'] = True

return jsonify(status_data)
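
The removed route read VERSION and BUILD relative to main.py. With the base directory now passed to `UbkgAPI`, the equivalent lookup presumably happens inside ubkg-api. The following is a minimal sketch of such a lookup, not the ubkg-api implementation: `read_status` and the `"unknown"` fallback for a missing file are assumptions made for illustration.

```python
from pathlib import Path


def read_status(package_base_dir: Path, neo4j_connected: bool) -> dict:
    """Build a /status-style payload from files under a caller-supplied base directory."""

    def read_or_unknown(file_name: str) -> str:
        candidate = package_base_dir / file_name
        # strip() removes leading and trailing spaces, newlines, and tabs.
        return candidate.read_text().strip() if candidate.is_file() else "unknown"

    return {
        "version": read_or_unknown("VERSION"),
        "build": read_or_unknown("BUILD"),
        "neo4j_connection": bool(neo4j_connected),
    }


if __name__ == "__main__":
    # Same base directory expression that main.py passes to UbkgAPI.
    print(read_status(Path(__file__).absolute().parent.parent, neo4j_connected=False))
```

Passing the directory in, rather than resolving it inside the package, avoids the 500 error described in the removed comment, since the package itself does not ship VERSION or BUILD.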

####################################################################################################
## For local development/testing
####################################################################################################
13 changes: 9 additions & 4 deletions src/requirements.txt
@@ -1,6 +1,6 @@
ubkg-api==1.4.0
Flask == 2.1.3
neo4j == 4.4
ubkg-api==2.1.1
Flask==2.1.3
neo4j==5.15.0

# for analysis of tabular data
pandas==1.5.0
@@ -12,4 +12,9 @@ numpy==1.23.5
Werkzeug==2.3.7

# Cells API client
hubmap-api-py-client==0.0.9
hubmap-api-py-client==0.0.9

# Test and analysis scripts
argparse==1.4.0
datatest==0.11.1
deepdiff==6.7.1
7 changes: 5 additions & 2 deletions src/uwsgi.ini
@@ -9,9 +9,12 @@ module = wsgi:application
# Send logs to stdout instead of file so docker picks it up and writes to AWS CloudWatch
log-master=true

# Master with 2 worker process (based on CPU number)
# Master with 4 worker process (based on CPU number)
master = true
processes = 2
processes = 4

# Enable multithreading
enable-threads = true

# Use http socket for integration with nginx running on the same machine
socket = localhost:5000