diff --git a/.gitignore b/.gitignore
index fadfe6db..099ec724 100644
--- a/.gitignore
+++ b/.gitignore
@@ -49,3 +49,6 @@ scripts/fresh_indices/fresh_indices.ini
 
 # The file holding the user's token, stream into the fresh_indices.sh script
 scripts/fresh_indices/token_holder
+
+examples/*.out
+examples/.ipynb_checkpoints/**/*
\ No newline at end of file
diff --git a/examples/Parameter Search and Download Tutorial.ipynb b/examples/Parameter Search and Download Tutorial.ipynb
new file mode 100644
index 00000000..2e93873e
--- /dev/null
+++ b/examples/Parameter Search and Download Tutorial.ipynb
@@ -0,0 +1,190 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "51ba76c1",
+   "metadata": {},
+   "source": [
+    "# Parameter Search and Download Tutorial\n",
+    "\n",
+    "### Overview\n",
+    "The combination of the [RESTful parameterized search](https://docs.hubmapconsortium.org/index.html) and the [HuBMAP Command Line Transfer Tool](https://docs.hubmapconsortium.org/clt/index.html) provides an easy way to programmatically query HuBMAP data and download the results of the query.\n",
+    "\n",
+    "### Description\n",
+    "Below is an example of how to use the [RESTful parameterized search endpoint](https://docs.hubmapconsortium.org/index.html) to query for datasets with specific attributes and produce a manifest of datasets to download, and how to use the manifest to download all of the data for the referenced Datasets. The parameterized search feature shown in this example is a simple query mechanism that allows quick querying of data via a single RESTful URL call. Queried attributes are constrained to exact string matches on a limited set of attributes, and the query is an \"AND\" filtered query with all attribute matches as terms in the \"AND\" clause. For example, the query `/param-search/datasets?status=Published&dataset_type=CODEX` will return all datasets that are \"Published AND a result of a CODEX assay\". If more complex queries are desired, use the standard `/search` endpoint, which is documented in the [HuBMAP Search API Endpoints](https://smart-api.info/ui/7aaf02b838022d564da776b03f357158).\n",
+    "\n",
+    "This example uses the Python Requests library to send the parameter search query and retrieve the results. If Requests hasn't been installed, run `pip install requests` in the environment that this notebook is running in. A version of this example using the command line `curl` command can be found in the [Example Query and Download page](https://docs.hubmapconsortium.org/param-search/data-query-download-example.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bc4ad256",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### Example Query and Download\n",
+    "\n",
+    "The following query will return all CODEX (`dataset_type=CODEX`) Datasets run on a Keyence BZ-X800 machine (`metadata.metadata.acquisition_instrument_model=BZ-X800`) where tissue from a spleen was used (`origin_samples.organ=SP`). See the [RESTful parameterized search page](https://docs.hubmapconsortium.org/index.html) for further information on querying dataset, organ, and dataset metadata fields. In this query `origin_samples.organ` represents the organ and `SP` is the organ code (the organ code list is available [here](https://docs.hubmapconsortium.org/schema-sample.html#organ-attribute-values)).\n",
+    "\n",
+    "```\n",
+    "GET https://search.api.hubmapconsortium.org/v3/param-search/datasets?dataset_type=CODEX&metadata.metadata.acquisition_instrument_model=BZ-X800&origin_samples.organ=SP\n",
+    "```\n",
+    "\n",
+    "#### Producing a CLT manifest file\n",
+    "\n",
+    "As is, if this query is submitted via HTTP GET it will produce a JSON response with an array of dataset objects which match the query. Adding the `produce-clt-manifest=true` option to this query will instead produce a list of Dataset IDs pointing to the Datasets that match this query, in a format that is directly usable by the [HuBMAP Command Line Transfer Tool](https://docs.hubmapconsortium.org/clt/index.html).\n",
+    "\n",
+    "```\n",
+    "GET https://search.api.hubmapconsortium.org/v3/param-search/datasets?dataset_type=CODEX&metadata.metadata.acquisition_instrument_model=BZ-X800&origin_samples.organ=SP&produce-clt-manifest=true\n",
+    "```\n",
+    "\n",
+    "#### Make the query request and get the manifest information\n",
+    "The code below makes the request specified above, with error checking for each documented response code."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0473e6d0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#imports\n",
+    "import os\n",
+    "import requests\n",
+    "\n",
+    "#run the query, get the results, report any errors\n",
+    "query_url = 'https://search.api.hubmapconsortium.org/v3/param-search/datasets?dataset_type=CODEX&metadata.metadata.acquisition_instrument_model=BZ-X800&origin_samples.organ=SP&produce-clt-manifest=true'\n",
+    "try:\n",
+    "    manifest_text = None\n",
+    "    \n",
+    "    #make the request and grab the HTTP response code\n",
+    "    response = requests.get(query_url)\n",
+    "    response_code = response.status_code\n",
+    "    \n",
+    "    #per API docs, /param-search/ can send several response codes\n",
+    "    #handle all of those cases\n",
+    "    if response_code == 200:\n",
+    "        manifest_text = response.text\n",
+    "    #if the response size is > 10MB, a redirect to an S3 bucket is sent\n",
+    "    #retrieve the manifest information from the S3 bucket\n",
+    "    elif response_code == 303:\n",
+    "        next_url = response.text\n",
+    "        next_response = requests.get(next_url)\n",
+    "        if next_response.status_code == 200:\n",
+    "            manifest_text = next_response.text\n",
+    "        else:\n",
+    "            print(f\"Unable to retrieve from redirect {next_response.status_code}: {next_response.text}\")\n",
+    "            print(f\"Redirect URL: {next_url}\")\n",
+    "    #we'll get a 404 if the URL is wrong or if no data was found when the produce-clt-manifest option has been used\n",
+    "    elif response_code == 404:\n",
+    "        print(\"Endpoint not found or no data matching this query.\")\n",
+    "    #if the query times out\n",
+    "    elif response_code == 504:\n",
+    "        print(\"The query timed out after 30 seconds\")\n",
+    "    else:\n",
+    "        print(f\"Unable to retrieve query {response_code}: {response.text}\")\n",
+    "except Exception as err:\n",
+    "    print(f\"An unexpected error occurred: {err}\")\n",
+    "\n",
+    "if manifest_text is not None:\n",
+    "    print(\"Success\")\n",
+    "else:\n",
+    "    print(\"Fail\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1cc5a4bc",
+   "metadata": {},
+   "source": [
+    "### Write the manifest file"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8ce54ede",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#if manifest information was retrieved, write the manifest file; otherwise print an error message\n",
+    "if manifest_text is not None:\n",
+    "    fname = \"dataset_download_manifest.out\"\n",
+    "    with open(fname, 'w') as file:\n",
+    "        file.write(manifest_text)\n",
+    "    print(f\"manifest file written at: {os.path.abspath(fname)}\")\n",
+    "else:\n",
+    "    print(\"ERROR: No manifest information found. File not written\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "462e565d",
+   "metadata": {},
+   "source": [
+    "### Use the HuBMAP CLT to download the file"
+   ]
+  },
+  {
+   "attachments": {
+    "globus-properties.png": {
+     "image/png": ""
+    }
+   },
+   "cell_type": "markdown",
+   "id": "c303ede3",
+   "metadata": {},
+   "source": [
+    "To use the HuBMAP CLT to download the data for the datasets referenced in the generated manifest file:\n",
+    "\n",
+    " - Install the Globus Connect Personal (GCP) client and the HuBMAP CLT per the [HuBMAP CLT Setup Instructions](https://docs.hubmapconsortium.org/clt/install-hubmap-clt.html)\n",
+    "   - Setup Note: A common issue arises between the configuration of the GCP client and the HuBMAP CLT. By default the HuBMAP CLT stores files in the user's home directory under a directory called `hubmap-downloads`, so make sure to configure the GCP client by going to \"Preferences\"-->\"Access\" and adding the `hubmap-downloads` directory in the user's home directory (the example shown is from Mac OS X):\n",
+    "   ![globus-properties.png](attachment:globus-properties.png)\n",
+    "\n",
+    " - On the command line, change into the directory where the manifest file was generated (printed in the last step), then log in to the HuBMAP Globus server using:\n",
+    "   \n",
+    "   ```\n",
+    "   cd /my/directory/where/manifest/file/sits\n",
+    "   hubmap-clt login\n",
+    "   ```\n",
+    "   A Globus login screen will open in your default web browser. Follow the instructions to log in. For publicly available HuBMAP data any login will work (your institution, Google, GitHub, etc.).\n",
+    " - Download the data using the manifest file generated above:\n",
+    "   ```\n",
+    "   hubmap-clt transfer dataset_download_manifest.out\n",
+    "   ```\n",
+    "\n",
+    "Further instructions on the usage of the HuBMAP CLT are available on the main [HuBMAP Command Line Transfer Tool page](https://docs.hubmapconsortium.org/clt/index.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "11dc567f",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}