Merge branch 'reduce-ontology-terms' into scripts
paulzierep committed Jun 4, 2024
2 parents 0174ea4 + 4b87b03 commit 6b21e6c
Showing 48 changed files with 58,629 additions and 8,552 deletions.
12 changes: 8 additions & 4 deletions .github/workflows/fetch_all_tools.yaml
@@ -35,10 +35,11 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Install requirement
run: python -m pip install -r requirements.txt
- name: Run script
- name: Run script #needs PAT to access other repos
run: |
export GITHUB_API_KEY=${{ secrets.GH_API_TOKEN }}
bash ./bin/extract_all_tools_stepwise.sh "${{ matrix.subset }}"
env:
GITHUB_API_KEY: ${{ secrets.GH_API_TOKEN }}
- name: Commit all tools
# add or commit any changes in results if there was a change, merge with main and push as bot
run: |
@@ -60,10 +61,13 @@ jobs:
ref: main #pull latest code produced by job 1, not the revision that started the workflow (https://github.com/actions/checkout/issues/439)
- uses: actions/setup-python@v5
- name: Install requirement
run: python -m pip install -r requirements.txt
run: |
python -m pip install -r requirements.txt
sudo apt-get install jq
- name: Merge all tools
run: | #merge files with only one header -> https://stackoverflow.com/questions/16890582/unixmerge-multiple-csv-files-with-same-header-by-keeping-the-header-of-the-firs
awk 'FNR==1 && NR!=1{next;}{print}' results/repositories*.list_tools.tsv > results/all_tools.tsv
jq -s '.' results/repositories*.list_tools.json > results/all_tools.json
- name: Wordcloud and interactive table
run: |
bash ./bin/extract_all_tools_downstream.sh
@@ -76,4 +80,4 @@ jobs:
git add results
git status
git diff --quiet && git diff --staged --quiet || (git commit -m "fetch all tools bot - step merge")
git push
git push
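
The "Merge all tools" step above combines the per-subset outputs in two ways: `awk` keeps the header of the first TSV only, and `jq -s '.'` slurps every JSON input into one outer array. A minimal Python sketch of those two behaviours follows, using only the standard library. The function names `merge_tsv` and `slurp_json` are hypothetical and not part of the repository; the sample data is invented for illustration.

```python
import json


def merge_tsv(chunks):
    """Concatenate TSV texts, keeping only the header of the first one.

    Hypothetical helper mirroring awk 'FNR==1 && NR!=1{next;}{print}'.
    """
    merged = []
    for index, text in enumerate(chunks):
        lines = text.splitlines()
        # Skip the first line (the header) of every file except the first.
        merged.extend(lines if index == 0 else lines[1:])
    return "\n".join(merged) + "\n"


def slurp_json(chunks):
    """Mimic jq -s '.': each input document becomes one element of an outer array."""
    return [json.loads(text) for text in chunks]


tsv = merge_tsv(["id\tname\na\tx\n", "id\tname\nb\ty\n"])
nested = slurp_json(['[{"id": "a"}]', '[{"id": "b"}]'])
```

One point worth checking against the real data: if each `repositories*.list_tools.json` file holds a JSON array, slurping yields an array of arrays rather than a flat list of tools. Whether downstream code expects that nesting is not visible in this diff; `jq -s 'add'` would be the flattening variant.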
11 changes: 9 additions & 2 deletions .github/workflows/filter_communities.yaml
@@ -3,10 +3,17 @@ name: Filter community tools
on:
workflow_dispatch:

# the workflow it triggered when all_tools_tsv is changed
# the workflow it triggered when all tools are fetched
workflow_run:
workflows: ["Fetch all tools"]
types:
- completed

# the workflow it also triggered when the community definitions are changed
push:
paths:
- 'results/all_tools.tsv'
- 'data/communities**'
branches: ["main"]

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
9 changes: 5 additions & 4 deletions .github/workflows/static.yml
@@ -2,11 +2,12 @@
name: Deploy static content to Pages

on:
# the workflow is triggered when any of the results are changed
push:
paths:
- 'results/**'

# the workflow it triggered when the tools where filtered
workflow_run:
workflows: ["Filter community tools"]
types:
- completed

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
8 changes: 8 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,8 @@
Code of Conduct
===============

As part of the Galaxy Community, this project is committed to providing a
welcoming and harassment-free experience for everyone. We therefore expect
participants to abide by our Code of Conduct, which can be found at:

https://galaxyproject.org/community/coc/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Galaxy Project

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
45 changes: 26 additions & 19 deletions README.md
@@ -7,9 +7,16 @@ Galaxy Tool Metadata Extractor


This tool automatically collects a table of all available Galaxy tools including their metadata. The created table
can be filtered to only show the tools relevant for a specific community. **Learn [how to add your community](#add-your-community)**.
can be filtered to only show the tools relevant for a specific community.

The tools performs the following steps:
Any Galaxy community can be added to this project and benefit from a dedicated interactive table that can be embedded into subdomains and websites via an iframe. **Learn [how to add your community](https://training.galaxyproject.org/training-material//topics/dev/tutorials/community-tool-table/tutorial.html) in the dedicated GTN tutorial**.

The interactive table benefits from EDAM annotations of the tools, which requires that the tools are annotated via bio.tools.
**Learn [how to improve metadata for Galaxy tools using the bio.tools registry](https://training.galaxyproject.org/training-material//topics/dev/tutorials/tool-annotation/tutorial.html)**.

# Tool workflows

The tool performs the following steps:

- Parse tool GitHub repository from [Planemo monitor listed](https://github.com/galaxyproject/planemo-monitor)
- Check in each repo, their `.shed.yaml` file and filter for categories, such as metagenomics
@@ -22,8 +29,6 @@ The tools performs the following steps:
- Creates an interactive table for all tools: [All tools](https://galaxyproject.github.io/galaxy_tool_metadata_extractor/)
- Creates an interactive table for all registered communities, e.g. [microGalaxy](https://galaxyproject.github.io/galaxy_tool_metadata_extractor/microgalaxy/)



# Usage

## Prepare environment
@@ -90,28 +95,30 @@ The script will generate a TSV file with each tool found in the list of GitHub r
1. Run the extraction as explained before
2. (Optional) Create a text file with ToolShed categories for which tools need to be extracted: 1 ToolShed category per row ([example for microbial data analysis](data/microgalaxy/categories))
3. (Optional) Create a text file with list of tools to exclude: 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_exclude))
4. (Optional) Create a text file with list of tools to really keep (already reviewed): 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_keep))
3. (Optional) Create a TSV (tabular) file with tool status (1 tool suite per row) as 3 columns:
- ToolShed ids of tool suites (one per line)
- Boolean with True to keep and False to exclude
- Boolean with True if deprecated and False if not
[Example for microbial data analysis](data/microgalaxy/tools_to_keep_exclude.tsv)
4. Run the tool extractor script
```
$ python bin/extract_galaxy_tools.py \
--tools <Path to CSV file with all extracted tools> \
--filtered_tools <Path to output CSV file with filtered tools> \
--tools <Path to JSON file with all extracted tools> \
--ts-filtered-tools <Path to output TSV with tools filtered based on ToolShed category>
--filtered-tools <Path to output TSV with filtered tools based on ToolShed category and manual curation> \
[--categories <Path to ToolShed category file>] \
[--excluded <Path to excluded tool file category file>]\
[--keep <Path to to-keep tool file category file>]
[--status <Path to a TSV file with tool status - 3 columns: ToolShed ids of tool suites, Boolean with True to keep and False to exclude, Boolean with True if deprecated and False if not>]
```
## Development
To make a test run of the tool and check its functionality, follow [Usage](#Usage) to set up the environment and the API key, then run
## Add your community
In order to add your community you need to:
- Fork this repository.
- Add a folder for your community in `data/communities`.
- Add at least the file `categories`.
- Add all `categories` that are relevant to initially filter the tools for your community. Possible categories are listed here [Galaxy toolshed](https://toolshed.g2.bx.psu.edu/).
- Make a pull request to add your community.
- The workflow will run every sunday, so on the next monday, your community table should be added to `results/<your community name>`
```bash
bash ./bin/extract_all_tools_test.sh test.list
```

This runs the tool, but only parses the test repository [Galaxy-Tool-Metadata-Extractor-Test-Wrapper](https://github.com/paulzierep/Galaxy-Tool-Metadata-Extractor-Test-Wrapper)
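
The README change above replaces the separate exclude/keep lists with a single status TSV of three columns: ToolShed id of the tool suite, a keep Boolean, and a deprecated Boolean. A minimal sketch of reading such a file is shown below, assuming there is no header row and that Booleans are spelled `True`/`False`; both are assumptions, since the diff does not show the file itself. `read_status` and the suite ids are hypothetical.

```python
import csv
import io


def read_status(handle):
    """Return {suite_id: {"keep": bool, "deprecated": bool}} from a 3-column TSV.

    Hypothetical helper; assumes no header row and True/False spellings.
    """
    status = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        suite_id, keep, deprecated = row[:3]
        status[suite_id] = {
            "keep": keep.strip().lower() == "true",
            "deprecated": deprecated.strip().lower() == "true",
        }
    return status


# Invented example content: one suite to keep, one deprecated suite to exclude.
example = "suite_a\tTrue\tFalse\nsuite_b\tFalse\tTrue\n"
status = read_status(io.StringIO(example))
kept = [suite for suite, flags in status.items() if flags["keep"]]
```

Keeping both flags per suite in one mapping makes it straightforward to filter on `keep` while still reporting `deprecated` in the output table.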
24 changes: 21 additions & 3 deletions bin/create_interactive_table.py
@@ -4,7 +4,7 @@

import pandas as pd

# TODO maybe allow comunities to modify
# TODO maybe allow communities to modify
COLUMNS = [
"Expand",
"Galaxy wrapper id",
@@ -18,25 +18,43 @@
"EDAM topic",
"Description",
"bio.tool description",
"biii",
"Status",
"Source",
"ToolShed categories",
"ToolShed id",
"Galaxy wrapper owner",
"Galaxy wrapper source",
"Galaxy wrapper parsed folder",
]

COLUMNS_TO_DROP = [
"Reviewed",
"To keep",
]


# COLUMNS_TO_ADD = [
# "Expand"
# ]


def generate_table(
tsv_path: str,
template_path: str,
output_path: str,
) -> None:
df = pd.read_csv(tsv_path, sep="\t").assign(Expand=lambda df: "").fillna("")
df = pd.read_csv(tsv_path, sep="\t")
df.insert(0, "Expand", None) # the column where the expand button is shown
df = df.fillna("")

if "To keep" in df.columns:
df["To keep"] = df["To keep"].replace("", True)
df = df.query("`To keep`")
df = df.loc[:, COLUMNS].reindex(columns=COLUMNS)

df = df.drop(COLUMNS_TO_DROP, axis=1)

# df = df.loc[:, COLUMNS].reindex(columns=COLUMNS)
table = df.to_html(border=0, table_id="dataframe", classes=["display", "nowrap"], index=False)

with open(template_path) as template_file:
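
The change to `generate_table` above treats an empty "To keep" cell as "keep", filters out rows marked `False`, and then drops the curation columns. The same logic can be sketched without pandas, as below. This is an illustration only, not the script's implementation: `filter_rows` is a hypothetical name, the sample table is invented, and the `True`/`False` spelling of the cells is an assumption.

```python
import csv
import io

COLUMNS_TO_DROP = ["Reviewed", "To keep"]


def filter_rows(tsv_text):
    """Keep rows whose "To keep" cell is empty or True, then drop curation columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        keep_cell = (row.get("To keep") or "").strip()
        # An empty cell means the tool has not been reviewed yet, so it stays visible.
        if keep_cell not in ("", "True"):
            continue
        rows.append({key: value for key, value in row.items() if key not in COLUMNS_TO_DROP})
    return rows


table = (
    "Galaxy wrapper id\tReviewed\tTo keep\n"
    "a\tTrue\tTrue\n"
    "b\tTrue\tFalse\n"
    "c\t\t\n"
)
visible = filter_rows(table)
```

In this sketch a table without the "Reviewed" or "To keep" columns passes through unchanged. In pandas, `DataFrame.drop` raises a `KeyError` for missing labels unless `errors="ignore"` is passed, so it may be worth confirming that every table given to `generate_table` carries both columns.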
3 changes: 2 additions & 1 deletion bin/extract_all_tools.sh
@@ -5,7 +5,8 @@ mkdir -p 'results/'
python bin/extract_galaxy_tools.py \
extractools \
--api $GITHUB_API_KEY \
--all_tools 'results/all_tools.tsv'
--all-tools 'results/all_tools.tsv' \
--all-tools-json 'results/all_tools.json'

python bin/create_interactive_table.py \
--table "results/all_tools.tsv" \
Expand Down
22 changes: 18 additions & 4 deletions bin/extract_all_tools_stepwise.sh
@@ -2,11 +2,25 @@

mkdir -p 'results/'

output="results/${1}_tools.tsv"
tsv_output="results/${1}_tools.tsv"
json_output="results/${1}_tools.json"

python bin/extract_galaxy_tools.py \
if [[ $1 =~ "01" ]]; then
python bin/extract_galaxy_tools.py \
extractools \
--api $GITHUB_API_KEY \
--all_tools $output \
--planemorepository $1
--all-tools $tsv_output \
--all-tools-json $json_output \
--planemo-repository-list $1
else
python bin/extract_galaxy_tools.py \
extractools \
--api $GITHUB_API_KEY \
--all-tools $tsv_output \
--all-tools-json $json_output \
--planemo-repository-list $1 \
--avoid-extra-repositories
fi
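
The two branches above differ only in the trailing `--avoid-extra-repositories` flag, which is added for every subset whose name does not contain `01` (in bash, a quoted right-hand side of `=~` is matched as a literal substring). A compact way to express that rule is sketched below in Python. `build_args` is a hypothetical helper, and the subset name `repositories01.list` is an assumption inferred from the `results/repositories*.list_tools.tsv` glob in the workflow.

```python
def build_args(subset, api_key):
    """Build the extractor command line for one subset (hypothetical helper)."""
    args = [
        "python", "bin/extract_galaxy_tools.py", "extractools",
        "--api", api_key,
        "--all-tools", f"results/{subset}_tools.tsv",
        "--all-tools-json", f"results/{subset}_tools.json",
        "--planemo-repository-list", subset,
    ]
    # Only the subset whose name contains "01" omits this flag, matching the
    # literal-substring test in the shell script.
    if "01" not in subset:
        args.append("--avoid-extra-repositories")
    return args


first = build_args("repositories01.list", "TOKEN")
other = build_args("repositories02.list", "TOKEN")
```

Building the shared arguments once and appending the optional flag avoids duplicating the whole invocation in each branch; the same idea applies in bash with an array of extra arguments.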



8 changes: 5 additions & 3 deletions bin/extract_all_tools_test.sh
@@ -2,12 +2,14 @@

mkdir -p 'results/'

output="results/${1}_tools.tsv"
tsv_output="results/${1}_tools.tsv"
json_output="results/${1}_tools.json"

python bin/extract_galaxy_tools.py \
extractools \
--api $GITHUB_API_KEY \
--all_tools $output \
--planemorepository $1 \
--all-tools $tsv_output \
--all-tools-json $json_output \
--planemo-repository-list $1 \
--test
