Merge branch 'main' into update-docs
paulzierep authored Jan 24, 2024
2 parents 727cdda + 9d82d32 commit 56268bc
Showing 28 changed files with 2,335 additions and 4,869 deletions.
8 changes: 5 additions & 3 deletions .github/workflows/fetch_all_tools.yaml
@@ -61,9 +61,11 @@ jobs:
- uses: actions/setup-python@v5
- name: Install requirement
run: python -m pip install -r requirements.txt
- name: Run script
run: |
cat results/repositories*.list_tools.tsv > results/all_tools.tsv
- name: Merge all tools
run: | #merge files with only one header -> https://stackoverflow.com/questions/16890582/unixmerge-multiple-csv-files-with-same-header-by-keeping-the-header-of-the-firs
awk 'FNR==1 && NR!=1{next;}{print}' results/repositories*.list_tools.tsv > results/all_tools.tsv
- name: Wordcloud and interactive table
run: |
bash ./bin/extract_all_tools_downstream.sh
- name: Commit all tools
# add or commit any changes in results if there was a change, merge with main and push as bot
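The "Merge all tools" step above concatenates the per-repository TSV files while keeping only the first header line. A minimal Python sketch of the same idea follows; the function name `merge_with_single_header` is hypothetical and not part of this repository, and it assumes every input file is non-empty and starts with the same header.

```python
from pathlib import Path
from typing import Iterable


def merge_with_single_header(paths: Iterable[Path], output: Path) -> int:
    """Concatenate TSV files, keeping the header of the first file only.

    Returns the number of lines written (header included).
    Hypothetical helper: assumes each input starts with the same header line.
    """
    written = 0
    with output.open("w", encoding="utf-8") as out:
        for file_index, path in enumerate(paths):
            with path.open(encoding="utf-8") as handle:
                for line_index, line in enumerate(handle):
                    # Counterpart of awk's `FNR==1 && NR!=1 {next}`:
                    # skip line 1 of every file except the first one.
                    if line_index == 0 and file_index > 0:
                        continue
                    # Like awk's `print`, always terminate the line.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

As with the shell redirect, the output file should not match the input pattern, otherwise it would be read back in on the next run.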
6 changes: 4 additions & 2 deletions .github/workflows/filter_communities.yaml
@@ -7,7 +7,6 @@ on:
push:
paths:
- 'results/all_tools_tsv'
branches: ["main"]

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
@@ -34,5 +33,8 @@ jobs:
run: |
git config user.name github-actions
git config user.email [email protected]
git diff --quiet || (git add results && git commit -m "filter communities bot")
git pull --no-rebase -s recursive -X ours
git add results
git status
git diff --quiet && git diff --staged --quiet || (git commit -m "fetch all tools bot - step filter")
git push
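The commit guard in this step reads as `(A && B) || C`: the commit runs unless both `git diff --quiet` and `git diff --staged --quiet` exit with 0, i.e. unless neither the working tree nor the index has changes. A small Python sketch of that decision is shown below; `needs_commit` and `git_exit_code` are hypothetical names, and the runner is injectable so the logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes git arguments and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def git_exit_code(args: Sequence[str]) -> int:
    """Run `git <args>` in the current directory and return its exit code."""
    return subprocess.run(["git", *args], check=False).returncode


def needs_commit(run: Runner = git_exit_code) -> bool:
    """Mirror `git diff --quiet && git diff --staged --quiet || commit`.

    `git diff --quiet` exits non-zero when differences exist.
    """
    # Shell short-circuit: the staged check only runs if the first is clean.
    if run(["diff", "--quiet"]) != 0:
        return True
    return run(["diff", "--staged", "--quiet"]) != 0
```

Note that the guard also fires when only unstaged changes exist; in that case `git commit` itself may still report that nothing was added to the commit.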
5 changes: 2 additions & 3 deletions .github/workflows/static.yml
@@ -2,11 +2,10 @@
name: Deploy static content to Pages

on:
# the workflow is triggered only when results are changed
# the workflow is triggered when any of the results are changed
push:
paths:
- 'results'
branches: ["main"]
- 'results/**'


# Allows you to run this workflow manually from the Actions tab
4 changes: 4 additions & 0 deletions README.md
@@ -5,10 +5,12 @@ Galaxy Tool Metadata Extractor

![plot](docs/images/Preprint_flowchart.png)


This tool automatically collects a table of all available Galaxy tools including their metadata. The created table
can be filtered to only show the tools relevant for a specific community. **Learn [how to add your community](#add-your-community)**.

The tool performs the following steps:

- Parse the tool GitHub repositories listed in the [Planemo monitor](https://github.com/galaxyproject/planemo-monitor)
- Check the `.shed.yaml` file in each repository and filter by category, e.g. metagenomics
- Extract metadata from the `.shed.yaml`
@@ -21,6 +23,7 @@ The tools performs the following steps:
- Creates an interactive table for all registered communities, e.g. [microGalaxy](https://galaxyproject.github.io/galaxy_tool_metadata_extractor/microgalaxy/)



# Usage

## Prepare environment
@@ -101,6 +104,7 @@ The script will generate a TSV file with each tool found in the list of GitHub r
```
## Add your community
In order to add your community you need to:
83 changes: 83 additions & 0 deletions bin/create_wordcloud.py
@@ -0,0 +1,83 @@
#!/usr/bin/env python

import argparse

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
from wordcloud import WordCloud


def get_wordcloud(community_tool_path: str, mask_figure: str, stats_column: str, wordcloud_output_path: str) -> None:
    """
    Generate a wordcloud based on the counts for each Galaxy wrapper id
    :param community_tool_path: Path to a TSV table that must
        have the columns "Galaxy wrapper id" and `stats_column`
    :param mask_figure: Path to an image used as mask to shape the wordcloud,
        e.g. a nice shape to highlight your community
    :param stats_column: Column name of the
        column with usage statistics in the table
    :param wordcloud_output_path: Path to store the wordcloud
    """

    community_tool_stats = pd.read_csv(community_tool_path, sep="\t")

    assert (
        stats_column in community_tool_stats
    ), f"Stats column: {stats_column} not found in table!"  # check if the stats column is there

    # create the word cloud
    frec = pd.Series(
        community_tool_stats[stats_column].values, index=community_tool_stats["Galaxy wrapper id"]
    ).to_dict()

    mask = np.array(Image.open(mask_figure))
    mask[mask == 0] = 255  # set 0 in array to 255 to work with wordcloud

    wc = WordCloud(
        mask=mask,
        background_color="rgba(255, 255, 255, 0)",
        random_state=42,
    )

    wc.generate_from_frequencies(frec)

    fig, ax = plt.subplots(figsize=(13, 5))
    ax.imshow(wc)

    plt.axis("off")
    plt.tight_layout(pad=0)

    plt.savefig(wordcloud_output_path)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Create wordcloud from \
            TSV file based on Galaxy usage statistics"
    )
    parser.add_argument(
        "--table",
        "-ta",
        required=True,
        help="Path to TSV file with tools and stats",
    )
    parser.add_argument(
        "--stats_column",
        "-sc",
        required=True,
        help="Name of the column with usage statistics",
    )
    parser.add_argument(
        "--output",
        "-out",
        required=True,
        help="Path to the output image (e.g. PNG)",
    )

    parser.add_argument("--wordcloud_mask", "-wcm", required=False, help="Mask figure to generate the wordcloud")

    args = parser.parse_args()
    get_wordcloud(args.table, args.wordcloud_mask, args.stats_column, args.output)
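The script turns the usage-statistics column into a mapping from Galaxy wrapper id to count. Since the same commit keeps tools without usage statistics in the table, that column can contain empty cells. The following is a defensive sketch only, using the standard library rather than pandas, of how such a mapping could be built while skipping empty, non-numeric-free, zero and NaN entries. The name `frequencies_from_tsv` is hypothetical, and whether `WordCloud` actually needs this filtering is an assumption that was not verified here.

```python
import csv
import io
from typing import Dict, TextIO


def frequencies_from_tsv(
    handle: TextIO, stats_column: str, id_column: str = "Galaxy wrapper id"
) -> Dict[str, float]:
    """Map wrapper ids to usage counts, skipping rows without usable stats."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or stats_column not in reader.fieldnames:
        raise ValueError(f"Stats column: {stats_column} not found in table!")

    frequencies: Dict[str, float] = {}
    for row in reader:
        raw = (row.get(stats_column) or "").strip()
        if not raw:
            continue  # no usage statistics available for this tool
        value = float(raw)
        # `nan > 0` is False, so NaN entries are dropped here as well.
        if value > 0:
            frequencies[row[id_column]] = value
    return frequencies


if __name__ == "__main__":
    demo = "Galaxy wrapper id\tusage\nabricate\t1257\nno_stats_tool\t\n"
    print(frequencies_from_tsv(io.StringIO(demo), "usage"))
```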
8 changes: 7 additions & 1 deletion bin/extract_all_tools_downstream.sh
100644 → 100755
@@ -5,4 +5,10 @@ mkdir -p 'results/'
python bin/create_interactive_table.py \
--table "results/all_tools.tsv" \
--template "data/interactive_table_template.html" \
--output "results/index.html"
--output "results/index.html"

python bin/create_wordcloud.py \
--table "results/all_tools.tsv" \
--wordcloud_mask "data/usage_stats/wordcloud_mask.png" \
--output "results/all_tools_wordcloud.png" \
--stats_column "https://usegalaxy.eu usage"
3 changes: 2 additions & 1 deletion bin/extract_galaxy_tools.py
@@ -74,7 +74,8 @@ def add_tool_stats_to_tools(tools_df: pd.DataFrame, tool_stats_path: Path, colum
# group local and toolshed tools into one entry
grouped_tool_stats_tools = tool_stats_df.groupby("Galaxy wrapper id", as_index=False)["count"].sum()

community_tool_stats = pd.merge(grouped_tool_stats_tools, tools_df, on="Galaxy wrapper id")
# keep all rows of the tools table (how='right'), also for those where no stats are available
community_tool_stats = pd.merge(grouped_tool_stats_tools, tools_df, how="right", on="Galaxy wrapper id")
community_tool_stats.rename(columns={"count": column_name}, inplace=True)

return community_tool_stats
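The change from the default inner join to `how="right"` means every row of the tools table survives the merge, including tools that have no usage statistics. The plain-Python sketch below models only that semantics; it is not the pandas implementation, `join_counts` is a hypothetical name, and missing counts are represented as `None` where pandas would use NaN.

```python
from typing import Dict, List, Optional


def join_counts(
    counts: Dict[str, int],
    tools: List[Dict[str, str]],
    column_name: str,
    keep_all_tools: bool = True,
) -> List[Dict[str, Optional[object]]]:
    """Attach usage counts to tool rows, keyed on "Galaxy wrapper id".

    keep_all_tools=True models how="right" (tools without stats are kept);
    keep_all_tools=False models the previous inner join (they are dropped).
    """
    merged: List[Dict[str, Optional[object]]] = []
    for tool in tools:
        wrapper_id = tool["Galaxy wrapper id"]
        if wrapper_id not in counts and not keep_all_tools:
            continue
        row: Dict[str, Optional[object]] = dict(tool)
        row[column_name] = counts.get(wrapper_id)  # None when no stats exist
        merged.append(row)
    return merged
```

With the inner join, a tool missing from the statistics file would silently disappear from the final table; with the right join it stays, with an empty usage value that downstream steps have to tolerate.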
18 changes: 12 additions & 6 deletions bin/get_community_tools.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash


for com_data_fp in data/* ; do
for com_data_fp in data/communities/* ; do
if [[ -d "$com_data_fp" && ! -L "$com_data_fp" ]]; then
community=`basename "$com_data_fp"`

@@ -11,12 +11,12 @@ for com_data_fp in data/* ; do
curl \
-L \
"https://docs.google.com/spreadsheets/d/1Nq_g-CPc8t_eC4M1NAS9XFJDflA7yE3b9hfSg3zu9L4/export?format=tsv&gid=1533244711" \
-o "data/$community/tools_to_keep"
-o "data/communities/$community/tools_to_keep"

curl \
-L \
"https://docs.google.com/spreadsheets/d/1Nq_g-CPc8t_eC4M1NAS9XFJDflA7yE3b9hfSg3zu9L4/export?format=tsv&gid=672552331" \
-o "data/$community/tools_to_exclude"
-o "data/communities/$community/tools_to_exclude"
fi;


@@ -26,14 +26,20 @@ for com_data_fp in data/* ; do
filtertools \
--tools "results/all_tools.tsv" \
--filtered_tools "results/$community/tools.tsv" \
--categories "data/$community/categories" \
--exclude "data/$community/tools_to_exclude" \
--keep "data/$community/tools_to_keep"
--categories "data/communities/$community/categories" \
--exclude "data/communities/$community/tools_to_exclude" \
--keep "data/communities/$community/tools_to_keep"

python bin/create_interactive_table.py \
--table "results/$community/tools.tsv" \
--template "data/interactive_table_template.html" \
--output "results/$community/index.html"

python bin/create_wordcloud.py \
--table "results/$community/tools.tsv" \
--wordcloud_mask "data/usage_stats/wordcloud_mask.png" \
--output "results/$community/tools_wordcloud.png" \
--stats_column "https://usegalaxy.eu usage"

fi;
done
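The loop above treats every real directory under `data/communities/` as one community and skips symlinks (`-d` and `! -L`). A short Python sketch of the same selection rule is given below; `list_communities` is a hypothetical helper, and hidden entries are skipped explicitly because the shell glob `*` does not match them by default.

```python
from pathlib import Path
from typing import List


def list_communities(root: Path) -> List[str]:
    """Return the names of community directories below `root`.

    Mirrors `[[ -d "$p" && ! -L "$p" ]]`: real directories only, no symlinks.
    """
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.is_symlink() and not entry.name.startswith(".")
    )
```

Each returned name would then be used to build paths such as `data/communities/<name>/categories` and `results/<name>/tools.tsv`, as the script does with `$community`.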
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Binary file added data/usage_stats/wordcloud_mask.png
6 changes: 5 additions & 1 deletion requirements.txt
@@ -1,3 +1,7 @@
pandas
PyGithub
pyyaml
pyyaml
numpy
Pillow
matplotlib
wordcloud
3 changes: 0 additions & 3 deletions results/all_tools.tsv
@@ -25,7 +25,6 @@ unzip 508 unzip Unzip file To update https://github.com/bmcv Convert Format
w4mclassfilter 3 w4mclassfilter Filter W4M data by values or metadata To update https://github.com/HegemanLab/w4mclassfilter_galaxy_wrapper Metabolomics w4mclassfilter eschen42 https://github.com/HegemanLab/w4mclassfilter_galaxy_wrapper/tree/master 0.98.19 r-base (0/1) (1/1) (1/1)
w4mcorcov 5 w4mcorcov OPLS-DA Contrasts of Univariate Results To update https://github.com/HegemanLab/w4mcorcov_galaxy_wrapper Metabolomics w4mcorcov eschen42 https://github.com/HegemanLab/w4mcorcov_galaxy_wrapper/tree/master 0.98.18 r-base (0/1) (1/1) (1/1)
w4mjoinpn 2 w4mjoinpn Join positive- and negative-mode W4M datasets To update https://github.com/HegemanLab/w4mjoinpn_galaxy_wrapper Metabolomics w4mjoinpn eschen42 https://github.com/HegemanLab/w4mjoinpn_galaxy_wrapper/tree/master 0.98.2 coreutils 8.25 (0/1) (1/1) (1/1)
Galaxy wrapper id https://usegalaxy.eu usage Galaxy tool ids Description bio.tool id bio.tool name bio.tool description EDAM operation EDAM topic Status Source ToolShed categories ToolShed id Galaxy wrapper owner Galaxy wrapper source Galaxy wrapper version Conda id Conda version https://usegalaxy.org https://usegalaxy.org.au https://usegalaxy.eu
10x_bamtofastq 46 10x_bamtofastq Converts 10x Genomics BAM to FASTQ Up-to-date https://github.com/10XGenomics/bamtofastq Convert Formats 10x_bamtofastq bgruening https://github.com/bgruening/galaxytools/tree/master/tools/10x_bamtofastq 1.4.1 10x_bamtofastq 1.4.1 (0/1) (0/1) (1/1)
add_line_to_file 193 add_line_to_file Adds a text line to the beginning or end of a file. To update Text Manipulation add_line_to_file bgruening https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/add_line_to_file 0.1.0 coreutils 8.25 (1/1) (1/1) (1/1)
agat 42 agat GTF/GFF analysis toolkit agat AGAT Another Gff Analysis Toolkit (AGAT)Suite of tools to handle gene annotations in any GTF/GFF format. Data handling, Genome annotation Genomics Up-to-date https://github.com/NBISweden/AGAT Convert Formats, Statistics, Fasta Manipulation agat bgruening https://github.com/bgruening/galaxytools/tree/master/tools/agat 1.2.0 agat 1.2.0 (0/1) (0/1) (1/1)
@@ -135,7 +134,6 @@ vcftools_slice 24 vcftools_slice Subset VCF dataset by genomic regions To u
vcftools_subset 17 vcftools_subset Select samples from a VCF dataset To update https://vcftools.github.io/ Variant Analysis vcftools_subset devteam https://github.com/galaxyproject/tools-devteam/tree/master/tool_collections/vcftools/vcftools_subset 0.1 tabix 1.11 (0/1) (0/1) (1/1)
venn_list 248 venn_list Draw Venn Diagram (PDF) from lists, FASTA files, etc To update https://github.com/peterjc/pico_galaxy/tree/master/tools/venn_list Graphics, Sequence Analysis, Visualization venn_list peterjc https://github.com/peterjc/pico_galaxy/tree/master/tools/venn_list 0.1.2 galaxy_sequence_utils 1.1.5 (1/1) (0/1) (1/1)
wtdbg 116 wtdbg WTDBG is a fuzzy Bruijn graph (FBG) approach to long noisy reads assembly. wtdbg2 wtdbg2 Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output. Wtdbg2 is able to assemble the human and even the 32Gb Axolotl genome at a speed tens of times faster than CANU and FALCON while producing contigs of comparable base accuracy. Genome assembly, De-novo assembly Sequence assembly, Sequencing Up-to-date https://github.com/ruanjue/wtdbg2 Assembly wtdbg bgruening https://github.com/bgruening/galaxytools/tree/master/tools/wtdbg 2.5 wtdbg 2.5 (0/1) (0/1) (1/1)
Galaxy wrapper id https://usegalaxy.eu usage Galaxy tool ids Description bio.tool id bio.tool name bio.tool description EDAM operation EDAM topic Status Source ToolShed categories ToolShed id Galaxy wrapper owner Galaxy wrapper source Galaxy wrapper version Conda id Conda version https://usegalaxy.org https://usegalaxy.org.au https://usegalaxy.eu
abricate 1257 abricate, abricate_list, abricate_summary Mass screening of contigs for antiobiotic resistance genes ABRicate ABRicate Mass screening of contigs for antimicrobial resistance or virulence genes. Antimicrobial resistance prediction Genomics, Microbiology Up-to-date https://github.com/tseemann/abricate Sequence Analysis abricate iuc https://github.com/galaxyproject/tools-iuc/tree/master/tools/abricate/ 1.0.1 abricate 1.0.1 (3/3) (3/3) (3/3)
adapter_removal 37 adapter_removal Removes residual adapter sequences from single-end (SE) or paired-end (PE) FASTQ reads. adapterremoval AdapterRemoval AdapterRemoval searches for and removes adapter sequences from High-Throughput Sequencing (HTS) data and (optionally) trims low quality bases from the 3' end of reads following adapter removal. AdapterRemoval can analyze both single end and paired end data, and can be used to merge overlapping paired-ended reads into (longer) consensus sequences. Additionally, AdapterRemoval can construct a consensus adapter sequence for paired-ended reads, if which this information is not available. Sequence trimming, Sequence merging, Primer removal Up-to-date https://github.com/MikkelSchubert/adapterremoval Fasta Manipulation, Sequence Analysis adapter_removal iuc https://github.com/galaxyproject/tools-iuc/tree/master/tools/adapter_removal/ 2.3.3 adapterremoval 2.3.3 (0/1) (0/1) (1/1)
aldex2 13 aldex2 Performs analysis Of differential abundance taking sample variation into account aldex2 ALDEx2 A differential abundance analysis for the comparison of two or more conditions. It uses a Dirichlet-multinomial model to infer abundance from counts, that has been optimized for three or more experimental replicates. Infers sampling variation and calculates the expected FDR given the biological and sampling variation using the Wilcox rank test and Welches t-test, or the glm and Kruskal Wallis tests. Reports both P and fdr values calculated by the Benjamini Hochberg correction. Statistical inference Gene expression, Statistics and probability To update https://github.com/ggloor/ALDEx_bioc Metagenomics aldex2 iuc https://github.com/galaxyproject/tools-iuc/tree/master/tools/aldex2 1.26.0 bioconductor-aldex2 1.34.0 (0/1) (0/1) (1/1)
@@ -452,7 +450,6 @@ winnowmap 27 winnowmap A long-read mapping tool optimized for mapping ONT and Pa
xpath 3 xpath XPath XML querying tool To update http://search.cpan.org/dist/XML-XPath/ Text Manipulation xpath iuc https://github.com/galaxyproject/tools-iuc/tree/master/tools/xpath perl-xml-xpath 1.47 (0/1) (0/1) (1/1)
yahs 64 yahs Yet Another Hi-C scaffolding tool Up-to-date https://github.com/c-zhou/yahs Assembly yahs iuc https://github.com/galaxyproject/tools-iuc/tree/master/tools/yahs 1.2a.2 yahs 1.2a.2 (1/1) (1/1) (1/1)
zerone 2 zerone ChIP-seq discretization and quality control Up-to-date https://github.com/nanakiksc/zerone ChIP-seq zerone iuc https://github.com/galaxyproject/tools-iuc/tree/master/tools/zerone 1.0 zerone 1.0 (0/1) (0/1) (1/1)
Galaxy wrapper id https://usegalaxy.eu usage Galaxy tool ids Description bio.tool id bio.tool name bio.tool description EDAM operation EDAM topic Status Source ToolShed categories ToolShed id Galaxy wrapper owner Galaxy wrapper source Galaxy wrapper version Conda id Conda version https://usegalaxy.org https://usegalaxy.org.au https://usegalaxy.eu
bed_to_protein_map 49 bed_to_protein_map Converts a BED file to a tabular list of exon locations To update Proteomics bed_to_protein_map galaxyp https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/bed_to_protein_map 0.2.0 python (1/1) (1/1) (1/1)
bigwig_to_bedgraph 200 bigwig_to_bedgraph Converts a bigWig file to bedGraph format To update http://artbio.fr Convert Formats bigwig_to_bedgraph artbio https://github.com/ARTbio/tools-artbio/tree/main/tools/bigwig_to_bedgraph 377+galaxy1 ucsc-bigwigtobedgraph 448 (0/1) (0/1) (1/1)
blast2go 101 blast2go Maps BLAST results to GO annotation terms To update https://github.com/peterjc/galaxy_blast/tree/master/tools/blast2go Ontology Manipulation, Sequence Analysis blast2go peterjc https://github.com/peterjc/galaxy_blast/tree/master/tools/blast2go 0.0.11 b2g4pipe (0/1) (0/1) (0/1)
Binary file added results/all_tools_wordcloud.png