Update stats #78

Closed
wants to merge 65 commits into from
Changes from 2 commits
Commits
7b065a5
collect multiple bio.tools ids
paulzierep Mar 12, 2024
2c416c7
Merge pull request #2 from paulzierep/collect-multiple-entries-for-bi…
paulzierep Mar 12, 2024
39540e0
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
27a26a6
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
92204fe
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
1d64d30
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
7fc8e15
fetch all tools bot - step merge
invalid-email-address Mar 12, 2024
07e072f
fetch all tools bot - step filter
invalid-email-address Mar 12, 2024
65be0b3
unique IDs using a set
paulzierep Mar 12, 2024
770d62f
Merge pull request #3 from paulzierep/collect-multiple-entries-for-bi…
paulzierep Mar 12, 2024
8e4c152
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
bc366c9
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
e45242d
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
922d19d
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
05e6280
fetch all tools bot - step merge
invalid-email-address Mar 12, 2024
5c96e05
fetch all tools bot - step filter
invalid-email-address Mar 12, 2024
3d3edea
compare tool_ids
paulzierep Mar 13, 2024
ea0499b
forgot numpy
paulzierep Mar 13, 2024
8ca74d3
mypy type
paulzierep Mar 13, 2024
5c5dd64
Merge pull request #5 from paulzierep/improved-stats-generation
paulzierep Mar 13, 2024
7d95919
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
97c0808
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
910a95f
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
136016f
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
3583806
Update extract_galaxy_tools.py
paulzierep Mar 14, 2024
c78bb38
Update extract_galaxy_tools.py
paulzierep Mar 14, 2024
7749357
update stats generation, linting and func works
paulzierep Mar 14, 2024
725a5dc
fix bio.tools parsing
paulzierep Mar 14, 2024
a4f5666
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
fdecb33
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
6518e14
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
859e9b4
update server list
paulzierep Mar 14, 2024
0de4d1e
add france
paulzierep Mar 14, 2024
f8839eb
reintroduce warning
paulzierep Mar 14, 2024
5ed1176
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
b139c9d
fetch all tools bot - step merge
invalid-email-address Mar 14, 2024
bceb24e
fetch all tools bot - step filter
invalid-email-address Mar 14, 2024
aa160c0
typing for mypy
paulzierep Mar 14, 2024
8669668
typing for mypy
paulzierep Mar 14, 2024
98312ad
isort linitng
paulzierep Mar 14, 2024
bffbc23
get all available tools on server
paulzierep Mar 14, 2024
dfec6be
Merge branch 'main' into update-stats
paulzierep Mar 14, 2024
d30da25
linting
paulzierep Mar 14, 2024
9b13613
mypy linting
paulzierep Mar 18, 2024
c7f1d95
mypy linting
paulzierep Mar 18, 2024
660ab9c
mypy linting
paulzierep Mar 18, 2024
6be6058
mypy linting
paulzierep Mar 18, 2024
50bd451
mypy linting
paulzierep Mar 18, 2024
c7acc06
mypy linting
paulzierep Mar 18, 2024
a16d3f5
mypy linting
paulzierep Mar 18, 2024
93db1e7
mypy linting
paulzierep Mar 18, 2024
773f361
mypy linting
paulzierep Mar 18, 2024
6a8db1d
mypy linting
paulzierep Mar 18, 2024
24899a6
mypy linting
paulzierep Mar 18, 2024
c680879
mypy linting
paulzierep Mar 18, 2024
7b2ed06
compute galaxy instances outside of main script
paulzierep Mar 18, 2024
b5023c6
linting
paulzierep Mar 18, 2024
22eb17e
linting
paulzierep Mar 18, 2024
97c5299
linting
paulzierep Mar 18, 2024
da19f3b
linting
paulzierep Mar 18, 2024
ecd9ec6
fix
paulzierep Mar 18, 2024
33a39f4
change name of Galaxy star servers
paulzierep Mar 25, 2024
fd11c1b
add column order for final table and change server availablity naming
paulzierep Mar 25, 2024
3ea51b9
rename wrapper id to suite id
paulzierep Mar 25, 2024
af02011
rename tool id to suite id
paulzierep Mar 25, 2024
35 changes: 26 additions & 9 deletions bin/extract_galaxy_tools.py
@@ -14,6 +14,7 @@
     Optional,
 )

+import numpy as np
 import pandas as pd
 import requests
 import yaml
@@ -29,7 +30,6 @@
     "https://usegalaxy.org",
     "https://usegalaxy.org.au",
     "https://usegalaxy.eu",
-    "https://usegalaxy.fr",

Review comment (Member): Why did you remove it?

Reply (Member): There is a list somewhere with public servers

 ]

 project_path = Path(__file__).resolve().parent.parent  # galaxy_tool_extractor folder
@@ -48,15 +48,15 @@

 def get_last_url_position(toot_id: str) -> str:
     """
-    Returns the second last url position of the toot_id, if the value is not a
+    Returns the last url position of the toot_id, if the value is not a
     url it returns the toot_id. So works for local and toolshed
     installed tools.

     :param tool_id: galaxy tool id
     """

     if "/" in toot_id:
-        toot_id = toot_id.split("/")[-2]
+        toot_id = toot_id.split("/")[-1]
     return toot_id
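
Note on the index change above: the following is a minimal sketch of what each index returns, not code from the PR. The helper `url_position` and the three example ids are hypothetical and were chosen only for illustration; whether the `tool_name` values in the real stats files carry a trailing version is not visible in this diff.

```python
def url_position(tool_id: str, index: int) -> str:
    """Return one '/'-separated segment of tool_id, or tool_id itself if it has no '/'."""
    if "/" in tool_id:
        return tool_id.split("/")[index]
    return tool_id


# Hypothetical ids, made up to illustrate the two index choices.
with_version = "toolshed.g2.bx.psu.edu/repos/iuc/abricate/abricate/1.0.1"
without_version = "toolshed.g2.bx.psu.edu/repos/iuc/abricate/abricate"
local_tool = "upload1"

for tool_id in (with_version, without_version, local_tool):
    # Print the second-to-last and the last segment side by side.
    print(url_position(tool_id, -2), url_position(tool_id, -1))
```

If the stats entries end in a version, `[-1]` would return the version rather than the tool id, so the choice between `[-2]` and `[-1]` depends on the shape of the input ids.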


@@ -70,7 +70,6 @@ def add_tool_stats_to_tools(tools_df: pd.DataFrame, tool_stats_path: Path, colum
     :param tools_path: path to the table with
     the tools (csv,
     must include "Galaxy wrapper id")
-    :param output_path: path to store the new table
     :param column_name: column to add for the tool stats,
     different columns could be added for the main servers
     """
@@ -82,13 +81,31 @@ def add_tool_stats_to_tools(tools_df: pd.DataFrame, tool_stats_path: Path, colum
     tool_stats_df["Galaxy wrapper id"] = tool_stats_df["tool_name"].apply(get_last_url_position)

     # group local and toolshed tools into one entry
-    grouped_tool_stats_tools = tool_stats_df.groupby("Galaxy wrapper id", as_index=False)["count"].sum()
+    # also group tools with different versions
+    grouped_tool_stats_tools = tool_stats_df.groupby("Galaxy wrapper id")["count"].sum()
+
+    # new column to store the stats
+    tools_df[column_name] = np.NaN
+
+    # check for each tool_id if a count exists in the stats file
+    # and sum the stats for each suite
+    for row_index, row in tools_df.iterrows():
+        counts = []
+        if isinstance(row["Galaxy tool ids"], str):
+            for tool_id in row["Galaxy tool ids"].split(","):
+                tool_id = tool_id.strip()
+                if tool_id in grouped_tool_stats_tools:
+                    count = grouped_tool_stats_tools[tool_id]
+                    counts.append(count)
+
+        if len(counts) == 0:
+            summed_count = np.NaN
+        else:
+            summed_count = sum(counts)

-    # keep all rows of the tools table (how='right'), also for those where no stats are available
-    community_tool_stats = pd.merge(grouped_tool_stats_tools, tools_df, how="right", on="Galaxy wrapper id")
-    community_tool_stats.rename(columns={"count": column_name}, inplace=True)
+        tools_df.loc[pd.Index([row_index]), column_name] = summed_count

-    return community_tool_stats
+    return tools_df


 def add_usage_stats_for_all_server(tools_df: pd.DataFrame) -> pd.DataFrame:
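
Note on the per-suite summation in `add_tool_stats_to_tools`: the sketch below reproduces the same idea on toy data, as an illustration under stated assumptions rather than the PR's implementation. The helper `sum_suite_counts`, the `usage` column name, and all table values are hypothetical. It uses `np.nan`, the spelling that also works on NumPy 2.0 and later, where the `np.NaN` alias was removed.

```python
import numpy as np
import pandas as pd


def sum_suite_counts(tool_ids, counts: pd.Series) -> float:
    """Sum the counts of all comma-separated tool ids found in counts; NaN if none are found."""
    if not isinstance(tool_ids, str):
        # Missing "Galaxy tool ids" values (NaN) have no stats.
        return np.nan
    ids = [t.strip() for t in tool_ids.split(",")]
    found = [counts[t] for t in ids if t in counts.index]
    return float(sum(found)) if found else np.nan


# Toy stats table: the same tool can appear several times (local, toolshed, versions).
stats = pd.DataFrame(
    {
        "tool_name": ["abricate", "abricate", "abricate_list", "other"],
        "count": [10, 5, 2, 7],
    }
)
# Group duplicates into one entry per tool id; the result is a Series indexed by id.
counts = stats.groupby("tool_name")["count"].sum()

# Toy tools table: one row per suite, with a comma-separated list of tool ids.
tools = pd.DataFrame({"Galaxy tool ids": ["abricate, abricate_list", "missing_tool", np.nan]})
tools["usage"] = tools["Galaxy tool ids"].apply(lambda ids: sum_suite_counts(ids, counts))
print(tools)
```

Grouping without `as_index=False` yields a Series indexed by tool id, which is what makes the membership check and the lookup by id work; a suite keeps NaN when none of its tool ids appear in the stats.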