Update stats #78

Closed
wants to merge 65 commits into from
65 commits
7b065a5
collect multiple bio.tools ids
paulzierep Mar 12, 2024
2c416c7
Merge pull request #2 from paulzierep/collect-multiple-entries-for-bi…
paulzierep Mar 12, 2024
39540e0
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
27a26a6
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
92204fe
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
1d64d30
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
7fc8e15
fetch all tools bot - step merge
invalid-email-address Mar 12, 2024
07e072f
fetch all tools bot - step filter
invalid-email-address Mar 12, 2024
65be0b3
unique IDs using a set
paulzierep Mar 12, 2024
770d62f
Merge pull request #3 from paulzierep/collect-multiple-entries-for-bi…
paulzierep Mar 12, 2024
8e4c152
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
bc366c9
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
e45242d
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
922d19d
fetch all tools bot - step fetch
invalid-email-address Mar 12, 2024
05e6280
fetch all tools bot - step merge
invalid-email-address Mar 12, 2024
5c96e05
fetch all tools bot - step filter
invalid-email-address Mar 12, 2024
3d3edea
compare tool_ids
paulzierep Mar 13, 2024
ea0499b
forgot numpy
paulzierep Mar 13, 2024
8ca74d3
mypy type
paulzierep Mar 13, 2024
5c5dd64
Merge pull request #5 from paulzierep/improved-stats-generation
paulzierep Mar 13, 2024
7d95919
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
97c0808
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
910a95f
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
136016f
fetch all tools bot - step fetch
invalid-email-address Mar 13, 2024
3583806
Update extract_galaxy_tools.py
paulzierep Mar 14, 2024
c78bb38
Update extract_galaxy_tools.py
paulzierep Mar 14, 2024
7749357
update stats generation, linting and func works
paulzierep Mar 14, 2024
725a5dc
fix bio.tools parsing
paulzierep Mar 14, 2024
a4f5666
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
fdecb33
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
6518e14
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
859e9b4
update server list
paulzierep Mar 14, 2024
0de4d1e
add france
paulzierep Mar 14, 2024
f8839eb
reintroduce warning
paulzierep Mar 14, 2024
5ed1176
fetch all tools bot - step fetch
invalid-email-address Mar 14, 2024
b139c9d
fetch all tools bot - step merge
invalid-email-address Mar 14, 2024
bceb24e
fetch all tools bot - step filter
invalid-email-address Mar 14, 2024
aa160c0
typing for mypy
paulzierep Mar 14, 2024
8669668
typing for mypy
paulzierep Mar 14, 2024
98312ad
isort linitng
paulzierep Mar 14, 2024
bffbc23
get all available tools on server
paulzierep Mar 14, 2024
dfec6be
Merge branch 'main' into update-stats
paulzierep Mar 14, 2024
d30da25
linting
paulzierep Mar 14, 2024
9b13613
mypy linting
paulzierep Mar 18, 2024
c7f1d95
mypy linting
paulzierep Mar 18, 2024
660ab9c
mypy linting
paulzierep Mar 18, 2024
6be6058
mypy linting
paulzierep Mar 18, 2024
50bd451
mypy linting
paulzierep Mar 18, 2024
c7acc06
mypy linting
paulzierep Mar 18, 2024
a16d3f5
mypy linting
paulzierep Mar 18, 2024
93db1e7
mypy linting
paulzierep Mar 18, 2024
773f361
mypy linting
paulzierep Mar 18, 2024
6a8db1d
mypy linting
paulzierep Mar 18, 2024
24899a6
mypy linting
paulzierep Mar 18, 2024
c680879
mypy linting
paulzierep Mar 18, 2024
7b2ed06
compute galaxy instances outside of main script
paulzierep Mar 18, 2024
b5023c6
linting
paulzierep Mar 18, 2024
22eb17e
linting
paulzierep Mar 18, 2024
97c5299
linting
paulzierep Mar 18, 2024
da19f3b
linting
paulzierep Mar 18, 2024
ecd9ec6
fix
paulzierep Mar 18, 2024
33a39f4
change name of Galaxy star servers
paulzierep Mar 25, 2024
fd11c1b
add column order for final table and change server availablity naming
paulzierep Mar 25, 2024
3ea51b9
rename wrapper id to suite id
paulzierep Mar 25, 2024
af02011
rename tool id to suite id
paulzierep Mar 25, 2024
64 changes: 53 additions & 11 deletions bin/extract_galaxy_tools.py
@@ -23,18 +23,45 @@
from github.ContentFile import ContentFile
from github.Repository import Repository

COLUMN_ORDER = [
"Galaxy wrapper id",
Review comment (Member):
Should we not call it "Galaxy suite id"?

"Galaxy tool ids",
"No. tools in the suite",
"Description",
"bio.tool id",
"bio.tool ids",
"bio.tool name",
"biii",
"bio.tool description",
"EDAM operation",
"EDAM topic",
"Conda id",
"Conda version",
"Galaxy wrapper version",
"Status",
"ToolShed categories",
"ToolShed id",
"Source",
"Galaxy wrapper owner",
"Galaxy wrapper source",
"Galaxy wrapper parsed folder",
"Galaxy Star Availability",
Review comment (Member):
Why have a separate column here?

"All Server Availability",
Review comment (Member):
Maybe "Public servers with at least one tool"

"Tools available on: UseGalaxy.org",
"Tools available on: UseGalaxy.org.au",
"Tools available on: UseGalaxy.eu",
"Tools available on: UseGalaxy.org.fr",
Review comment (Member) on lines +50 to +53:
Suggested change
- "Tools available on: UseGalaxy.org",
- "Tools available on: UseGalaxy.org.au",
- "Tools available on: UseGalaxy.eu",
- "Tools available on: UseGalaxy.org.fr",
+ "Tools available on UseGalaxy.org",
+ "Tools available on UseGalaxy.org.au",
+ "Tools available on UseGalaxy.eu",
+ "Tools available on UseGalaxy.fr",

"No. of tool users (2022-2023) (usegalaxy.eu)",
Review comment (Member):
Suggested change
- "No. of tool users (2022-2023) (usegalaxy.eu)",
+ "Tool users in 2022-2023 on UseGalaxy.eu",

"Total tool usage (usegalaxy.eu)",
Review comment (Member):
Suggested change
- "Total tool usage (usegalaxy.eu)",
+ "Total tool usage on UseGalaxy.eu",

]


# Config variables
BIOTOOLS_API_URL = "https://bio.tools"
# BIOTOOLS_API_URL = "https://130.226.25.21"

# GALAXY_SERVER_URLS = [
# "https://usegalaxy.org",
# "https://usegalaxy.org.au",
# "https://usegalaxy.eu",
# "https://usegalaxy.fr",
# ]

- GALAXY_SERVER_URLS = {
+ USEGALAXY_STAR_SERVER_URLS = {
"UseGalaxy.org": "https://usegalaxy.org",
"UseGalaxy.org.au": "https://usegalaxy.org.au",
"UseGalaxy.eu": "https://usegalaxy.eu",
@@ -632,7 +659,7 @@ def aggregate_servers(df: pd.DataFrame, server_names: list, column_name: str) ->

def extract_public_galaxy_servers_tools() -> Dict:
"""
- Extract the tools from the public Galaxy servers using their API -> this is actually done
+ Extract the tools from the public Galaxy servers using their API -> this is actually done in
galaxy_tool_extractor/data/usage_stats/get_public_galaxy_servers.py
Here we only load the list -> much faster
TODO: run get_public_galaxy_servers.py as CI
@@ -651,6 +678,14 @@ def format_list_column(col: pd.Series) -> pd.Series:
return col.apply(lambda x: ", ".join(str(i) for i in x))
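
For readers unfamiliar with the helper shown in this hunk, the following is a minimal sketch of what `format_list_column` does to a column of lists. The tool ids in the sample data are illustrative and are not taken from the PR.

```python
import pandas as pd


def format_list_column(col: pd.Series) -> pd.Series:
    """Join each list in the column into one comma-separated string."""
    return col.apply(lambda x: ", ".join(str(i) for i in x))


# Illustrative data: each cell holds the list of tool ids of one suite.
tool_ids = pd.Series([["abricate", "abricate_list"], ["fastp"], []])
formatted = format_list_column(tool_ids)
print(formatted.tolist())
```

An empty list becomes an empty string, so suites without tools end up as blank cells in the exported table.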


def order_output_columns(df: pd.DataFrame) -> pd.DataFrame:
"""
Reorder the columns according to COLUMN_ORDER
"""
df = df.reindex(columns=COLUMN_ORDER)
return df
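
One consequence of using `reindex(columns=...)` here may be worth keeping in mind during review: columns that are not listed in `COLUMN_ORDER` are dropped, and listed columns that are missing from the frame are created and filled with NaN. The sketch below illustrates this with a shortened, hypothetical `COLUMN_ORDER` and made-up data. The second half shows one possible variant that keeps unlisted columns at the end; it is an assumption about what might be wanted, not part of the PR.

```python
import pandas as pd

# Shortened, illustrative stand-in for the COLUMN_ORDER list in the PR.
COLUMN_ORDER = ["Galaxy wrapper id", "Description", "Status"]

df = pd.DataFrame(
    {
        "Status": ["To update"],
        "Galaxy wrapper id": ["abricate"],
        "Unlisted column": ["x"],  # not present in COLUMN_ORDER
    }
)

# Same call as in order_output_columns: "Unlisted column" disappears and
# "Description" is created with NaN values.
ordered = df.reindex(columns=COLUMN_ORDER)
print(list(ordered.columns))

# Hypothetical variant: order the known columns, append any others.
known = [c for c in COLUMN_ORDER if c in df.columns]
extra = [c for c in df.columns if c not in COLUMN_ORDER]
kept = df[known + extra]
print(list(kept.columns))
```

If a column name is changed in one place but not in `COLUMN_ORDER` (for example after applying the rename suggestions above), the first form would silently produce an empty column instead of raising an error.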


def export_tools(
tools: List[Dict], output_fp: str, format_list_col: bool = False, add_usage_stats: bool = False
) -> None:
@@ -673,12 +708,17 @@ def export_tools(
df["Galaxy tool ids"] = format_list_column(df["Galaxy tool ids"])

# add availability of star servers
- df = add_instances_to_table(df, GALAXY_SERVER_URLS)
- df = aggregate_servers(df, list(GALAXY_SERVER_URLS.keys()), column_name="Galaxy Star Availability")
+ df = add_instances_to_table(df, USEGALAXY_STAR_SERVER_URLS)
+ df = aggregate_servers(df, list(USEGALAXY_STAR_SERVER_URLS.keys()), column_name="Galaxy Star Availability")

# rename the columns for each server
server_reindex_columns = {f"Tools available on: {k}": v for k, v in USEGALAXY_STAR_SERVER_URLS.items()}
df = df.rename(columns=server_reindex_columns)

print(df)

# add availability of all servers star servers
# only add the aggregated column

server_list = extract_public_galaxy_servers_tools()

df_selection = df.loc[:, ["Galaxy wrapper id", "Galaxy tool ids"]].copy()
@@ -689,6 +729,8 @@ def export_tools(
if add_usage_stats:
df = add_usage_stats_for_all_server(df)

df = order_output_columns(df)

df.to_csv(output_fp, sep="\t", index=False)
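
As a note on the renaming step in this hunk: `DataFrame.rename(columns=...)` expects a mapping from existing column names to new ones, and by default it silently ignores keys that do not match any column. The sketch below assumes that `add_instances_to_table` creates one column per server named after the dictionary key; that naming is an assumption, since the function is not shown in this diff, and the counts are made up.

```python
import pandas as pd

USEGALAXY_STAR_SERVER_URLS = {
    "UseGalaxy.org": "https://usegalaxy.org",
    "UseGalaxy.eu": "https://usegalaxy.eu",
}

# Assumption: the per-server columns are named after the dictionary keys.
df = pd.DataFrame({"UseGalaxy.org": [2], "UseGalaxy.eu": [3]})

# Mapping of existing name -> new name.
rename_map = {name: f"Tools available on: {name}" for name in USEGALAXY_STAR_SERVER_URLS}
renamed = df.rename(columns=rename_map)
print(list(renamed.columns))

# A mapping whose keys match no column leaves the frame unchanged
# (errors="ignore" is the default), so a wrong direction goes unnoticed.
unchanged = df.rename(columns={"No such column": "New name"})
print(list(unchanged.columns))
```

Because the final `reindex` on `COLUMN_ORDER` fills unmatched names with NaN, checking the renamed column names against `COLUMN_ORDER` before export would catch a mismatch early.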

