set up basic README, add more via mkdocs etc (#54)
* set up basics, add more via mkdocs etc * change coverage and pycov version to fix testing * started writing stuff * set up basics
Showing 13 changed files with 2,497 additions and 2,000 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,79 +1,35 @@ | ||
# MALCO | ||
# pheval.llm | ||
|
||
Multilingual Analysis of LLMs for Clinical Observations | ||
![Contributors](https://img.shields.io/github/contributors/monarch-initiative/pheval.llm?style=plastic) | ||
![Stars](https://img.shields.io/github/stars/monarch-initiative/pheval.llm) | ||
![Licence](https://img.shields.io/github/license/monarch-initiative/pheval.llm) | ||
![Issues](https://img.shields.io/github/issues/monarch-initiative/pheval.llm) | ||
|
||
Built using the PhEval runner template (see instructions below). | ||
## Evaluate LLMs' capability at performing differential diagnosis for rare genetic diseases through medical-vignette-like prompts created with [phenopacket2prompt](https://github.com/monarch-initiative/phenopacket2prompt). | ||
|
||
# Usage | ||
Let us start by documenting how to run the current version in a new folder. This has to be changed! | ||
```shell | ||
poetry install | ||
poetry shell | ||
mkdir myinputdirectory | ||
mkdir myoutputdirectory | ||
cp -r /path/to/promptdir myinputdirectory/ | ||
cp inputdir/config.yaml myinputdirectory | ||
pheval run -i myinputdirectory -r "malcorunner" -o myoutputdirectory -t tests | ||
``` | ||
### Description | ||
To systematically assess and evaluate an LLM's ability to perform differential diagnosis tasks, we employed prompts programmatically created with [phenopacket2prompt](https://github.com/monarch-initiative/phenopacket2prompt), thereby avoiding any patient privacy issues. The original data are phenopackets located at [phenopacket-store](https://github.com/monarch-initiative/phenopacket-store/). We also developed a programmatic approach for scoring and grounding results, made possible by the ontological structure of the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/). | ||
|
||
Two main analyses are carried out: | ||
- A benchmark of some OpenAI GPT models against a state-of-the-art tool for differential diagnostics, [Exomiser](https://github.com/exomiser/Exomiser). The bottom line: Exomiser [clearly outperforms the LLMs](https://github.com/monarch-initiative/pheval.llm/blob/short_letter/notebooks/plot_exomiser_o1MINI_o1PREVIEW_4o.ipynb). | ||
- A comparison of gpt-4o's ability to carry out differential diagnosis when prompted in different languages. | ||
|
||
## Template Runner for PhEval | ||
Formerly MALCO, Multilingual Analysis of LLMs for Clinical Observations. | ||
Built using the [PhEval](https://github.com/monarch-initiative/pheval) runner template. | ||
|
||
This serves as a template repository designed for crafting a personalised PhEval runner. Presently, the runner executes a mock predictor found in `src/pheval_template/run/fake_predictor.py`. Nevertheless, the primary objective is to leverage this repository as a starting point to develop your own runner for your tool, allowing you to customise and override existing methods effortlessly, given that it already encompasses all the necessary setup for integration with PhEval. There are exemplary methods throughout the runner to provide an idea on how things could be implemented. | ||
|
||
## Installation | ||
# Usage | ||
Before starting a run take care of editing the [run parameters](inputdir/run_parameters.csv) as follows: | ||
- The first line contains a non-empty, comma-separated list of (supported) language codes to prompt in, each between double quotation marks. | ||
- The second line contains a non-empty, comma-separated list of (supported) model names to prompt, each between double quotation marks. | ||
- The third line contains two comma-separated binary entries, 0 (false) or 1 (true). Setting the first to 1 runs the prompting and grounding, i.e. the run step; setting the second to 1 runs the scoring and the rest of the analysis, i.e. the post-processing step. | ||
|
||
```bash | ||
git clone https://github.com/yaseminbridges/pheval.template.git | ||
cd pheval.template | ||
At this point one can install and run the code by doing | ||
```shell | ||
poetry install | ||
poetry shell | ||
mkdir outputdirectory | ||
cp -r /path/to/promptdir inputdir/ | ||
pheval run -i inputdir -r "malcorunner" -o outputdirectory -t tests | ||
``` | ||
|
||
## Configuring a run with the template runner | ||
|
||
A `config.yaml` should be located in the input directory and formatted like so: | ||
|
||
```yaml | ||
tool: template | ||
tool_version: 1.0.0 | ||
variant_analysis: False | ||
gene_analysis: True | ||
disease_analysis: False | ||
tool_specific_configuration_options: | ||
``` | ||
The testdata directory should include the subdirectory named `phenopackets` - which should contain phenopackets. | ||
|
||
## Run command | ||
|
||
```bash | ||
pheval run --input-dir /path/to/input_dir \ | ||
--runner templatephevalrunner \ | ||
--output-dir /path/to/output_dir \ | ||
--testdata-dir /path/to/testdata_dir | ||
``` | ||
|
||
## Benchmark | ||
|
||
You can benchmark the run with the `pheval-utils benchmark` command: | ||
|
||
```bash | ||
pheval-utils benchmark --directory /path/to/output_directory \ | ||
--phenopacket-dir /path/to/phenopacket_dir \ | ||
--output-prefix OUTPUT_PREFIX \ | ||
--gene-analysis \ | ||
--plot-type bar_cumulative | ||
``` | ||
|
||
The path provided to the `--directory` parameter should be the same as the one provided to `--output-dir` in the `pheval run` command. | ||
|
||
## Personalising to your own tool | ||
|
||
If you are overriding this template to create your own runner implementation, there are key files that should change to fit your runner. | ||
|
||
1. The name of the Runner class in `src/pheval_template/runner.py` should be changed. | ||
2. Once the name of the Runner class has been customised, line 15 in `pyproject.toml` should also be changed to match the class name, then run `poetry lock` and `poetry install` | ||
|
||
The runner you give on the CLI will then change to the name of the runner class. | ||
|
||
You should also remove `src/pheval_template/run/fake_predictor.py` and implement the running of your own tool. Methods in the post-processing can also be altered to process your own tool's output. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Scoring | ||
In order to fairly score clinically accurate diagnoses - considering we are only using phenotypic data - we needed to match the grounded answers by an LLM (or by Exomiser) to the correct result present in the phenopacket, consisting of an OMIM identifier. This is illustrated in the image below. | ||
![figure](images/mondo_grouping.png). | ||
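
As a rough illustration of the matching described above, the sketch below counts a grounded prediction as correct when the predicted Mondo term, or any of its is-a descendants, maps to the OMIM identifier in the phenopacket. This rule is an assumption about how the Mondo grouping is used and is not taken from the project code; the function names and all identifiers (`MONDO:GROUP`, `OMIM:000001`, and so on) are hypothetical placeholders.

```python
def descendants(term, children):
    """Return `term` plus every term reachable from it via is-a child links."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def matches_correct_omim(predicted, correct_omim, children, mondo_to_omim):
    """True if the predicted term or one of its descendants maps to correct_omim."""
    return any(
        correct_omim in mondo_to_omim.get(term, ())
        for term in descendants(predicted, children)
    )


# Toy hierarchy with placeholder identifiers (not real Mondo or OMIM entries).
children = {"MONDO:GROUP": ["MONDO:SUBTYPE_A", "MONDO:SUBTYPE_B"]}
mondo_to_omim = {
    "MONDO:SUBTYPE_A": {"OMIM:000001"},
    "MONDO:SUBTYPE_B": {"OMIM:000002"},
}

exact_hit = matches_correct_omim("MONDO:SUBTYPE_A", "OMIM:000001", children, mondo_to_omim)
group_hit = matches_correct_omim("MONDO:GROUP", "OMIM:000002", children, mondo_to_omim)
miss = matches_correct_omim("MONDO:SUBTYPE_A", "OMIM:000002", children, mondo_to_omim)
print(exact_hit, group_hit, miss)
```

A real implementation would read the hierarchy and the OMIM mappings from Mondo itself rather than from hand-written dictionaries, and might weight an exact match differently from a grouping match.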
|
||
# Statistics | ||
|
||
# More TBD |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Welcome to pheval.llm, formerly MALCO | ||
|
||
To systematically assess and evaluate an LLM's ability to perform differential diagnosis tasks, we employed prompts programmatically created with [phenopacket2prompt](https://github.com/monarch-initiative/phenopacket2prompt), thereby avoiding any patient privacy issues. The original data are phenopackets located at [phenopacket-store](https://github.com/monarch-initiative/phenopacket-store/). We also developed a programmatic approach for scoring and grounding results, made possible by the ontological structure of the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/). | ||
|
||
Two main analyses are carried out: | ||
- A benchmark of some OpenAI GPT models against a state-of-the-art tool for differential diagnostics, [Exomiser](https://github.com/exomiser/Exomiser). The bottom line: Exomiser [clearly outperforms the LLMs](https://github.com/monarch-initiative/pheval.llm/blob/short_letter/notebooks/plot_exomiser_o1MINI_o1PREVIEW_4o.ipynb). | ||
- A comparison of gpt-4o's ability to carry out differential diagnosis when prompted in different languages. | ||
|
||
## Project layout | ||
The steps we take are described in the figure below. ![figure](images/ppkt2score.png) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
The first part of the code does: | ||
|
||
### Prepare step | ||
|
||
### Run step | ||
|
||
### Post process step |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
The grounding happens via | ||
|
||
::: src.malco.post_process.mondo_score_utils |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Grounding | ||
As of November 2024, LLMs show little ability to precisely and reliably return the unique identifier of an entity present in a database. To transform a human-language disease name such as "cystic fibrosis" into its corresponding [OMIM identifier OMIM:219700](https://omim.org/entry/219700), we therefore use the following approach: | ||
|
||
<!--- Add links to files as soon as they are merged---> | ||
1. First, we try exact lexical matching between the LLM's reply and the OMIM disease labels. | ||
2. Then we run [CurateGPT](https://github.com/monarch-initiative/curategpt) on the remaining replies that have not been grounded. | ||
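
The first step above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `normalize` and `ground_exact` helpers and the one-entry label table are hypothetical, and comparing labels case-insensitively with collapsed whitespace is an assumption about what "exact lexical matching" tolerates. Replies that find no match are simply collected so that they can be passed on to the second step.

```python
def normalize(name):
    """Lower-case a disease name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def ground_exact(replies, label_to_id):
    """Split replies into exactly matched ones and those still to be grounded."""
    index = {normalize(label): ident for label, ident in label_to_id.items()}
    grounded = {}
    remaining = []
    for reply in replies:
        ident = index.get(normalize(reply))
        if ident is None:
            remaining.append(reply)  # left for the second grounding step
        else:
            grounded[reply] = ident
    return grounded, remaining


# Hypothetical label table; only the cystic fibrosis entry comes from the text.
labels = {"Cystic fibrosis": "OMIM:219700"}
grounded, remaining = ground_exact(
    ["cystic  fibrosis", "an unlisted disease name"], labels
)
print(grounded)
print(remaining)
```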
|
||
We remark here that we ground to MONDO. | ||
|
||
# OntoGPT |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
"en" | ||
"gpt-4","gpt-3.5-turbo","gpt-4o","gpt-4-turbo" | ||
0,1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
Before starting a run take care of editing the [run parameters](inputdir/run_parameters.csv) as follows: | ||
|
||
- The first line contains a non-empty, comma-separated list of (supported) language codes to prompt in, each between double quotation marks. | ||
- The second line contains a non-empty, comma-separated list of (supported) model names to prompt, each between double quotation marks. | ||
- The third line contains two comma-separated binary entries, 0 (false) or 1 (true). Setting the first to 1 runs the prompting and grounding, i.e. the run step; setting the second to 1 runs the scoring and the rest of the analysis, i.e. the post-processing step. | ||
|
||
At this point one can install and run the code by doing: | ||
```shell | ||
poetry install | ||
poetry shell | ||
mkdir outputdirectory | ||
cp -r /path/to/promptdir inputdir/ | ||
pheval run -i inputdir -r "malcorunner" -o outputdirectory -t tests | ||
``` | ||
|
||
As an example, this [input file](https://github.com/monarch-initiative/pheval.llm/tree/main/docs/run_parameters.csv) executes only the post-processing step, for English and for the models gpt-4, gpt-3.5-turbo, gpt-4o, and gpt-4-turbo. |
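
To make the three-line format concrete, here is a small sketch that reads such a file with Python's standard `csv` module. It is not the runner's own parser: the function name `parse_run_parameters` and the returned dictionary keys are hypothetical, and the validation only mirrors the rules stated above (non-empty lists, two 0/1 flags).

```python
import csv
import io


def parse_run_parameters(text):
    """Parse the three-line run-parameters format into a dictionary."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 3:
        raise ValueError("expected three lines: languages, models, flags")
    languages, models, flags = rows[0], rows[1], rows[2]
    flags = [flag.strip() for flag in flags]
    if len(flags) != 2 or any(flag not in ("0", "1") for flag in flags):
        raise ValueError("third line must hold two entries, each 0 or 1")
    return {
        "languages": languages,
        "models": models,
        "run": flags[0] == "1",           # prompting and grounding
        "post_process": flags[1] == "1",  # scoring and further analysis
    }


# Contents of the example file referenced above.
example = '"en"\n"gpt-4","gpt-3.5-turbo","gpt-4o","gpt-4-turbo"\n0,1\n'
params = parse_run_parameters(example)
print(params)
```

With these contents, only the post-processing flag is set, so the run step would be skipped.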
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
"en" | ||
"gpt-4","gpt-3.5-turbo","gpt-4o","gpt-4-turbo" | ||
0,1 |