hopefully solved conflict incorporating both
leokim-l committed Nov 19, 2024
2 parents 75419f9 + ca75c84 commit 1936578
Showing 16 changed files with 334 additions and 563 deletions.
26 changes: 26 additions & 0 deletions .github/workflows/mkdocs-deploy.yml
@@ -0,0 +1,26 @@
name: mkdocs-generation
on:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  build-docs:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v3

      - name: Set up python 3
        uses: actions/setup-python@v5
        with:
          python-version: 3.x

      - name: Install Poetry
        uses: snok/[email protected]

      - name: Install dependencies
        run: poetry install --no-interaction

      - run: poetry run mkdocs gh-deploy --force
92 changes: 24 additions & 68 deletions README.md
@@ -1,79 +1,35 @@
# MALCO
# pheval.llm

Multilingual Analysis of LLMs for Clinical Observations
![Contributors](https://img.shields.io/github/contributors/monarch-initiative/pheval.llm?style=plastic)
![Stars](https://img.shields.io/github/stars/monarch-initiative/pheval.llm)
![Licence](https://img.shields.io/github/license/monarch-initiative/pheval.llm)
![Issues](https://img.shields.io/github/issues/monarch-initiative/pheval.llm)

Built using the PhEval runner template (see instructions below).
## Evaluate LLMs' capability at performing differential diagnosis for rare genetic diseases through medical-vignette-like prompts created with [phenopacket2prompt](https://github.com/monarch-initiative/phenopacket2prompt).

# Usage
Let us start by documenting how to run the current version in a new folder. This has to be changed!
```shell
poetry install
poetry shell
mkdir myinputdirectory
mkdir myoutputdirectory
cp -r /path/to/promptdir myinputdirectory/
cp inputdir/config.yaml myinputdirectory
pheval run -i myinputdirectory -r "malcorunner" -o myoutputdirectory -t tests
```
### Description
To systematically assess and evaluate an LLM's ability to perform differential diagnostic tasks, we employed prompts programmatically created with [phenopacket2prompt](https://github.com/monarch-initiative/phenopacket2prompt), thereby avoiding any patient privacy issues. The original data are phenopackets located at [phenopacket-store](https://github.com/monarch-initiative/phenopacket-store/). A programmatic approach for scoring and grounding results was also developed, made possible by the ontological structure of the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/).

Two main analyses are carried out:
- A benchmark of several OpenAI GPT models against a state-of-the-art tool for differential diagnostics, [Exomiser](https://github.com/exomiser/Exomiser). The bottom line: Exomiser [clearly outperforms the LLMs](https://github.com/monarch-initiative/pheval.llm/blob/short_letter/notebooks/plot_exomiser_o1MINI_o1PREVIEW_4o.ipynb).
- A comparison of gpt-4o's ability to carry out differential diagnosis when prompted in different languages.

## Template Runner for PhEval
Formerly MALCO, Multilingual Analysis of LLMs for Clinical Observations.
Built using the [PhEval](https://github.com/monarch-initiative/pheval) runner template.

This serves as a template repository designed for crafting a personalised PhEval runner. Presently, the runner executes a mock predictor found in `src/pheval_template/run/fake_predictor.py`. Nevertheless, the primary objective is to leverage this repository as a starting point to develop your own runner for your tool, allowing you to customise and override existing methods effortlessly, given that it already encompasses all the necessary setup for integration with PhEval. There are exemplary methods throughout the runner to provide an idea on how things could be implemented.

## Installation
# Usage
Before starting a run, edit the [run parameters](inputdir/run_parameters.csv) as follows:
- The first line contains a non-empty, comma-separated list of (supported) language codes, enclosed in double quotation marks, in which to prompt.
- The second line contains a non-empty, comma-separated list of (supported) model names, enclosed in double quotation marks, to prompt.
- The third line contains two comma-separated binary entries, each 0 (false) or 1 (true). Setting the first to 1 runs the prompting and grounding (the run step); setting the second to 1 runs the scoring and the rest of the analysis (the post-processing step).
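
The three-line layout described above can be sketched as a small parser. This is an illustration only, not the project's actual reader: the names `RunParameters` and `parse_run_parameters` are hypothetical, the language codes in the example are made up, and because the exact quoting style is not shown here, the sketch accepts both `"en","es"` and `"en,es"`.

```python
import csv
import io
from typing import List, NamedTuple, Tuple


class RunParameters(NamedTuple):
    """Hypothetical container for the three lines of run_parameters.csv."""
    languages: Tuple[str, ...]
    models: Tuple[str, ...]
    do_run: bool
    do_postprocess: bool


def _split_items(row: List[str]) -> Tuple[str, ...]:
    # Accept one quoted field per item ("en","es") as well as a single
    # quoted field holding the whole list ("en,es").
    return tuple(
        item.strip() for field in row for item in field.split(",") if item.strip()
    )


def parse_run_parameters(text: str) -> RunParameters:
    # Blank lines are skipped; exactly three content lines are expected.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]
    if len(rows) != 3:
        raise ValueError("expected exactly three non-empty lines")

    languages = _split_items(rows[0])
    models = _split_items(rows[1])
    if not languages or not models:
        raise ValueError("language and model lists must be non-empty")

    flags = _split_items(rows[2])
    if len(flags) != 2 or any(flag not in ("0", "1") for flag in flags):
        raise ValueError("third line must hold two entries, each 0 or 1")

    return RunParameters(languages, models, flags[0] == "1", flags[1] == "1")


# Example: prompt in two (assumed) languages with one model,
# run the prompting step but skip the post-processing step.
params = parse_run_parameters('"en","es"\n"gpt-4o"\n1,0\n')
```

Rejecting malformed files up front keeps a typo in the flags line from silently skipping a step.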

```bash
git clone https://github.com/yaseminbridges/pheval.template.git
cd pheval.template
At this point one can install and run the code by doing
```shell
poetry install
poetry shell
mkdir outputdirectory
cp -r /path/to/promptdir inputdir/
pheval run -i inputdir -r "malcorunner" -o outputdirectory -t tests
```

## Configuring a run with the template runner

A `config.yaml` should be located in the input directory and formatted like so:

```yaml
tool: template
tool_version: 1.0.0
variant_analysis: False
gene_analysis: True
disease_analysis: False
tool_specific_configuration_options:
```
The testdata directory should include the subdirectory named `phenopackets` - which should contain phenopackets.

## Run command

```bash
pheval run --input-dir /path/to/input_dir \
--runner templatephevalrunner \
--output-dir /path/to/output_dir \
--testdata-dir /path/to/testdata_dir
```

## Benchmark

You can benchmark the run with the `pheval-utils benchmark` command:

```bash
pheval-utils benchmark --directory /path/to/output_directory \
--phenopacket-dir /path/to/phenopacket_dir \
--output-prefix OUTPUT_PREFIX \
--gene-analysis \
--plot-type bar_cumulative
```

The path provided to the `--directory` parameter should be the same as the one provided to the `--output-dir` in the `pheval run` command

## Personalising to your own tool

If you override this template to create your own runner implementation, there are key files that should change to fit it.

1. The name of the Runner class in `src/pheval_template/runner.py` should be changed.
2. Once the name of the Runner class has been customised, line 15 in `pyproject.toml` should also be changed to match the class name, then run `poetry lock` and `poetry install`

The runner you give on the CLI will then change to the name of the runner class.

You should also remove `src/pheval_template/run/fake_predictor.py` and implement the running of your own tool. Methods in the post-processing can also be altered to process your own tool's output.
239 changes: 0 additions & 239 deletions dev/ontoGPT_malco_postporocess.ipynb

This file was deleted.

7 changes: 7 additions & 0 deletions docs/analysis.md
@@ -0,0 +1,7 @@
# Scoring
To score clinically accurate diagnoses fairly, given that we only use phenotypic data, we needed to match the grounded answers from an LLM (or from Exomiser) to the correct result in the phenopacket, which is an OMIM identifier. This is illustrated in the image below.
![figure](images/mondo_grouping.png)
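
A minimal sketch of the matching step, under assumptions not confirmed by this page: that each grounded answer is a Mondo identifier, and that a lookup from each Mondo identifier to the set of OMIM identifiers it groups is available. The function name, the `mondo_to_omim` mapping and the identifiers in the example are all hypothetical; the project's actual scoring code may differ.

```python
from typing import Mapping, Optional, Sequence, Set


def rank_of_first_match(
    ranked_answers: Sequence[str],
    correct_omim: str,
    mondo_to_omim: Mapping[str, Set[str]],
) -> Optional[int]:
    """Return the 1-based rank of the first answer that matches, else None.

    An answer counts as a match when the (assumed) set of OMIM identifiers
    grouped under its Mondo term contains the phenopacket's OMIM identifier.
    """
    for rank, mondo_id in enumerate(ranked_answers, start=1):
        if correct_omim in mondo_to_omim.get(mondo_id, set()):
            return rank
    return None


# Made-up identifiers, for illustration only.
grouping = {
    "MONDO:A": {"OMIM:1"},
    "MONDO:B": {"OMIM:2", "OMIM:3"},
}
rank = rank_of_first_match(["MONDO:A", "MONDO:B"], "OMIM:3", grouping)
```

Returning `None` for a miss, rather than 0, keeps unranked cases distinct from any valid rank when aggregating statistics later.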

# Statistics

# More TBD
Binary file added docs/images/mondo_grouping.png
Binary file added docs/images/ppkt2score.png
10 changes: 10 additions & 0 deletions docs/index.md
@@ -0,0 +1,10 @@
# Welcome to pheval.llm, formerly MALCO

To systematically assess and evaluate an LLM's ability to perform differential diagnostic tasks, we employed prompts programmatically created with [phenopacket2prompt](https://github.com/monarch-initiative/phenopacket2prompt), thereby avoiding any patient privacy issues. The original data are phenopackets located at [phenopacket-store](https://github.com/monarch-initiative/phenopacket-store/). A programmatic approach for scoring and grounding results was also developed, made possible by the ontological structure of the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/).

Two main analyses are carried out:
- A benchmark of several OpenAI GPT models against a state-of-the-art tool for differential diagnostics, [Exomiser](https://github.com/exomiser/Exomiser). The bottom line: Exomiser [clearly outperforms the LLMs](https://github.com/monarch-initiative/pheval.llm/blob/short_letter/notebooks/plot_exomiser_o1MINI_o1PREVIEW_4o.ipynb).
- A comparison of gpt-4o's ability to carry out differential diagnosis when prompted in different languages.

## Project layout
The steps we take are described in the figure below:
![figure](images/ppkt2score.png)