
This directory contains the code for the CI builds repair benchmark.

# How to

## πŸ’Ύ Install dependencies

We provide dependency specifications for two Python dependency managers: [pip](https://pip.pypa.io/en/stable/) and [Poetry](https://python-poetry.org/docs/). Poetry is preferred; `requirements.txt` is generated by running `poetry export --with dev,eda --output requirements.txt`.

* If you prefer pip, run `pip install -r requirements.txt`
* If you prefer Poetry, run `poetry install`

## βš™οΈ Config

To initialize the benchmark, you need to pass a path to a config file with the following fields (see example in [`config_template.yaml`](config_template.yaml)):
* `out_folder`: the path where the result files will be stored;
* `data_cache_dir`: the path where the cached dataset will be stored;
* `username_gh`: your GitHub username;
* `test_username`: _optional_; the username displayed in the benchmark. If omitted, `username_gh` is used;
* `language`: the dataset language (for now, only Python is available).
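
A minimal sketch of such a config with hypothetical values (only the fields listed above are shown; see [`config_template.yaml`](config_template.yaml) for the complete set):

```yaml
# Hypothetical values for illustration; adjust paths and names to your setup.
out_folder: out/                  # result files are written here
data_cache_dir: data/cache/       # cached dataset location
username_gh: your-github-username
test_username: display-name       # optional; defaults to username_gh
language: Python
```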

## 🏟️ Benchmark usage

**Important**: Before usage, please request to be added to the benchmark [organization](https://github.com/orgs/LCA-CI-fix-benchmark) on GitHub so that you can push the repositories used for the tests.

For an example of the benchmark usage, see the [`run_benchmark.py`](run_benchmark.py) script.
To use the benchmark, you need to pass a function `fix_repo_function` that fixes the build based on
the repository state on a local machine, the logs, and the metadata of the failed workflows (a minimal sketch is given after the list below).
For now, only two functions have been implemented:
* `fix_none` — does nothing;
* `fix_apply_diff` — applies the diff that fixed the issue in the original repository.
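
As a starting point, here is a minimal sketch of a custom fix function in the spirit of `fix_none`. The parameter names are assumptions based on the description above; the exact signature expected by the benchmark is shown in [`run_benchmark.py`](run_benchmark.py).

```python
# A no-op fix function sketch. The parameters below are assumptions:
# `datapoint` — the metadata and logs of the failed workflow,
# `repo_path` — the path to the repository checked out on the local machine.
def my_fix(datapoint, repo_path):
    # Inspect the logs/metadata in `datapoint` and edit files under
    # `repo_path` in place; the benchmark then re-runs CI on the result.
    pass
```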

You can download the dataset using the `CIFixBenchmark.get_dataset()` method.
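
A hedged usage sketch, assuming `bench` is an initialized `CIFixBenchmark` object (see [`run_benchmark.py`](run_benchmark.py) for how it is constructed):

```python
# Download (or load from cache) the benchmark dataset.
dataset = bench.get_dataset()
```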

## πŸš€ Evaluate the baseline

The method `CIFixBenchmark.eval_dataset(fix_repo_function)` evaluates the baseline function: it runs `fix_repo_function` on the datapoints of the dataset and sends the resulting jobs to GitHub.
For debugging, please limit yourself to a small number of datapoints (see the corresponding argument of `eval_dataset`).
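
A hedged sketch of running the evaluation with the fix function sketched above:

```python
# Evaluate the (no-op) fix function on the dataset; by assumption, the
# result files are written to the out_folder from the config.
bench.eval_dataset(my_fix)
```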

The evaluation method outputs the following results:

1. `jobs_ids.jsonl` — identifiers of the jobs that were sent to GitHub. They are used for further evaluation.
2. `jobs_results.jsonl` — results of each job.
3. `jobs_awaiting.jsonl` — the list of awaiting jobs (normally empty).
4. `jobs_invalid.jsonl` — the list of invalid jobs (normally empty).

Examples of these files can be found in the [`examples/`](examples/) directory.
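
For instance, the saved job identifiers can be read back like this (a sketch; the directory name stands in for your configured `out_folder`):

```python
import json

# Read the saved job identifiers for later re-evaluation.
with open("out/jobs_ids.jsonl") as f:
    job_ids = [json.loads(line) for line in f]
```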

You can also evaluate your results using the method `CIFixBenchmark.eval_jobs(result_filename=result_filename)`,
passing the `jobs_ids.jsonl` file.
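
A hedged sketch (the file path is an assumption, matching the sketch above):

```python
# Re-evaluate jobs from previously saved identifiers.
bench.eval_jobs(result_filename="out/jobs_ids.jsonl")
```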
