Update README.md

JetBrains-Research · Jun 5, 2024 · 891e950
1 parent 69d0708
Showing 1 changed file with 7 additions and 14 deletions.
diff --git a/bug_localization/README.md b/bug_localization/README.md
@@ -1,18 +1,17 @@
 # Bug Localization
 
 This folder contains code for **Bug Localization** benchmark. Challenge: 
-given an issue with bug description, identify the files within the project that need to be modified
-to address the reported bug.
+given an issue with a bug description and the repository code in the state where the issue is reproducible, identify the files within the project that need to be modified to address the reported bug.
 
-We provide scripts for [data collection and processing](./src/data), [data exploratory analysis](./src/notebooks) as well as several [baselines implementations](./src/baselines) for the task solution.
+We provide scripts for [data collection and processing](./src/data), [exploratory data analysis](./src/notebooks), and several [baseline implementations](./src/baselines) for solving the task, along with [evaluation metrics calculation](./src/notebooks).
 ## 💾 Install dependencies
 We provide dependencies for pip dependency manager, so please run the following command to install all required packages:
 ```shell
 pip install -r requirements.txt
 ```
-Bug Localization task: given an issue with bug description, identify the files within the project that need to be modified to address the reported bug
 
-## 🤗 Load data
+## 🤗 Dataset
+
 All data is stored in [HuggingFace 🤗](JetBrains-Research/lca-bug-localization). It contains:
 
 * Dataset with bug localization data (with issue description, sha of repo with initial state and to the state after issue fixation).
@@ -39,21 +38,15 @@ You can access data using [datasets](https://huggingface.co/docs/datasets/en/ind
 
 
 * Archived repos (from which we can extract repo content on different stages and get diffs which contains bugs fixations).\
-They are stored in `.tar.gz` so you need to run script to load them and unzip:
-  1. Set `repos_path` in [config](configs/data/hf_data.yaml) to directory where you want to store repos
-  2. Run [load_data_from_hf.py](./src/load_data_from_hf.py) which will load all repos from HF and unzip them
 
-## ⚙️ Run Baseline
+## ⚙️ Baselines
 
 * Embedding-based
   * [TF-IDF](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer)
   * [GTE](https://huggingface.co/thenlper/gte-large)
   * [CodeT5](https://huggingface.co/Salesforce/codet5p-110m-embedding)
-  * [BM25](https://platform.openai.com/docs/models/gpt-3-5-turbo)
+  * [BM25]()
 
 * Name-based
   * [GPT3.5](https://platform.openai.com/docs/models/gpt-3-5-turbo)
-  * [GPT4](https://platform.openai.com/docs/models/gpt-3-5-turbo)
-  * [Cloud 2](https://platform.openai.com/docs/models/gpt-3-5-turbo)
-  * [CodeLLama](https://platform.openai.com/docs/models/gpt-3-5-turbo)
-  * [Mistral](https://platform.openai.com/docs/models/gpt-3-5-turbo)
+  * [GPT4](https://platform.openai.com/docs/models/gpt-4)
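
Note on the embedding-based baselines listed in this diff: the README names TF-IDF but does not show how a ranking is produced. The sketch below is one possible reading, not the code in `./src/baselines`. It scores each file by TF-IDF cosine similarity to the issue text and returns paths best-first. It is a pure-Python stand-in for the linked scikit-learn `TfidfVectorizer`; the function names `tokenize` and `rank_files` and the toy repository contents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_files(issue_text, files):
    """Rank file paths by TF-IDF cosine similarity to the issue, best first.

    `files` maps a path to its text content (hypothetical input shape).
    """
    docs = {path: Counter(tokenize(content)) for path, content in files.items()}
    n_docs = len(docs)
    # Document frequency: in how many files each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)

    def vectorize(counts):
        # Smoothed IDF keeps weights finite for terms absent from every file.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vectorize(Counter(tokenize(issue_text)))
    scores = {path: cosine(query, vectorize(counts)) for path, counts in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy repository snapshot, invented for illustration only.
files = {
    "src/parser.py": "def parse_config(path): raise ValueError('invalid config file')",
    "src/server.py": "def start_server(port): listen on port and accept connections",
    "README.md": "project documentation and usage",
}
issue = "Crash with ValueError when parsing an invalid config file"
ranking = rank_files(issue, files)
print(ranking)
```

In this toy case only `src/parser.py` shares terms with the issue, so it ranks first; a real baseline would run over the repository contents at the commit where the issue is reproducible.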