docs(src/models): add symbolic link solution for windows
docs(src/models): improve docs
fix(src/models): fix interpolation error when no constraints
fix(src/models): catch CP result parsing error from pyevalb
docs(improve README and remove legacy code files):
Saibo Geng committed Oct 11, 2023
1 parent 9edad10 commit 69cf23f
Showing 38 changed files with 105 additions and 1,797 deletions.
34 changes: 5 additions & 29 deletions README.md
@@ -25,36 +25,12 @@ Install the required packages:
pip install -r requirements.txt
```

## 3. Downloading the dataset, grammar objects and models
## Experiments

check the [docs/download_data.md](docs/download_data.md) for instructions on how to download them.


## 4. Build task-specific grammars

c.f. [GF_helper repo](https://github.com/Saibo-creator/GF_helper)


## Running the experiments

```shell
# run the experiments for the CP task
bash run_CP.sh

# run the experiments for the IE task
bash run_IE.sh

# run the experiments for the ED task
bash run_ED.sh
```


The generated prediction sequences will be logged to [Weights and Biases](https://wandb.ai/site).


## Developer Guide

If you want to extend the codebase, please check the [docs/developer_guide.md](docs/developer_guide.md) for more details.
- [Download datasets, grammars and models](docs/download_data.md)
- [Build task-specific grammars](https://github.com/Saibo-creator/GF_helper)
- [Windows-specific setting](docs/windows.md)
- [Running the experiments](docs/run_experiments.md)


## Citation
38 changes: 0 additions & 38 deletions _legacy_load_datamodule.py

This file was deleted.

105 changes: 0 additions & 105 deletions _legacy_measure_generation_latency.py

This file was deleted.

2 changes: 1 addition & 1 deletion configs/hydra_conf/inference_root.yaml
@@ -39,4 +39,4 @@ logs_subfolder: inference
# determines the log directory's identifier
#run_name: ???

run_name: Task_${task}_Model_${model.name}_Datamodule_${datamodule.name}_Constraint_${model.gf_constraint_module.name}
run_name: Task_${task}_Model_${model.name}_Datamodule_${datamodule.name}_Constraint_${oc.select:model.gf_constraint_module.name,null}
6 changes: 0 additions & 6 deletions configs/hydra_conf/model/HFmodel_cp_old.yaml

This file was deleted.

36 changes: 0 additions & 36 deletions configs/hydra_conf/model/HFmodel_default_old.yaml

This file was deleted.

2 changes: 0 additions & 2 deletions configs/hydra_conf/model/HFmodel_ed.yaml
@@ -2,6 +2,4 @@ defaults:
- base_model
- HFmodel_default

#_target_: src.models.ELHFModelPL

_target_: src.models.ED_model.EDHFModelPL
4 changes: 0 additions & 4 deletions configs/hydra_conf/model/HFmodel_ed_old.yaml

This file was deleted.

1 change: 0 additions & 1 deletion configs/hydra_conf/model/HFmodel_ie.yaml
@@ -4,5 +4,4 @@ defaults:

linearization_class_id: ${datamodule.linearization_class_id}


_target_: src.models.IE_model.IEHFModelPL
7 changes: 0 additions & 7 deletions configs/hydra_conf/model/HFmodel_ie_old.yaml

This file was deleted.

6 changes: 5 additions & 1 deletion docs/download_data.md
@@ -3,9 +3,11 @@

## Download data for the experiments

At the root of the repository, run the following command to download the data files
```bash
git lfs install
git clone https://huggingface.co/datasets/saibo/GCD-data-v2
mv GCD-data-v2 data
```


@@ -18,7 +20,7 @@ git lfs install
git clone https://huggingface.co/datasets/saibo/GCD-grammar-v2 assets/pgf
```

Unzip the files
Unzip the compressed grammar files
```bash
cd assets/pgf
# unzip and remove the zip files
@@ -46,4 +48,6 @@ Then, we set the environment variable `HF_MODELS_DIR` to `~/models` by running the following command
export HF_MODELS_DIR=~/models
```

Models such as LLAMA-7B need to be in HuggingFace format (an illustrative layout is sketched below).

We don't provide other model weights, as they are too large and may have licensing issues.
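
For reference, a HuggingFace-format model directory typically looks like the sketch below. The directory and file names are only illustrative; your checkpoint may be sharded differently.
```bash
# Illustrative layout only -- the exact file names depend on the checkpoint
ls $HF_MODELS_DIR/LLAMA-7B
# config.json  generation_config.json  tokenizer.model  tokenizer_config.json
# pytorch_model-00001-of-00002.bin  pytorch_model-00002-of-00002.bin  pytorch_model.bin.index.json
```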
73 changes: 73 additions & 0 deletions docs/run_experiments.md
@@ -0,0 +1,73 @@
# Run experiments

## Requirements

Check that the `HF_MODELS_DIR` environment variable is set correctly:
```shell
echo $HF_MODELS_DIR
```

Check that the data and grammar objects are downloaded correctly:
```shell
ls data assets/grammar_objects
# -> CP ED IE
```

Check that the compiled grammars (pgf files) are downloaded correctly:
```shell
ls assets/pgf
# -> CP ED IE
```

If anything is missing, see [download_data.md](download_data.md) for instructions on how to download and set it up.


## Run the experiments

### Quick start

Assuming you already have `LLAMA-7B` in `$HF_MODELS_DIR`, run the following commands:

```shell
# run the experiments for the CP task
bash run_CP.sh LLAMA-7B

# run the experiments for the IE task
bash run_IE.sh LLAMA-7B

# run the experiments for the ED task
bash run_ED.sh LLAMA-7B
```

The above scripts run the experiments for the CP, IE and ED tasks respectively, on a small number of data samples.
To run the experiments on the full dataset, remove the `datamodule.debug_k` override from the scripts, for example as sketched below.
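
One way to drop the override from all three scripts (a sketch assuming GNU sed; on macOS use `sed -i ''`, or simply edit the scripts by hand):
```shell
# Delete the datamodule.debug_k override line from each run script
sed -i '/datamodule\.debug_k/d' run_CP.sh run_IE.sh run_ED.sh
```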

## Results

The generated prediction sequences will be logged to [Weights and Biases](https://wandb.ai/site).

## Dry run

If you don't have the model yet, you can run the experiments with a dummy model.
```shell
# run the experiments for the CP task
bash run_CP.sh saibo/llama-1B
```

`saibo/llama-1B` is a dummy model with the same tokenizer as `LLAMA-7B` but random weights.
It has only two layers, so it is much smaller; because it is randomly initialized, its outputs are meaningless and only useful for verifying that the pipeline runs end to end.
One way to fetch it is sketched below.
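
If the dummy weights are hosted on the Hugging Face Hub under that name, a sketch for fetching them into the layout the run scripts expect (assumes `git-lfs` is installed):
```shell
git lfs install
git clone https://huggingface.co/saibo/llama-1B "$HF_MODELS_DIR/saibo/llama-1B"
```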
## Run experiments without constraints

You can obtain the results without constraints by removing the constraint flags from the scripts.

For example, removing `+constraint/gf_constraint_module/[email protected]_constraint_module="$gf_constraint_module_option"` from `run_CP.sh` will run the experiments without constraints; one way to do this is sketched below.
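
A minimal sketch that leaves the original script untouched (the `run_CP_unconstrained.sh` name is arbitrary; you can equally just delete the line in an editor):
```shell
# Make an unconstrained copy of run_CP.sh by dropping the constraint override
# line, then run it
grep -v '+constraint/gf_constraint_module' run_CP.sh > run_CP_unconstrained.sh
bash run_CP_unconstrained.sh LLAMA-7B
```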


16 changes: 16 additions & 0 deletions docs/windows.md
@@ -0,0 +1,16 @@
# Windows-Specific Setting

## Symbolic Link

This project uses symbolic links to point to the stable version of the prompts used for each task.
Symbolic links work out of the box on Linux and macOS, but on Windows Git typically checks them out as plain text files instead.
If you are on Windows, the following paths will therefore be plain text files rather than symbolic links:
`assets/prompts/CP/stable`, `assets/prompts/ED/stable`, `assets/prompts/IE/stable`

Each of these files contains only the path of the prompt directory it is supposed to link to.

To make the code work, manually replace each file with a copy of the directory it points to, as sketched below.
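
A hedged sketch from Git Bash or WSL, shown for the CP task (the target name `v1` is purely illustrative; read the placeholder file to see the real target, then repeat for `ED` and `IE`):
```shell
# The "v1" below is illustrative -- the placeholder file tells you the real target
cat assets/prompts/CP/stable                           # prints the linked directory name
rm assets/prompts/CP/stable
cp -r assets/prompts/CP/v1 assets/prompts/CP/stable    # replace the placeholder with a real copy
```

Alternatively, if your Windows account is allowed to create symbolic links (for example with Developer Mode enabled), cloning the repository with `git config --global core.symlinks true` should restore real symlinks.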




2 changes: 1 addition & 1 deletion run_CP.sh
@@ -22,5 +22,5 @@ python run_inference.py \
+constraint/gf_constraint_module/[email protected]_constraint_module="$gf_constraint_module_option" \
model.pretrained_model_name_or_path="$HF_MODELS_DIR/$model" \
model.half_precision=false \
datamodule.debug_k=2 \
datamodule.debug_k=16 \
logger.wandb.offline=false
2 changes: 1 addition & 1 deletion run_ED.sh
@@ -24,7 +24,7 @@ for ds in aquaint msnbc ace2004 wiki aida clueweb; do
datamodule="$datamodule_option" \
trainer="$trainer_option" \
model="$model_option" \
+constraint/gf_constraint_module/[email protected]_constraint_module=canonical_aida \
+constraint/gf_constraint_module/[email protected]_constraint_module=canonical \
model.pretrained_model_name_or_path="$HF_MODELS_DIR/$model" \
model.half_precision=false \
model.gf_constraint_module.grammar_module="$grammar_module" \
