TSDAE to GPL... Error on start #20

Open
junebug-junie opened this issue Sep 12, 2022 · 1 comment

junebug-junie commented Sep 12, 2022

I'm trying to take my trained TSDAE checkpoint and apply GPL to it, but I keep getting errors at startup. This is the command I run:

! export dataset="hs_resume_tsdae_gpl_mini" 
! python -m gpl.train \
    --path_to_generated_data "generated/$dataset" \
    --base_ckpt "/Users/cfeld/Desktop/dev/trajectory/finetuning/gpl/outputs/tsdae/MiniLM-L6-H384-uncased-model" \
    --gpl_score_function "dot" \
    --batch_size_gpl 34 \
    --gpl_steps 100 \
    --queries_per_passage 1 \
    --output_dir "output/$dataset" \
    --evaluation_data "./$dataset" \
    --evaluation_output "evaluation/$dataset" \
    --generator "BeIR/query-gen-msmarco-t5-base-v1" \
    --retrievers "msmarco-distilbert-base-v3" "msmarco-MiniLM-L-6-v3" \
    --retriever_score_functions "cos_sim" "cos_sim" \
    --cross_encoder "cross-encoder/ms-marco-MiniLM-L-6-v2" \
    --use_train_qrels

The run fails with this output:

2022-09-12 17:37:44 - Loading faiss.
2022-09-12 17:37:44 - Successfully loaded faiss.
/opt/homebrew/Caskroom/miniconda/base/envs/finetune_hs/lib/python3.9/runpy.py:127: RuntimeWarning: 'gpl.train' found in sys.modules after import of package 'gpl', but prior to execution of 'gpl.train'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
[2022-09-12 17:37:44] INFO [gpl.train.train:79] Corpus does not exist in generated/. Now clone the one from the evaluation path ./
[2022-09-12 17:37:44] WARNING [gpl.train.train:106] Found `qgen_prefix` is not None. By setting `use_train_qrels == True`, the `qgen_prefix` will not be used
[2022-09-12 17:37:44] INFO [gpl.train.train:113] Loading qrels and queries from labeled data under the path of `evaluation_data`
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/finetune_hs/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Caskroom/miniconda/base/envs/finetune_hs/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/finetune_hs/lib/python3.9/site-packages/gpl/train.py", line 250, in <module>
    train(**vars(args))
  File "/opt/homebrew/Caskroom/miniconda/base/envs/finetune_hs/lib/python3.9/site-packages/gpl/train.py", line 114, in train
    assert 'qrels' in os.listdir(evaluation_data) and 'queries.jsonl' in os.listdir(evaluation_data)
AssertionError
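
A note on the log above: the `gpl.train.train:79` line reports the paths as `generated/` and `./` rather than `generated/hs_resume_tsdae_gpl_mini` and `./hs_resume_tsdae_gpl_mini`, which suggests `$dataset` expanded to an empty string. One possible cause, assuming the two commands ran in a notebook, is that each `!` line gets its own subshell, so the `export` does not carry over to the next line. A minimal sketch that sidesteps shell expansion by building the argument list in Python follows. The helper name `build_gpl_command` and the checkpoint path are hypothetical placeholders; the flags are taken from the command above, abridged.

```python
import sys


def build_gpl_command(dataset, base_ckpt):
    """Return the argv list for `python -m gpl.train` with explicit paths.

    Hypothetical helper: it only assembles strings, so no shell variable
    expansion is involved.
    """
    return [
        sys.executable, "-m", "gpl.train",
        "--path_to_generated_data", f"generated/{dataset}",
        "--base_ckpt", base_ckpt,
        "--output_dir", f"output/{dataset}",
        "--evaluation_data", f"./{dataset}",
        "--evaluation_output", f"evaluation/{dataset}",
        # The remaining flags from the original command would be appended
        # here unchanged.
    ]


# Placeholder checkpoint path; substitute the real TSDAE output directory.
cmd = build_gpl_command("hs_resume_tsdae_gpl_mini", "path/to/tsdae-checkpoint")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then launch the run with the dataset name already substituted into every path.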

Perhaps my folder structure isn't quite right? I've tried all kinds of combinations. This is the current layout:
corpus.jsonl
evaluation
- corpus.jsonl
- hs_resume_tsdae_gpl_mini
-- corpus.jsonl
generated
- corpus.jsonl
- hs_resume_tsdae_gpl_mini
-- corpus.jsonl
hs_resume_tsdae_gpl_mini
- corpus.jsonl
output
- hs_resume_tsdae_gpl_mini
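
The assertion in the traceback requires both a `qrels` entry and a `queries.jsonl` file directly under the `--evaluation_data` path, while the layout above shows only `corpus.jsonl` inside `hs_resume_tsdae_gpl_mini`. A minimal sketch of a pre-flight check that mirrors that assertion could look like this (the function name `missing_eval_entries` is hypothetical and not part of GPL):

```python
import os

# Entries the assertion in the traceback looks for under `evaluation_data`.
REQUIRED_ENTRIES = ("qrels", "queries.jsonl")


def missing_eval_entries(evaluation_data):
    """Return the required entries that are absent from `evaluation_data`."""
    present = set(os.listdir(evaluation_data))
    return [name for name in REQUIRED_ENTRIES if name not in present]


if __name__ == "__main__":
    path = "./hs_resume_tsdae_gpl_mini"
    if os.path.isdir(path):
        # An empty list means the assertion should pass for this directory.
        print(missing_eval_entries(path))
    else:
        print(f"{path} does not exist")
```

According to the log, it is `--use_train_qrels` that makes the script load qrels and queries from `evaluation_data`, so the check only matters when that flag is set.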

@junebug-junie

@kwang2049, might you have an end-to-end example of this that you could share?
