
Too low accuracy result compared with the expected result #52
Open · opened Nov 10, 2023 · 6 comments

xtchon commented Nov 10, 2023

Hi, thanks for your work.
I'm trying to reproduce your reported accuracy but ran into some difficulties getting similar results.

Below is the environment that I created:

channels:
  - default
dependencies:
  - python=3.9.7
  - pip
  - pip:
    - transformers==4.17.0
    - scipy==1.7.3
    - datasets==2.00.0
    - scikit-learn==1.0.2
    - torch==1.10.2
    - black
    - wandb
    - matplotlib
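
Assuming this is saved as a conda environment file (the filename environment.yml and the env name cofi below are just placeholders), it would be created with something like:

    conda env create -n cofi -f environment.yml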

I used datasets==2.00.0 because when I install datasets==1.14.0, it results in the following conflict:
The conflict is caused by:
transformers 4.17.0 depends on huggingface-hub<1.0 and >=0.1.0
datasets 1.14.0 depends on huggingface-hub<0.1.0 and >=0.0.19

If I use datasets 2.00.0, I am able to run evaluation.py MNLI ../CoFi-MNLI-s95, but the results seem wrong. What can I do to solve this problem? Thanks a lot!
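
For reference, the evaluation run in question is invoked along the lines of:

    python evaluation.py MNLI ../CoFi-MNLI-s95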

../CoFi-MNLI-s95 is the checkpoint downloaded from https://huggingface.co/princeton-nlp/CoFi-MNLI-s95
Results I obtained:
Task: mnli
Model path: ../CoFi-MNLI-s95
Model size: 4330279
Sparsity: 0.949
accuracy: 0.091
seconds/example: 0.000531

Too low accuracy compared to the expected result:
Task: MNLI
Model path: princeton-nlp/CoFi-MNLI-s95
Model size: 4920106
Sparsity: 0.943
mnli/acc: 0.8055
seconds/example: 0.010151

xiamengzhou (Collaborator) commented

I think using datasets==1.14.0 is necessary in this case to get the model performance right. Maybe you can skip the version conflict for now?
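
One way to skip that conflict (just a workaround sketch, assuming a plain pip setup) is to install datasets without letting pip re-resolve its dependencies:

    pip install transformers==4.17.0
    pip install --no-deps datasets==1.14.0

The --no-deps flag leaves huggingface-hub at whatever version transformers already pulled in, so any remaining dependencies of datasets would need to be installed manually.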

SHUSHENGQIGUI commented

(quoting the original report above)

Hello, did you solve this problem? I'm running into the same issue.

xtchon (Author) commented Jun 29, 2024

(quoting the previous comment)

Actually, datasets==1.14.0 is not necessary; using datasets==2.14.6 solves this problem. After that there are a few more issues, and some code needs to be adjusted manually.
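
In pip terms, that change amounts to something like:

    pip install transformers==4.17.0 datasets==2.14.6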

SHUSHENGQIGUI commented

(quoting xtchon's reply above)

Thank you. Where is the issue occurring, and which code needs to be modified?

SHUSHENGQIGUI commented

(quoting xiamengzhou's suggestion above)

Hi, when I set transformers==4.17.0 and datasets==1.14.0, what version of huggingface-hub should I use? I get a version conflict from huggingface-hub.

SHUSHENGQIGUI commented Jul 2, 2024

All right, I finally found the key to this problem: I tested princeton-nlp/CoFi-MRPC-s95, and the result matches the table in the README.
[screenshot: MRPC evaluation output matching the README numbers]
By the way, here is my setting: transformers==4.17.0, datasets==2.1.0, huggingface-hub==0.19.0.
So I guess there is a bug in evaluation.py when evaluating MNLI accuracy.
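
For reference, that working combination would be pinned with something like:

    pip install transformers==4.17.0 datasets==2.1.0 huggingface-hub==0.19.0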
