
Question about running inference #237

Open · SYuan03 opened this issue Aug 22, 2024 · 5 comments

SYuan03 commented Aug 22, 2024

Thank you for the work you've done—it's really awesome!

I'm trying to reproduce your results on some datasets, but I'm seeing some differences in accuracy. I'm using AutoModelForCausalLM to load the model, and I'm not sure whether this could affect the accuracy:

import torch
from transformers import AutoModelForCausalLM

# model_path points to the local mPLUG-Owl3 checkpoint directory
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    attn_implementation='flash_attention_2',
    torch_dtype=torch.half,
    trust_remote_code=True
)
LukeForeverYoung (Collaborator) commented:

Hi, the accuracy of most large language models can be affected by the prompt. We will release the evaluation pipeline in the coming days for easy reproducibility of the results. For a specific task, can you provide more information about your evaluation process, such as the prompts and processors used?


SYuan03 commented Aug 28, 2024

> Hi, the accuracy of most large language models can be affected by the prompt. We will release the evaluation pipeline in the coming days for easy reproducibility of the results. For a specific task, can you provide more information about your evaluation process, such as the prompts and processors used?

Thanks for the reply. I'm a developer of VLMEvalKit, and we're trying to add mPLUG-Owl3 to the project. I have submitted a PR to support the model, but the accuracy on, e.g., the MMBench dataset fails to meet expectations. Here is my code.

We'd appreciate it if you submitted a PR yourselves to add support, but that would require reading the developer's manual first, which takes some time. So if it's convenient for you, could you take a look at my code and see what's wrong? Thanks a lot.

chancharikmitra commented:

Side point regarding this question: can we get confirmation from the authors that what @SYuan03 showed is the proper way to load the model through Hugging Face? I ask because the README does not use AutoModelForCausalLM to load the model, for some reason. Would appreciate confirmation on that point.


SYuan03 commented Aug 29, 2024

> Side point regarding this question: can we get confirmation from the authors that what @SYuan03 showed is the proper way to load the model through Hugging Face? I ask because the README does not use AutoModelForCausalLM to load the model, for some reason. Would appreciate confirmation on that point.

Hi, thank you for your attention. I actually noticed this as well, and I tried loading the model both ways; the results were consistent in my tests.
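
For reference, here is a minimal sketch of how the two loading paths can be compared (illustrative only; model_path is a placeholder for the local mPLUG-Owl3 checkpoint directory). With trust_remote_code=True, both auto classes resolve the concrete model class through the auto_map entry in the checkpoint's config.json, so printing the resolved class names shows whether the two paths end up at the same custom class:

import torch
from transformers import AutoModel, AutoModelForCausalLM

model_path = 'path/to/mPLUG-Owl3'  # placeholder: local checkpoint directory

# Both auto classes dispatch to the class registered under auto_map in the
# checkpoint's config.json when trust_remote_code=True.
model_a = AutoModel.from_pretrained(
    model_path, torch_dtype=torch.half, trust_remote_code=True)
model_b = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.half, trust_remote_code=True)

# If the two names match, the choice of auto class should not affect accuracy.
print(type(model_a).__name__, type(model_b).__name__)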

LukeForeverYoung (Collaborator) commented:


> Thanks for the reply. I'm a developer of VLMEvalKit, and we're trying to add mPLUG-Owl3 to the project. I have submitted a PR to support the model, but the accuracy on, e.g., the MMBench dataset fails to meet expectations. Here is my code.
>
> We'd appreciate it if you submitted a PR yourselves to add support, but that would require reading the developer's manual first, which takes some time. So if it's convenient for you, could you take a look at my code and see what's wrong? Thanks a lot.

Thank you for supporting our models! We have recently released the evaluation pipelines at this link to help you reproduce the evaluation results.
