Question about running inference #237
Hi, the accuracy of most large language models can be affected by the prompt. We will release the evaluation pipeline in the coming days to make the results easy to reproduce. For a specific task, can you provide more details about your evaluation process, such as the prompts and processors used?
Thanks for the reply. I'm a developer on the VLMEvalKit project, and we're trying to add mPLUG-Owl3 to it. I have submitted a PR to support the model, but the accuracy on some datasets, e.g. MMBench, falls short of the reported results. Here is my code. We'd appreciate it if you could open a PR to add support yourselves, but that would require some time reading the developer's manual. So if it's more convenient for you, could you take a look at my code and see what's wrong with it? Thanks a lot.
Side point regarding this question: can we get confirmation from the authors that what @SYuan03 showed is the proper way to load the model through Hugging Face? I ask because the README does not load the model this way.
Hi, thank you for your attention. I actually noticed this too, and I tried loading the model both ways; the results were consistent in my tests.
Thank you for supporting our models! We have recently released the evaluation pipelines at this link to help you reproduce the evaluation results. |
Thank you for the work you've done—it's really awesome!
I'm trying to reproduce your results on some datasets and I'm noticing some differences in accuracy. I'm loading the model with AutoModelForCausalLM, and I'm not sure whether that could affect the accuracy.
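For reference, a minimal sketch of the loading path described above. This is an illustration, not the authors' official recipe: the checkpoint id is a placeholder, and `build_load_kwargs` is a hypothetical helper introduced here only to show the options typically passed to `from_pretrained` for models with custom remote code.

```python
def build_load_kwargs(half_precision: bool = True) -> dict:
    """Hypothetical helper: common kwargs for loading a custom-code checkpoint.

    trust_remote_code=True is required for models (like mPLUG-Owl3) whose
    modeling code lives in the Hub repo rather than in transformers itself.
    """
    kwargs = {"trust_remote_code": True}
    if half_precision:
        # transformers accepts dtype names as strings, e.g. "bfloat16"
        kwargs["torch_dtype"] = "bfloat16"
    return kwargs


def load_model(checkpoint: str = "mPLUG/mPLUG-Owl3-7B"):  # placeholder id
    """Route used by the reporter above; defined but not called here,
    since it would download a multi-GB checkpoint."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(checkpoint, **build_load_kwargs())
```

With `trust_remote_code=True`, the `Auto*` classes dispatch via the `auto_map` entry in the repo's `config.json`, which is consistent with the reporter's observation that both loading routes produced the same results in their tests.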