-
In tests for my cython wrapper of llama.cpp, the parameter count and size reported for a model don't match what I expect. A minimal reproduction:

```cpp
#include "llama.h"

#include <cstdint>
#include <iostream>
#include <string>

int main() {
    std::string model_path = "models/Llama-3.2-1B-Instruct-Q8_0.gguf";
    std::string prompt = "Is Mathematics invented or discovered?";

    // number of layers to offload to the GPU
    int ngl = 99;
    // number of tokens to predict
    int n_predict = 32;

    // initialize the model
    llama_model_params model_params = llama_model_default_params();
    model_params.n_gpu_layers = ngl;

    llama_model * model = llama_load_model_from_file(model_path.c_str(), model_params);
    if (model == nullptr) {
        std::cerr << "failed to load model: " << model_path << std::endl;
        return 1;
    }

    // model properties
    uint64_t n_params = llama_model_n_params(model);
    uint64_t size     = llama_model_size(model);

    std::cout << "model.n_params: " << n_params << std::endl;
    std::cout << "model.size: " << size << std::endl;
    std::cout.flush();

    llama_free_model(model);
    return 0;
}
```

The reported `n_params` and size come out noticeably larger than expected for this model. What accounts for this difference?
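One way to sanity-check the reported figures is to count tensor elements straight from the gguf metadata, where each tensor record appears exactly once. A minimal sketch, assuming ggml's gguf API (older trees declare it in `ggml.h`; newer ones split it into a separate `gguf.h`):

```cpp
#include "ggml.h"
// note: newer ggml trees declare the gguf API in a separate "gguf.h"

#include <cstdint>
#include <iostream>

int main() {
    // load only the tensor metadata from the gguf file, no weight data
    struct ggml_context * meta = nullptr;
    struct gguf_init_params params = {
        /*.no_alloc =*/ true,
        /*.ctx      =*/ &meta,
    };

    struct gguf_context * gguf = gguf_init_from_file(
        "models/Llama-3.2-1B-Instruct-Q8_0.gguf", params);
    if (gguf == nullptr) {
        std::cerr << "failed to open gguf file" << std::endl;
        return 1;
    }

    // sum element counts over the tensors recorded in the file;
    // a weight stored once in the file is counted exactly once
    uint64_t n_params = 0;
    for (struct ggml_tensor * t = ggml_get_first_tensor(meta);
         t != nullptr; t = ggml_get_next_tensor(meta, t)) {
        n_params += ggml_nelements(t);
    }
    std::cout << "n_params (from file): " << n_params << std::endl;

    gguf_free(gguf);
    ggml_free(meta);
    return 0;
}
```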
-
The number of parameters and model size are calculated from the tensors allocated after the model is loaded, which in some cases, when using some backends, may contain duplicated tensors (e.g. when the model shares the same tensor for `tok_embd` and `output`), so the parameters of these tensors are counted twice. It is a bug, but it should be a simple fix to count only the tensors from the gguf file.
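For a sense of scale: if the model shares one matrix between `tok_embd` and `output` as described above, a per-tensor count books that matrix twice. A back-of-the-envelope sketch; the vocabulary and embedding sizes below are the commonly published figures for Llama 3.2 1B and are assumptions here, not values read from the file:

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // assumed dimensions for Llama-3.2-1B (not read from the gguf file)
    const uint64_t n_vocab = 128256; // vocabulary size
    const uint64_t n_embd  = 2048;   // embedding width

    // tok_embd and output share one n_vocab x n_embd matrix, so a count
    // over allocated tensors includes its elements twice
    const uint64_t double_counted = n_vocab * n_embd;
    std::cout << "params counted twice: " << double_counted << std::endl; // 262668288
    return 0;
}
```

An over-count on the order of 260M parameters would push the reported figure well above the model's nominal 1B.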
-
I created a patch for this bug: #10286
-
@FirstTimeEZ Thanks very much. Looks like your PR was merged yesterday; I can confirm that this is no longer an issue! 👍