Embedding values are different for the same model with the same parameters but different environments #1046

Open
2 of 5 tasks
NikhilVerma opened this issue Nov 21, 2024 · 2 comments
Comments

@NikhilVerma
Contributor

NikhilVerma commented Nov 21, 2024

System Info

"@huggingface/transformers": "3.0.2"
Bun v1.1.30-canary.28+2f8c20ef8 (macOS arm64)
Chrome Version 131.0.6778.70 arm64

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

Take this piece of code:

import { FeatureExtractionPipeline, pipeline } from "@huggingface/transformers";

let embeddingPipeline: Promise<FeatureExtractionPipeline> | undefined;

export async function getEmbedding(str: string) {
	if (!embeddingPipeline) {
		// eslint-disable-next-line no-console
		console.log(`🧠 Loading embedding pipeline`);

		embeddingPipeline = pipeline("feature-extraction", "mixedbread-ai/mxbai-embed-large-v1", {
			device: "auto",
			dtype: "q8"
		});
	}

	const resolvedPipeline = await embeddingPipeline;

	const output = await resolvedPipeline(str, {
		pooling: "cls",
		normalize: true
	});

	return output.tolist()[0] as number[];
}

When I run this code in Bun and embed the string "Hello world!", and then embed the same string in the browser (using WASM), I get entirely different values.

Using Bun

[
  -0.0021202601492404938, 0.0469636432826519, 0.020890265703201294, -0.022750232368707657,
  -0.037959884852170944, 0.003005226841196418, 0.053498536348342896, 0.022222621366381645,
  0.04570423811674118, 0.01894044317305088, 0.013796147890388966, 0.020720647647976875,
  0.009724173694849014, 0.013014012947678566, -0.05031043663620949, -0.000107266751001589,
  -0.014651484787464142, -0.024391934275627136, -0.051498446613550186, 0.012894446961581707,

Using browser

 [
    -0.016504084691405296,
    -0.04971511662006378,
    -0.0009000280988402665,
    -0.02497180737555027,
    -0.024496370926499367,
    -0.007153023034334183,
    -0.004980664700269699,
    0.021212700754404068,
    0.027969202026724815,
    0.004761920776218176,
    0.02262801118195057,
    -0.012614945881068707,
    0.02732115238904953,
    -0.04224494472146034,
    -0.004798870999366045,
    -0.028831878677010536,
    -0.04931287840008736,
    -0.034003954380750656,
    -0.07406899333000183,
    0.003017101436853409,
    0.005531534552574158,
    -0.005652727093547583,
    -0.05290239304304123,
    -0.043359968811273575,

Reproduction

  1. Embed a string using mixedbread-ai/mxbai-embed-large-v1 from the console using Bun (Node should behave the same); device resolves to cpu.
  2. Embed the same string in the browser; device resolves to wasm.
  3. The values differ even though the code is identical.

This makes any kind of RAG development impossible.
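To put a number on how far apart two backends are, one option is to compute the cosine similarity between the vectors from each environment. The sketch below is only an illustration: the helper name cosineSimilarity is hypothetical, and the sample arrays are just the first four values of each truncated listing above, so the printed number says nothing reliable about the full embeddings.

```typescript
// Cosine similarity between two equal-length, non-zero vectors.
// Returns a value in [-1, 1]; 1 means the vectors point the same way.
function cosineSimilarity(a: number[], b: number[]): number {
	if (a.length !== b.length) {
		throw new Error("vectors must have the same length");
	}

	let dot = 0;
	let normA = 0;
	let normB = 0;

	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i];
		normA += a[i] * a[i];
		normB += b[i] * b[i];
	}

	return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// First four values from each listing above (illustrative prefix only;
// a real comparison needs the complete vectors).
const bunPrefix = [
	-0.0021202601492404938, 0.0469636432826519, 0.020890265703201294, -0.022750232368707657
];
const browserPrefix = [
	-0.016504084691405296, -0.04971511662006378, -0.0009000280988402665, -0.02497180737555027
];

console.log(cosineSimilarity(bunPrefix, browserPrefix));
```

If the full vectors from two environments score close to 1, the difference is probably numerical noise; a clearly lower score suggests the backends are genuinely producing different embeddings.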

Update

  1. Switching between wasm and webgpu in the browser also gives entirely different results (and both differ from cpu). The behaviour seems to vary quite a lot between backends. I don't think this should be the case.
@NikhilVerma added the bug (Something isn't working) label Nov 21, 2024
@NikhilVerma
Contributor Author

I have also tried making sure that we use similar versions for both onnxruntime-web and node, but the end result is the same.

@NikhilVerma
Contributor Author

OK, I found the fix. I think we should document it. Basically, forcing CPU (instead of WASM) gives the desired consistency.

		embeddingPipeline = pipeline("feature-extraction", LOCAL_EMBEDDING_MODEL_NAME, {
			device: "auto",
			dtype: "q8",
			session_options: {
				executionProviders: ["cpu"]
			}
		});

It's not ideal, of course, but since I run this in a worker pool I don't notice much degradation.
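A quick way to check that a workaround like this really yields matching values is to embed the same string in both environments and compare the vectors element by element. This is a minimal sketch, not part of the library: maxAbsDiff and embeddingsMatch are hypothetical helpers, and the default tolerance of 1e-4 is an assumed threshold that would need tuning for a given model and dtype.

```typescript
// Largest element-wise absolute difference between two equal-length vectors.
function maxAbsDiff(a: number[], b: number[]): number {
	if (a.length !== b.length) {
		throw new Error("vectors must have the same length");
	}

	let worst = 0;
	for (let i = 0; i < a.length; i++) {
		worst = Math.max(worst, Math.abs(a[i] - b[i]));
	}

	return worst;
}

// True when no component differs by more than `tolerance`.
// The default tolerance is an assumption, not a documented guarantee.
function embeddingsMatch(a: number[], b: number[], tolerance = 1e-4): boolean {
	return maxAbsDiff(a, b) <= tolerance;
}
```

For example, a vector saved from the Bun run could be used as a reference and passed to embeddingsMatch together with the result of getEmbedding("Hello world!") in the browser.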
