Fix entity too large error #15

Open
gentlementlegen opened this issue Oct 25, 2024 · 5 comments

gentlementlegen (Member) commented Oct 25, 2024

```diff
! 413 Request Entity Too Large
```

<!--
{
  "error": {
    "status": 413,
    "headers": {
      "access-control-allow-credentials": "true",
      "access-control-allow-headers": "Authorization, User-Agent, X-Api-Key, X-CSRF-Token, X-Requested-With, Accept, Accept-Version, Content-Length, Content-MD5, Content-Type, Date, X-Api-Version, HTTP-Referer, X-Windowai-Title, X-Openrouter-Title, X-Title, X-Stainless-Lang, X-Stainless-Package-Version, X-Stainless-OS, X-Stainless-Arch, X-Stainless-Runtime, X-Stainless-Runtime-Version, X-Stainless-Retry-Count, Protection-Key",
      "access-control-allow-methods": "GET,OPTIONS,PATCH,DELETE,POST,PUT",
      "access-control-allow-origin": "*",
      "cache-control": "public, max-age=0, must-revalidate",
      "cf-cache-status": "DYNAMIC",
      "cf-ray": "8d7f62a83dbf2262-ORD",
      "connection": "keep-alive",
      "content-length": "65",
      "content-security-policy": "default-src 'self'; script-src 'self' 'unsafe-eval' 'unsafe-inline' https://clerk.openrouter.ai https://cunning-heron-18.clerk.accounts.dev https://challenges.cloudflare.com https://checkout.stripe.com https://connect-js.stripe.com https://js.stripe.com https://maps.googleapis.com https://www.googletagmanager.com https://*.ingest.sentry.io; connect-src 'self' https://clerk.openrouter.ai https://cunning-heron-18.clerk.accounts.dev https://checkout.stripe.com https://api.stripe.com https://maps.googleapis.com *.google-analytics.com https://www.googletagmanager.com https://raw.githubusercontent.com wss://www.walletlink.org/rpc https://*.ingest.sentry.io; frame-src 'self' https://challenges.cloudflare.com https://checkout.stripe.com https://connect-js.stripe.com https://js.stripe.com https://hooks.stripe.com https://us5.datadoghq.com https://*.ingest.sentry.io; img-src 'self' data: blob: https://img.clerk.com https://*.stripe.com https://www.googletagmanager.com https://t0.gstatic.com; worker-src 'self' blob:; style-src 'self' 'unsafe-inline' sha256-0hAheEzaMe6uXIKV4EehS9pu1am1lj/KnnzrOYqckXk=; upgrade-insecure-requests",
      "content-type": "application/json",
      "date": "Fri, 25 Oct 2024 04:15:26 GMT",
      "server": "cloudflare",
      "strict-transport-security": "max-age=63072000",
      "x-matched-path": "/api/v1/chat/completions",
      "x-vercel-error": "FUNCTION_PAYLOAD_TOO_LARGE",
      "x-vercel-id": "cle1::sh4cw-1729829725583-7b22e94157e6"
    },
    "error": {
      "code": "413",
      "message": "Request Entity Too Large"
    },
    "code": "413"
  },
  "stack": "Error: 413 Request Entity Too Large\n    at Function.generate (/home/runner/work/command-ask/command-ask/node_modules/openai/src/error.ts:101:12)\n    at OpenAI.makeStatusError (/home/runner/work/command-ask/command-ask/node_modules/openai/src/core.ts:424:21)\n    at OpenAI.makeRequest (/home/runner/work/command-ask/command-ask/node_modules/openai/src/core.ts:488:24)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at Completions.createCompletion (/home/runner/work/command-ask/command-ask/src/adapters/openai/helpers/completions.ts:31:57)\n    at askQuestion (/home/runner/work/command-ask/command-ask/src/handlers/ask-llm.ts:27:10)\n    at runPlugin (/home/runner/work/command-ask/command-ask/src/plugin.ts:59:22)\n    at run (/home/runner/work/command-ask/command-ask/src/main.ts:29:3)",
  "caller": "runPlugin"
}
-->

Originally posted by @ubiquity-os-beta[bot] in ubiquity-os-marketplace/text-conversation-rewards#163 (comment)

@sshivaditya2019 RFC

Keyrxng (Contributor) commented Oct 25, 2024

I just downloaded the action run logs to investigate.

  • It pulled from four sources: #159, #161, #163 and #23.
  • Two issues, two PRs, two PR diffs, four conversations, four specifications.
  • 41_315_392 characters selected in my IDE from the formatted chat log.

As a rough rule of thumb, 1 token is approximately 4 characters or 0.75 words for English text.

  • 41_315_392 / 4 = 10_328_848 tokens (sketched below). I cannot copy/paste it into a tokenizer or here, as my computer boots into vertical takeoff mode.
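For reference, a minimal sketch of that estimate in code, assuming the ~4 characters per token heuristic rather than a model-specific tokenizer:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// A model-specific tokenizer (e.g. tiktoken) would give an exact count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// 41_315_392 characters / 4 ≈ 10_328_848 tokens, far beyond any model's context window.
console.log(Math.ceil(41_315_392 / 4));
```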



  1. We need to strip all .lock files from PR diffs, as this is what does the most damage (see the sketch after this list).
  2. We should implement tokenization logic before we send the request; the tokenizer should be compatible with the model the request is being made to.
  3. If the problem still exists after these are implemented, then we need to restrict the context fetch depth.
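A rough sketch of points 1 and 2, assuming the raw unified-diff string and reusing the character-based estimate above; the excluded patterns and the token budget are placeholders, not the plugin's actual configuration:

```typescript
// Hypothetical patterns for files that should never reach the LLM context.
const EXCLUDED_FILE_PATTERNS = [/\.lock$/, /^dist\//, /\.min\.js$/];

// Drop whole file sections from a unified diff whose path matches an excluded pattern.
function stripExcludedFiles(diff: string): string {
  return diff
    .split(/^(?=diff --git )/m)
    .filter((section) => {
      const match = section.match(/^diff --git a\/(\S+) b\//);
      if (!match) return true; // keep anything we cannot parse
      return !EXCLUDED_FILE_PATTERNS.some((pattern) => pattern.test(match[1]));
    })
    .join("");
}

// Pre-flight guard before calling the model; 128k is an assumed budget, not the real limit.
const MAX_TOKENS = 128_000;
function assertWithinBudget(prompt: string): void {
  const estimated = Math.ceil(prompt.length / 4);
  if (estimated > MAX_TOKENS) {
    throw new Error(`Prompt is ~${estimated} tokens, over the ${MAX_TOKENS} token budget`);
  }
}
```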

gentlementlegen (Member Author) commented

Beyond the lock files, the main issue was the dist compiled JS files, which are 15 MB in size and millions of characters, and ended up in the diff. These should somehow be ignored.

Keyrxng (Contributor) commented Oct 25, 2024

> Beyond the lock files, the main issue was the dist compiled JS files, which are 15 MB in size and millions of characters, and ended up in the diff. These should somehow be ignored.

Ahh, I didn't scan each line or review the PRs; I keyword-searched sources of info and made assumptions about what exactly was in the PRs, but that's a good idea.

Removing sections from the diff is not a trivial task; I have spent a couple of hours on it but could never get it perfect.

But effectively, without a fix for this, plus unlimited fetch depth, this may become a frequently recurring error.

0x4007 (Member) commented Oct 26, 2024

> Beyond the lock files, the main issue was the dist compiled JS files, which are 15 MB in size and millions of characters, and ended up in the diff. These should somehow be ignored.

Some standardized strategies

We can use .gitignore and also linguist.

I suppose for partner convenience we can try to intelligently exclude some things, like reading the out directory from tsconfig and excluding it? (A rough sketch of that follows below.)

Perhaps there is an open-source tool that has already figured this out for us.
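A minimal sketch of the tsconfig idea, assuming the target repo has a plain-JSON tsconfig.json at its root (tsconfig files may contain comments, which JSON.parse would reject); the function name and the "dist" fallback are hypothetical:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Read compilerOptions.outDir from a checked-out repo so its build output
// can be excluded from the diff; fall back to "dist" when unreadable.
function getBuildOutputDir(repoRoot: string): string {
  try {
    const raw = fs.readFileSync(path.join(repoRoot, "tsconfig.json"), "utf8");
    const config = JSON.parse(raw) as { compilerOptions?: { outDir?: string } };
    return config.compilerOptions?.outDir ?? "dist";
  } catch {
    return "dist"; // no tsconfig, or it uses JSONC comments
  }
}

// Example: skip any changed file that lives under the build output directory.
const outDir = getBuildOutputDir("./checkout").replace(/^\.\//, "");
const isBuildArtifact = (filePath: string) => filePath === outDir || filePath.startsWith(`${outDir}/`);
```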

sshivaditya2019 (Collaborator) commented

This issue should be resolved in Pull #21. I've created a new package that allows us to fetch diffs based on file paths and glob patterns, enabling us to filter diffs by the number of changes and file extensions as well.
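The package from Pull #21 is not shown here; purely as an illustration of that kind of filtering, here is a sketch assuming a recent version of the minimatch package, with made-up thresholds and patterns:

```typescript
import { minimatch } from "minimatch";

interface DiffEntry {
  filename: string; // e.g. "src/plugin.ts"
  changes: number;  // added + removed lines
  patch: string;
}

// Keep only entries that match an allowed glob and stay under a change threshold.
function filterDiffEntries(entries: DiffEntry[], allowGlobs: string[], maxChanges: number): DiffEntry[] {
  return entries.filter(
    (entry) =>
      entry.changes <= maxChanges &&
      allowGlobs.some((glob) => minimatch(entry.filename, glob))
  );
}

// Placeholder usage: keep source and docs, implicitly dropping dist/ output and lock files.
const prDiffEntries: DiffEntry[] = [
  { filename: "src/plugin.ts", changes: 42, patch: "..." },
  { filename: "dist/index.js", changes: 120_000, patch: "..." },
  { filename: "yarn.lock", changes: 8_000, patch: "..." },
];
const kept = filterDiffEntries(prDiffEntries, ["src/**", "*.md"], 500); // only src/plugin.ts survives
```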
