Commit 06acaad (0 parents)
Showing 66 changed files with 8,014 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
{ | ||
"extends": "next/core-web-vitals" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files. | ||
|
||
# dependencies | ||
/node_modules | ||
/.pnp | ||
.pnp.js | ||
|
||
# testing | ||
/coverage | ||
|
||
# next.js | ||
/.next/ | ||
/out/ | ||
|
||
# production | ||
/build | ||
|
||
# misc | ||
.DS_Store | ||
*.pem | ||
|
||
# debug | ||
npm-debug.log* | ||
yarn-debug.log* | ||
yarn-error.log* | ||
.pnpm-debug.log* | ||
|
||
# local env files | ||
.env*.local | ||
.env | ||
docs.json | ||
embedding.json | ||
|
||
# vercel | ||
.vercel | ||
|
||
# typescript | ||
*.tsbuildinfo | ||
next-env.d.ts |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{ | ||
"trailingComma": "all", | ||
"singleQuote": true, | ||
"printWidth": 80, | ||
"tabWidth": 2 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,187 @@ | ||
# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Files | ||
|
||
Use the new GPT-4 API to build a ChatGPT chatbot for multiple large PDF files. | ||
|
||
The tech stack includes LangChain, Pinecone, TypeScript, OpenAI, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM apps and chatbots. Pinecone is a vector store that holds the embeddings and the text of your PDFs so that similar docs can be retrieved later. | ||
|
||
The visual guide of this repo and tutorial is in the `visual guide` folder. | ||
|
||
**If you run into errors, please review the troubleshooting section further down this page.** | ||
|
||
## Development | ||
|
||
1. Make sure you have installed Node and Yarn | ||
|
||
[Node installation](https://nodejs.org/en/download) | ||
|
||
Install Yarn from your terminal after installing Node | ||
|
||
`npm install -g yarn` | ||
|
||
Check that both are installed. | ||
|
||
``` | ||
node -v | ||
yarn -v | ||
``` | ||
|
||
Node must be at least version 18.x.x | ||
|
||
Clone the repo | ||
|
||
2. Install packages | ||
|
||
``` | ||
yarn install | ||
``` | ||
|
||
You should see a `node_modules` folder afterwards. | ||
|
||
3. In the `config` folder, replace `PINECONE_NAME_SPACE` with the `namespace` where you'd like to store your embeddings on Pinecone. It is used when you run `npm run ingest` manually or when you upload files on the frontend (the `api/ingest` route). The same namespace is later used for queries and retrieval. | ||
|
||
--- | ||
|
||
## If you want to "ingest" manually | ||
|
||
--- | ||
|
||
Set up your `.env` file and insert credentials | ||
|
||
- Copy `.env.example` into `.env` | ||
Your `.env` file should look like this: | ||
|
||
``` | ||
OPENAI_API_KEY= | ||
PINECONE_API_KEY= | ||
PINECONE_ENVIRONMENT= | ||
PINECONE_INDEX_NAME= | ||
``` | ||
|
||
- Visit [openai](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key) to retrieve your API key and insert it into your `.env` file. | ||
- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys, and also retrieve your environment and index name from the dashboard. | ||
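
Before running `npm run ingest`, it can help to confirm that none of the four variables above is blank. The following is a minimal sketch and is not code from this repo: `findMissingEnvVars` and `REQUIRED_ENV_VARS` are hypothetical names, and the environment is passed in as a plain object so that the function has no dependencies.

```typescript
// Hypothetical helper (not part of this repo): reports which required
// variables are missing or blank in an environment-like object.
const REQUIRED_ENV_VARS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
] as const;

function findMissingEnvVars(
  env: Record<string, string | undefined>,
): string[] {
  // Keep only the names whose value is undefined or whitespace-only.
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}
```

In a Node script this could be called as `findMissingEnvVars(process.env)` once the `.env` file has been loaded; an empty array means all four variables are set.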
|
||
### Convert your PDF files to embeddings | ||
|
||
**This repo can load multiple PDF files** | ||
|
||
1. Inside the `docs` folder, add your PDF files or folders that contain PDF files. | ||
|
||
2. Run the script `npm run ingest` to 'ingest' and embed your docs. If you run into errors, see the troubleshooting section below. | ||
|
||
3. Check the Pinecone dashboard to verify your namespace and vectors have been added. | ||
|
||
**You can also manually ingest other file types by adding more loaders to the `DirectoryLoader`** | ||
|
||
### Chat with your docs | ||
|
||
Run `npm run dev` to load `localhost:3000`, then visit the `Chatbot` page to chat with your docs. | ||
|
||
--- | ||
|
||
## If you want to "ingest" via the UI upload | ||
|
||
--- | ||
|
||
If you would prefer to use the UI upload on the `upload` page, you don't need an `.env` file. | ||
|
||
First, run `npm run dev` to load `localhost:3000`, then click on `Add credentials` to input your key credentials. Then click `Save`. | ||
|
||
Drag or upload a file into the upload area and then click `upload`. You should then be redirected to the chatbot. | ||
|
||
## Adapting for your use case | ||
|
||
In `utils/makechain.ts`, change the `QA_PROMPT` prompt for your own use case. Change `modelName` in `OpenAI` to `gpt-4` if you have access to the `gpt-4` API. | ||
|
||
## Troubleshooting | ||
|
||
**General errors** | ||
|
||
- Make sure you're running the latest Node version. Run `node -v` | ||
- Make sure you're using the same versions of LangChain and Pinecone as this repo. | ||
- Check that you've created an `.env` file that contains your valid (and working) API keys, environment and index name. | ||
- If you change `modelName` in `OpenAI` to `gpt-4`, note that you need access to `gpt-4` for it to work. | ||
- Test your OpenAI keys outside the repo to make sure they work and that you have enough API credits. | ||
- Check that your PDF file is not corrupted; a corrupted file cannot be parsed. | ||
|
||
**Pinecone errors** | ||
|
||
- Make sure your Pinecone dashboard `environment` and `index` match the ones in the `pinecone.ts` and `.env` files. | ||
- Check that you've set the vector dimensions to `1536`. | ||
- Make sure your pinecone namespace is in lowercase. | ||
- Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter. | ||
- Retry from scratch with a new Pinecone index and cloned repo. | ||
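
Two of the checks in this list (lowercase namespace, vector dimension of `1536`) can be verified before any request is sent to Pinecone. The following is a minimal sketch under stated assumptions: `checkPineconeSettings` and `PineconeSettings` are hypothetical names that do not exist in this repo, and the function makes no call to the Pinecone API.

```typescript
// Hypothetical pre-flight check (not part of this repo) for two of the
// Pinecone pitfalls listed above.
const EXPECTED_DIMENSION = 1536; // dimension stated in the list above

interface PineconeSettings {
  namespace: string;
  dimension: number;
}

function checkPineconeSettings(settings: PineconeSettings): string[] {
  const problems: string[] = [];
  // A namespace containing any uppercase character differs from its
  // lowercased form.
  if (settings.namespace !== settings.namespace.toLowerCase()) {
    problems.push(`namespace "${settings.namespace}" must be lowercase`);
  }
  if (settings.dimension !== EXPECTED_DIMENSION) {
    problems.push(
      `index dimension is ${settings.dimension}, expected ${EXPECTED_DIMENSION}`,
    );
  }
  return problems;
}
```

An empty result means both checks passed; each string in the result describes one setting to fix.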
|
||
## Key files | ||
|
||
`config/fileuploadconfig.ts`: Controls the maxfilesize and maxnumberfiles allowed per upload. These settings are preconfigured for Vercel serverless function limits. | ||
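
As a rough illustration of how per-upload limits like these can be enforced, here is a minimal sketch. The interfaces, the field names, and `validateUpload` are hypothetical and are not taken from `config/fileuploadconfig.ts`; the actual limits in that file may be shaped differently.

```typescript
// Hypothetical shapes for illustration only; the real settings live in
// config/fileuploadconfig.ts and may differ.
interface UploadLimits {
  maxFileSizeBytes: number;
  maxNumberFiles: number;
}

interface UploadedFile {
  name: string;
  sizeBytes: number;
}

function validateUpload(
  files: UploadedFile[],
  limits: UploadLimits,
): string[] {
  const errors: string[] = [];
  // Reject the batch if it contains more files than allowed.
  if (files.length > limits.maxNumberFiles) {
    errors.push(`too many files: ${files.length} > ${limits.maxNumberFiles}`);
  }
  // Report every individual file that is over the size limit.
  for (const file of files) {
    if (file.sizeBytes > limits.maxFileSizeBytes) {
      errors.push(`${file.name} exceeds ${limits.maxFileSizeBytes} bytes`);
    }
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first failure lets the upload page show every problem at once.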
|
||
`utils/extractTextFromFiles.ts`: handles the logic for 'loading' various file types. | ||
|
||
`utils/manualPDFLoader.ts`: this file is used for the manual ingest process run in `ingest-data.ts` | ||
|
||
`utils/customPDFLoader`: The PDF 'loader' that parses the uploaded files into LangChain `Documents`. Modify the `metadata` as required. | ||
|
||
`utils/formidable.ts`: Responsible for parsing uploaded files. | ||
|
||
`utils/makechain.ts`: Logic responsible for rewriting the user's question as a standalone question, retrieving relevant docs, and then outputting a final result. Change the `OpenAIChat` `modelName` to `gpt-3.5-turbo` if you don't have access to `gpt-4`. Modify the `QA_PROMPT` for your use case. | ||
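
To make the standalone-question step more concrete, here is a minimal sketch of how such a prompt could be assembled from the chat history and a follow-up question. `buildStandalonePrompt` and `ChatTurn` are hypothetical names, and the prompt wording is illustrative; it is not copied from `utils/makechain.ts`.

```typescript
// Hypothetical sketch of the "standalone question" step; the prompt
// wording is illustrative and is not taken from utils/makechain.ts.
type ChatTurn = [string, string]; // [question, answer]

function buildStandalonePrompt(
  history: ChatTurn[],
  followUp: string,
): string {
  // Flatten earlier turns into a plain-text transcript.
  const transcript = history
    .map(([question, answer]) => `Human: ${question}\nAssistant: ${answer}`)
    .join('\n');
  return [
    'Rephrase the follow-up question as a standalone question.',
    '',
    'Chat history:',
    transcript,
    '',
    `Follow-up question: ${followUp}`,
    'Standalone question:',
  ].join('\n');
}
```

The model's reply to this prompt would then serve as the query for document retrieval, so a follow-up such as "What about its second chapter?" can be resolved against earlier turns.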
|
||
`utils/pinecone-client.ts`: The pinecone client that takes credentials from the UI. | ||
|
||
`utils/pinecone-local-client.ts`: The pinecone client that uses the credentials from the `.env` file. | ||
|
||
`api/ingest.ts`: API route responsible for 'ingesting' the uploaded files. | ||
|
||
`api/ingest-url.ts`: API route responsible for 'ingesting' an uploaded URL. | ||
|
||
`api/delete-namespace.ts`: API route responsible for deleting the specified namespace from the index. Uses `pinecone-local-client.ts`. | ||
|
||
`api/chat.ts`: API route responsible for the 'chat' process, including retrieval of relevant documents. | ||
|
||
`pages/credentials.tsx`: Main page for uploading credentials from the UI. | ||
|
||
`components/FileUploadArea.tsx`: The file upload drop area. Modify the accepted files here, as well as the number of files allowed and the max file size. | ||
|
||
`public`: In the public folder you can change the default images of the bot and user. Make sure to change the file names in the frontend `components/chat.tsx` as well: | ||
|
||
For example: | ||
|
||
``` | ||
<Image | ||
key={index} | ||
src="/bot-image.png" //change this to your new image name in public folder | ||
``` | ||
|
||
## Deployment | ||
|
||
**Please note that ESLint and TypeScript errors are ignored in the `next.config.js` file by default. If you would like errors to be thrown during the production build, remove these configs.** | ||
|
||
There are a couple of high-level options for deploying your app: | ||
|
||
a. Deploying to a VM or container | ||
   - Persistent filesystem means you can save and load files from disk | ||
   - Always-running process means you can cache some things in memory | ||
   - You can support long-running requests, such as WebSockets | ||
|
||
b. Deploying to a serverless environment | ||
   - No persistent filesystem means you can load files from disk, but not save them for later | ||
   - Cold start means you can't cache things in memory and expect them to be cached between requests | ||
   - Function timeouts mean you can't support long-running requests, such as WebSockets | ||
Some other considerations include: | ||
|
||
Options: | ||
|
||
- [Vercel](https://vercel.com/docs/concepts/deployments/overview) | ||
- [Fly.io](https://fly.io/) | ||
- [Render](https://render.com/docs/deploy-to-render) | ||
|
||
## Credits | ||
|
||
[chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs/tree/main) |