GitHub - CWALK19/youtube-ai-chatbot: ChatYT • "YouTubeGPT" • AI Chatbot for Marques Brownlee ⚡️🤖💬 • MKBHD AI

YouTube MKBHD AI Chatbot

YouTube MKBHD AI Chatbot ~ trained on 100+ videos from tech-gadget YouTuber Marques Brownlee @MKBHD

📝 About

Chat with 100+ YouTube videos from any creator in less than 10 minutes. This project combines basic Python scripting, vector embeddings, OpenAI, Pinecone, and LangChain into a modern chat interface that lets you quickly reference any content your favorite YouTuber covers. Ask questions in natural language and get detailed answers (1) in the style and tone of your YouTuber and (2) with hyperlinks to the top 2-3 videos referenced.

The example used in this repo is tech content creator Marques Brownlee, also known as MKBHD.


💻 How to build

Note: these instructions are written for macOS; adjust accordingly for Windows / Linux.

Initial setup

Clone and install dependencies:

git clone https://github.com/vdutts7/yt-ai-chat
cd yt-ai-chat
npm i

Copy the example env file (referred to here as .env.example; it appears as env.example in the repository root) to .env in the root directory, then fill out the API keys:

ASSEMBLY_AI_API_TOKEN=""
OPENAI_API_KEY=""
PINECONE_API_KEY=""
PINECONE_ENVIRONMENT=""
PINECONE_INDEX=""
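Before running any of the scripts, it can help to confirm that every key is actually filled in. The following is a minimal sketch, not part of the repo: it assumes the .env file uses simple KEY="value" lines as shown above, and the helper names `parse_env` and `missing_keys` are hypothetical.

```python
from pathlib import Path

# Key names taken from the .env listing above.
REQUIRED_KEYS = [
    "ASSEMBLY_AI_API_TOKEN",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or left empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory")
    else:
        missing = missing_keys(env_path.read_text())
        print("Missing or empty keys:", missing or "none")
```

Run it from the repository root so that it picks up the same .env file the app reads.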

Get API keys:

AssemblyAI (~$3.50 per 100 videos)
OpenAI
Pinecone

IMPORTANT: Verify that .gitignore contains .env in it.

Handle massive data

Outline:

Export metadata (.csv) of YouTube videos ⬇️
Download the audio files
Transcribe audio files

Navigate to the scripts folder, which will host all of the data from the YouTube videos.

cd scripts

Set up the Python environment:

conda env list
conda activate youtube-chat
pip install -r requirements.txt

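Scrape the YouTube channel. Replace @mkbhd with the channel of your choice, and replace 100 with the number of videos you want included (the script traverses backwards, starting from the most recent upload). A new file mkbhd.csv will be created at the path given as the last argument. The commands below use paths prefixed with scripts/, so run them from the repository root (cd .. first if you are still inside scripts):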

python scripts/scrape_vids.py https://www.youtube.com/@mkbhd 100 scripts/vid_list/mkbhd.csv

Refer to example_mkbhd.csv inside the folder and verify that your output matches its format.
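A quick structural check can catch a truncated or malformed CSV before the download step. This is a rough sketch that makes no assumption about the actual column names (they are not listed here); it only reports the header, the row count, and any rows whose column count differs from the header. The function name `summarize_csv` is hypothetical.

```python
import csv
import io


def summarize_csv(text: str) -> dict:
    """Report the header, the number of data rows, and ragged row numbers."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {"header": [], "rows": 0, "ragged": []}
    header, data = rows[0], rows[1:]
    # Line numbers are 1-based and count the header as line 1.
    ragged = [
        line_no
        for line_no, row in enumerate(data, start=2)
        if len(row) != len(header)
    ]
    return {"header": header, "rows": len(data), "ragged": ragged}
```

To use it on the scraped file, pass in the file contents, for example `summarize_csv(open("scripts/vid_list/mkbhd.csv", newline="").read())`, and compare the header with the one in example_mkbhd.csv.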

Download audio files:

python scripts/download_yt_audios.py scripts/vid_list/mkbhd.csv scripts/audio_files/

We use AssemblyAI's API for speech-to-text instead of running OpenAI's Whisper directly. AssemblyAI provides a script with step-by-step directions, and I found it faster and cheaper than Whisper, which was too slow for this many videos. I spent ~$3.50 to transcribe the 100 videos for MKBHD.

python scripts/transcribe_audios.py scripts/audio_files/ scripts/transcripts

Upsert to Pinecone database:

python scripts/pinecone_helper.py scripts/vid_list/mkbhd.csv scripts/transcripts/

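The Pinecone index setup I used: pod type P1, since it is optimized for speed, and 1536 dimensions. The dimension is not a free choice: it has to match the output size of the OpenAI embedding model (text-embedding-ada-002, for example, produces 1536-dimensional vectors), both when upserting and when querying the vectorstore.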
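The dimension constraint can be checked locally before anything is sent to Pinecone. The sketch below is an illustration under assumptions, not the repo's code: the record layout (id, values, metadata) follows the shape Pinecone accepts for upserts, while `make_record` and the sample metadata keys are hypothetical.

```python
INDEX_DIMENSION = 1536  # must equal the dimension configured on the index


def make_record(record_id: str, embedding: list[float], metadata: dict) -> dict:
    """Build one upsert record, rejecting vectors of the wrong size."""
    if len(embedding) != INDEX_DIMENSION:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, "
            f"but the index expects {INDEX_DIMENSION}"
        )
    return {"id": record_id, "values": list(embedding), "metadata": dict(metadata)}
```

A mismatch here usually means the embedding model was changed without recreating the index.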

Embeddings and database backend

Breaking down scripts/pinecone_helper.py:

Chunk size of 1000 characters with a 500-character overlap. This worked for me, but experiment and adjust according to your content library's size, complexity, etc.
Metadata: (1) video URL and (2) video title
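The chunking described above can be sketched as a plain sliding window. This is a simplified stand-in: the actual script may use a library splitter that respects sentence or paragraph boundaries, and the names `chunk_text` and `build_documents` and the metadata keys `url` and `title` are assumptions for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 500) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def build_documents(transcript: str, url: str, title: str) -> list[dict]:
    """Attach the video URL and title to every chunk of one transcript."""
    return [
        {"text": chunk, "metadata": {"url": url, "title": title}}
        for chunk in chunk_text(transcript)
    ]
```

With a 500-character overlap on 1000-character chunks, each window starts 500 characters after the previous one, so most of the text ends up in two chunks and roughly twice as many embeddings are stored. That is the trade-off to weigh when adjusting these numbers.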

With the Pinecone vectorstore loaded, we use LangChain's Conversational Retrieval QA chain to take the user's question, retrieve the most relevant transcript chunks together with their metadata, and deliver a packaged answer back to the user.
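The retrieval step at the core of this chain is a nearest-neighbor search over embeddings. The sketch below shows the idea in plain Python, assuming cosine similarity as the metric (the index metric is not stated here) and using tiny toy vectors in place of real 1536-dimensional embeddings. In the app itself, Pinecone performs this search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], records: list[dict], k: int = 3) -> list[dict]:
    """Return the k records whose vectors are most similar to the query."""
    ranked = sorted(
        records,
        key=lambda rec: cosine_similarity(query, rec["values"]),
        reverse=True,
    )
    return ranked[:k]
```

The chunks returned by this step are what the language model sees as context, so retrieval quality bounds answer quality.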

The relevant video titles are cited as hyperlinks that point directly to the video URL.
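Because several retrieved chunks often come from the same video, the citation list needs de-duplicating before display. The following sketch assumes each match carries `url` and `title` metadata keys and that links are rendered as Markdown; both are assumptions, and `format_sources` is a hypothetical helper rather than the repo's code.

```python
def format_sources(matches: list[dict], max_sources: int = 3) -> str:
    """Turn retrieved matches into a de-duplicated list of Markdown links."""
    seen_urls = set()
    lines = []
    for match in matches:
        url = match["metadata"]["url"]
        if url in seen_urls:
            continue  # another chunk of a video that is already cited
        seen_urls.add(url)
        lines.append(f"- [{match['metadata']['title']}]({url})")
        if len(lines) == max_sources:
            break
    return "\n".join(lines)
```

Matches are assumed to arrive in ranked order, so the first occurrence of each video is its best-scoring chunk.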

Frontend UI with chat

Next.js styled with Tailwind CSS. src/pages/index.tsx contains the base skeleton. src/pages/api/chat-chain.ts is the heart of the code, where the LangChain connections are outlined.

Run app

npm run dev

Go to http://localhost:3000. You should be able to type and ask questions now. Done ✅

🚀 Next steps

Deploy

I used Vercel as this was a relatively small project.

Alternatives: Heroku, Firebase, AWS Elastic Beanstalk, DigitalOcean, etc.

Customizations

UI/UX: change to your liking.

Bot behavior: edit the prompt template in /src/pages/api/chat-chain.ts to fine-tune the bot's outputs and gain greater control over them.
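To show what editing a prompt template involves, here is an illustrative sketch. The real template lives in TypeScript inside chat-chain.ts and its wording is not reproduced here; the text below, the `{creator}` placeholder, and the `build_prompt` helper are all made up for demonstration. Python is used only to keep the examples in one language.

```python
# Hypothetical template; the wording is illustrative, not the repo's prompt.
PROMPT_TEMPLATE = """You are a chatbot that answers questions about videos by {creator}.
Answer in the style and tone of {creator}, using only the transcript excerpts below.
If the excerpts do not contain the answer, say that you do not know.

Transcript excerpts:
{context}

Question: {question}
Answer:"""


def build_prompt(creator: str, excerpts: list[str], question: str) -> str:
    """Fill the template with retrieved excerpts and the user's question."""
    context = "\n---\n".join(excerpts)
    return PROMPT_TEMPLATE.format(
        creator=creator, context=context, question=question
    )
```

Tightening the instructions (for example, requiring the bot to admit when the excerpts lack an answer) is typically where most of the control over output comes from.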

Data: change the URLs to cover whatever channel or videos you want.


🔧 Built With


👤 Contact

[email protected]

🔗 Project Link: https://github.com/vdutts7/yt-ai-chat


