This Q&A bot lets you search through YouTube transcripts using natural language. Working through this notebook shows how you can use LanceDB to store and manage your data easily.
Run the Python script
python main.py --query "what is a vectordb?"
If no query is given, the default query is "Which training method should I use for sentence transformers when I only have pairs of related sentences?"
Argument | Default Value | Description |
---|---|---|
query | "Which training ..." | Query to search |
context-length | 3 | Number of queries to use as context |
window-size | 20 | Window size |
stride | 4 | Stride |
openai-key | | OpenAI API Key, not required if OPENAI_API_KEY env var is set |
model | text-embedding-ada-002 | OpenAI model to use |
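The arguments in the table can be sketched as an argparse parser. This is a minimal illustration and not the actual contents of main.py: the function names `build_parser`, `resolve_api_key`, and `rolling_windows` are hypothetical, and the reading of window-size and stride as a rolling window over transcript sentences is an assumption, since the table only gives their names and defaults.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="YouTube transcript Q&A bot")
    parser.add_argument(
        "--query",
        type=str,
        default=(
            "Which training method should I use for sentence transformers "
            "when I only have pairs of related sentences?"
        ),
        help="query to search",
    )
    parser.add_argument("--context-length", type=int, default=3,
                        help="number of queries to use as context")
    parser.add_argument("--window-size", type=int, default=20, help="window size")
    parser.add_argument("--stride", type=int, default=4, help="stride")
    parser.add_argument("--openai-key", type=str, default=None,
                        help="OpenAI API key; optional if OPENAI_API_KEY is set")
    parser.add_argument("--model", type=str, default="text-embedding-ada-002",
                        help="OpenAI model to use")
    return parser


def resolve_api_key(cli_key, env=None):
    """Prefer the command-line key, then fall back to OPENAI_API_KEY."""
    env = os.environ if env is None else env
    key = cli_key or env.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("pass --openai-key or set OPENAI_API_KEY")
    return key


def rolling_windows(sentences, window_size, stride):
    """Assumed meaning of window-size/stride: join `window_size` consecutive
    sentences, starting a new window every `stride` sentences. Windows near
    the end of the list may be shorter than `window_size`."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    return [
        " ".join(sentences[i:i + window_size])
        for i in range(0, len(sentences), stride)
    ]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, args.context_length, args.window_size, args.stride, args.model)
```

Under this assumption, a larger stride produces fewer, less overlapping windows, while a larger window size gives each stored chunk more surrounding context.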
Run the Node.js script
wget -c https://eto-public.s3.us-west-2.amazonaws.com/datasets/youtube_transcript/youtube-transcriptions_sample.jsonl
OPENAI_API_KEY=... node index.js
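To check the downloaded sample before running the script, the JSON Lines file can be inspected with a few lines of Python. This is a sketch only: `read_jsonl_head` is a hypothetical helper, and because the record fields are not documented here, it just prints whatever keys each record contains.

```python
import json
from pathlib import Path


def read_jsonl_head(path, n=3):
    """Return the first n non-empty records of a JSON Lines file as dicts."""
    if n <= 0:
        return []
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
            if len(records) >= n:
                break
    return records


if __name__ == "__main__":
    sample = Path("youtube-transcriptions_sample.jsonl")
    if sample.exists():
        for record in read_jsonl_head(sample):
            print(sorted(record.keys()))
    else:
        print(f"{sample} not found; run the wget command first")
```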