This project processes documents and answers questions about them using the OpenAI and Pinecone APIs.
document_processing.py
: The main script for loading a PDF, dividing its content, generating embeddings with OpenAI, and storing them in Pinecone.
constants.py
: Holds the constants used across the repository.
main.py
: A Streamlit application that enables querying of the embedded documents using a question-answering chain.
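The "dividing its content" step in document_processing.py can be pictured with a minimal sketch like the one below. It is an illustration only: the function name `split_text`, the character-based chunking, and the default sizes are assumptions, and the actual script may use a library text splitter instead.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in one of the two chunks.
    Hypothetical helper, not taken from the repository.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk would then be embedded and stored separately, so smaller chunks give more precise matches at query time while larger chunks keep more context together.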
- Python 3+
- Pinecone API key
- OpenAI API key
- Streamlit
- Install the Required Libraries:
  Install the required libraries using the following command:
  $ pip install -r requirements.txt
- Set Up Configuration:
  Replace the placeholder values in config.py with your own keys:
  OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY'
  PINECONE_API_KEY = 'YOUR_PINECONE_API_KEY'
  PINECONE_API_ENVIRONMENT = 'YOUR_PINECONE_ENVIRONMENT'
- Run document_processing.py:
  This loads the provided PDF, splits its content, generates embeddings, and saves them to Pinecone.
  $ python document_processing.py
- Start the Streamlit Application:
  Use Streamlit to run the main.py script:
  $ streamlit run main.py
  Once the application is running, you can enter questions about the PDF content, and it will answer them using the stored embeddings and the question-answering chain.
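To give a feel for how stored embeddings lead to relevant answers, here is a minimal sketch of the retrieval idea: rank stored chunks by their similarity to the question's embedding and keep the best few. Everything in it is an assumption for illustration. The helper names, the toy two-dimensional vectors, and the choice of cosine similarity are hypothetical; in the application itself the search would be carried out by Pinecone, with a metric that depends on how the index is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    indexed: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query vector.

    `indexed` is a list of (chunk_text, embedding) pairs; hypothetical helper.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real embeddings have far more dimensions.
    indexed = [
        ("Pinecone stores vectors", [1.0, 0.0]),
        ("Streamlit renders the UI", [0.0, 1.0]),
        ("Embeddings come from OpenAI", [0.7, 0.7]),
    ]
    for score, text in top_k_chunks([1.0, 0.1], indexed, k=2):
        print(f"{score:.3f}  {text}")
```

The retrieved chunks would then be passed, together with the question, to the question-answering chain, which produces the final answer.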