This project is example Chat-PDF, a web application that lets you chat with any PDF. This project uses LangChain for the prompt engineering, Gradio for the web user interface and Gradio to create the CLI for production.
This project use poetry, to install the dependencies, you can run the following command:
poetry install
To run the project, you can use the following command:
poetry run python src
├── src # Project source code
├── docs # Project documentation
│ └── static # README.md static files
├── tests # Folder containing software tests
│ ├── units # Unit tests
│ └── integrations # Integration tests
├── scripts # Useful scripts for the project (no CI/CD)
├── ruff.toml # Ruff configuration file
├── environment.yml # Conda environment configuration file
This project requires conda to be installed. To install the dependencies, simply run the following command:
conda env create -f environment.yml
You can update the environment with the following command:
conda env update -f environment.yml
This project uses typer
to create a command-line interface. To launch command help, simply issue the following command
the following command:
python src
This project have a docker-compose.yml
file to run the project in production. To run the project, you can use the
following command:
- build: Build the Docker image if not exists
- d: Mode daemon (run in background)
docker-compose up --build -d
If the build Docker timeout because miniconda install is too long, you can use the following command:
COMPOSE_HTTP_TIMEOUT=3600 sudo docker compose up --build
First, you must create the volume who allow Airflow to access to Gradio's 'db' folder contains the PDF content in vector
docker volume create shared_db_volume
To build the Docker image, you can run the following command:
docker build -t chat_pdf .
To build the Dockerfile, you can use the following command:
docker build -t gradio_chat_pdf .
To run the Docker container, you can use the following command:
d
: Mode daemonp
: Port to usev
: set volume for Airflow container (clear the db folder)name
: give name to container and don't have random name
docker run -d --name gradio_chat_pdf -v shared_db_volume:/app/db -p 7860:7860 gradio_chat_pdf
To build the Dockerfile, go to airflow
folder and use the following command:
docker build -t airflow_chat_pdf .
To run the Docker container, you can use the following command:
d
: Mode daemonp
: Port to usev
: set volume for Airflow container (clear the db folder)name
: give name to container and don't have random name
docker run -d --name airflow_chat_pdf -v shared_db_volume:/opt/airflow/db -p 8080:8080 airflow_chat_pdf
For debugging purposes, you can run the following command:
docker run -it -p 7860:7860 xxxx_chat_pdf /bin/bash
For deleting all Docker containers, you can use the following command:
docker rm -f $(docker ps -a -q)
For deleting all Docker images, you can use the following command:
docker rmi -f $(docker images -q)
This project uses Apache Airflow to schedule the data processing. To start the Airflow web server, you can run the following command:
airflow db init
airflow db migrate
Create an admin user:
airflow users create \
--username admin \
--firstname FIRST_NAME \
--lastname LAST_NAME \
--role Admin \
--email [email protected] \
--password admin
To start the Airflow web server in development, you can run the following command:
airflow standalone
To start the Airflow web server in production, you can run the following command:
airflow webserver --port 8080 --daemon
airflow scheduler --daemon