This is a LLM RAG system (built as a part of LLM Zoomcamp) designed to answer readers' questions on music production, sound design, post-production, and more.

taltaf913/AI-Powered-Knowledge-Assistant-for-Audio-Prod

 
 

AI-Powered Knowledge Assistant for Audio Production

Arsonor is a French-language blog dedicated to home-studio audio production, aimed at creators of music, podcasts, videos, films and video games.

It covers essential topics such as sound engineering, sound design, post-production and home-studio optimization. You will find practical advice on improving sound quality, managing audio levels and using tools such as VST synths and mixing software. The goal is to make home sound creation accessible, by providing techniques and tips for a professional result.

This project introduces an innovative LLM RAG (Retrieval-Augmented Generation) system designed to answer readers' questions on music production, sound design, post-production, and more.

This system enhances the user experience by delivering accurate, context-aware responses to queries, drawing from the blog's extensive content library, making it easier for creators to find the information they need.

This project was implemented for LLM Zoomcamp - a free course about LLMs and RAG.

Table of contents

Project overview
Dataset
Technologies used in this project
Preparation (API Key and installing dependencies)
Running the application
Using the application
Details on the code
Notebooks and rag-system evaluations
Monitoring
Background
Acknowledgements

Project overview

The Arsonor-LLM-RAG system integrates an AI-powered model capable of understanding user queries and retrieving relevant content from Arsonor’s knowledge base. This approach combines the flexibility of language models with the accuracy of focused content retrieval, ensuring that users get precise, helpful responses.

Examples of Use:

  1. User Question: "What is a VCO and how does it work in a synthesizer?"
  • LLM RAG Response: Detailed explanation of VCOs with relevant information on their modulation and sound generation.
  2. User Question: "How do I reduce background noise in my podcast recordings?"
  • LLM RAG Response: Techniques for noise reduction using EQ, gating, and software solutions, linking directly to Arsonor tutorials.
  3. User Question: "What equipment do I need for a home studio?"
  • LLM RAG Response: Tailored suggestions based on the user’s goals, from budget setups to professional configurations, with in-depth guides from the blog.

This system changes how readers interact with the blog's content, providing immediate, accurate answers and supporting their progress in audio production.

[IMPORTANT NOTE]: This blog is in French, as it was first built to help the French-speaking community.
Non-French speakers need not worry: everything you need to know to run the application is written in English. Just be aware that the system returns answers in French and that the test queries are in French too ;)
To test the app, you can run either cli.py or test.py, which can pick a random question for you (see below). If you prefer to write a question yourself, translate it into French for better results.

Dataset

The dataset used in this project was generated by scraping information about all the articles on the Arsonor blog. All articles can be found and retrieved from the sitemap page.

You can read the code for this scraping stage in the notebook arsonor_parse.ipynb.

The resulting dataset was first a JSON file, data/arsonor_data_id.json, containing 58 records, one per article currently on the site. Because each article is a fairly long text, each record goes through a chunking step so that it can be indexed properly.

The final dataset is data/arsonor_chunks_id.json, with 589 JSON records in this form:

{
'article_id': "unique article identifier",
'title': "main title of the article",
'category': "the category it belongs to ('home-studio', 'sound design' or 'post-production')",
'tags': "a list of keywords based on the article content",
'chunk_id': "unique chunk identifier in the form 'article_id'-#",
'chunk_text': "the text content in the chunk"
}

This file serves as the foundation for the knowledge base in the assistant app to answer Audio Production queries.
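
As a rough illustration of the chunking step, the sketch below splits an article's text into fixed-size word windows and emits records in the form shown above. The splitting rule, the `chunk_words` size, the `text` input field, and numbering chunks from 1 are all assumptions made for illustration; the actual logic in the parsing notebook may differ.

```python
def chunk_article(article, chunk_words=200):
    """Split one article record into chunk records (hypothetical splitting rule)."""
    words = article["text"].split()
    chunks = []
    for start in range(0, len(words), chunk_words):
        chunk_number = start // chunk_words + 1
        chunks.append({
            "article_id": article["article_id"],
            "title": article["title"],
            "category": article["category"],
            "tags": article["tags"],
            # chunk_id follows the 'article_id'-# pattern described above
            "chunk_id": f"{article['article_id']}-{chunk_number}",
            "chunk_text": " ".join(words[start:start + chunk_words]),
        })
    return chunks


# Made-up article used only to exercise the function.
example = {
    "article_id": "a1b2c3",
    "title": "Example title",
    "category": "home-studio",
    "tags": ["mixing", "loudness"],
    "text": "word " * 450,
}
records = chunk_article(example, chunk_words=200)
print([record["chunk_id"] for record in records])
```

Each chunk repeats the article-level fields (title, category, tags), so a chunk can be matched on them without joining back to the article.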

Technologies used in this project

  • Python 3.12
  • Docker and Docker Compose for containerization
  • Minsearch for full-text search
  • Flask as the API interface (see Background for more information on Flask)
  • Grafana for monitoring and PostgreSQL as the backend for it
  • OpenAI as an LLM (gpt-4o-mini)

Preparation (API Key and installing dependencies)

Since this system uses OpenAI, you will need to provide an API key. If you don't already have an account, go to the OpenAI platform, create one and add a small amount of credit. The usage cost of the model used (gpt-4o-mini) is very low.
For OpenAI, it's recommended to create a new project and use a separate key.

Then follow these instructions:

  1. Run a Codespace from this repository (or fork it, or git clone it, as you prefer)
  2. Install direnv by running sudo apt update, then sudo apt install direnv, and finally direnv hook bash >> ~/.bashrc
  3. Insert your API key in .envrc_template and rename the file to .envrc
  4. Run direnv allow to load the key into your environment.
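
Once direnv has loaded the file, a quick check from Python can confirm that the key is visible to the process. This is only a small sketch: `require_env` and `mask` are hypothetical helper names and are not part of the project.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check .envrc and run `direnv allow`")
    return value


def mask(secret, visible=4):
    """Hide all but the last few characters so the key is never printed in full."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]
```

Calling `print(mask(require_env("OPENAI_API_KEY")))` from the project directory should then print a masked key rather than raise an error.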

For dependency management, I use pipenv, so you need to install it:

pip install pipenv

Once installed, you can install the app dependencies:

pipenv install --dev

Running the application

1- Database configuration

Before the application starts for the first time, the database needs to be initialized.

First, run postgres:

docker-compose up postgres

Then run the db_prep.py script. To do so, run these commands in order:

pipenv shell

cd arsonor_assistant

export POSTGRES_HOST=localhost
python db_prep.py

2- Running with Docker-Compose

The easiest way to run the application is with docker-compose:

docker-compose up

Optional: Running locally

If you want to run the application locally, start only postgres and grafana:

docker-compose up postgres grafana

If you previously started all applications with docker-compose up, you need to stop the app:

docker-compose stop app

Now run the app on your host machine:

pipenv shell

cd arsonor_assistant

export POSTGRES_HOST=localhost
python app.py

Optional: Running with Docker (without compose)

Sometimes you might want to run the application in Docker without Docker Compose, e.g., for debugging purposes.

First, prepare the environment by running Docker Compose as in the previous section.

Next, build the image:

docker build -t arsonor-assistant .

And run it:

docker run -it --rm \
    --network="arsonor-assistant_default" \
    --env-file=".env" \
    -e OPENAI_API_KEY=${OPENAI_API_KEY} \
    -e DATA_PATH="data/arsonor_chunks_id.json" \
    -p 5000:5000 \
    arsonor-assistant

Using the application

When the application is running, we can start using it.

CLI

The project includes an interactive CLI application built with questionary.

To start it, run:

pipenv run python cli.py

You can also make it randomly select a question from the ground truth dataset:

pipenv run python cli.py --random

Using requests

When the application is running, you can use requests to send questions. Use test.py to try it:

pipenv run python test.py

It will pick a random question from the ground truth dataset and send it to the app.

CURL

You can also use curl for interacting with the API:

URL=http://localhost:5000
QUESTION="Comment puis-je augmenter le loudness de ma musique sans saturer le son?"
DATA='{
    "question": "'${QUESTION}'"
}'

curl -X POST \
    -H "Content-Type: application/json" \
    -d "${DATA}" \
    ${URL}/question

You will see something like the following in the response:

{
    "answer": "Pour augmenter le loudness de votre musique sans saturer le son, il est important de comprendre et d'appliquer plusieurs outils et techniques audio : **Compression Dynamique** : L'utilisation d'un compresseur permet de réduire la plage dynamique de votre signal audio. En comprimant les pics de volume, etc, etc....",
    "conversation_id": "b2aef1e8-5140-4185-bc53-7db28ba9815a",
    "question": "Comment puis-je augmenter le loudness de ma musique sans saturer le son?"
}

Sending feedback:

ID="b2aef1e8-5140-4185-bc53-7db28ba9815a"
URL=http://localhost:5000
FEEDBACK_DATA='{
    "conversation_id": "'${ID}'",
    "feedback": 1
}'

curl -X POST \
    -H "Content-Type: application/json" \
    -d "${FEEDBACK_DATA}" \
    ${URL}/feedback

After sending it, you'll receive the acknowledgement:

{
    "message": "Feedback received for conversation b2aef1e8-5140-4185-bc53-7db28ba9815a: 1"
}
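
The same two calls can also be made from Python. The sketch below uses only the standard library (urllib) rather than requests, so it runs without extra dependencies; the helper names `build_post` and `send` are illustrative and are not taken from test.py.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # same address as in the curl examples


def build_post(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (the app must be running)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


question_request = build_post("/question", {"question": "Qu'est-ce qu'un VCO ?"})

# With the app running, the full round trip would look like this:
# answer = send(question_request)
# feedback_request = build_post(
#     "/feedback", {"conversation_id": answer["conversation_id"], "feedback": 1}
# )
# print(send(feedback_request))
```

Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.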

Code

The code for the application is in the arsonor_assistant folder:

  • app.py - the Flask API, the main entrypoint to the application
  • rag.py - the main RAG logic for retrieving the data and building the prompt
  • ingest.py - loading the data into the knowledge base
  • minsearch.py - an in-memory search engine
  • db.py - the logic for logging the requests and responses to postgres
  • db_prep.py - the script for initializing the database

There is also some code in the project root directory:

  • test.py - selects a random question and sends it to the app for testing
  • cli.py - interactive CLI for the App

Interface

I use Flask for serving the application as an API.

Refer to the "Using the Application" section for examples on how to interact with the application.

Ingestion

The ingestion script is in ingest.py.

Since I use an in-memory database, minsearch, as the knowledge base, I run the ingestion script at the startup of the application.

It's executed inside rag.py when we import it.
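
A minimal sketch of what loading the records at import time could look like is shown below. The function names are hypothetical and the field check is an addition for illustration; the real ingest.py may be organized differently.

```python
import json
import os

# DATA_PATH mirrors the variable passed to `docker run` earlier in this README.
DATA_PATH = os.getenv("DATA_PATH", "data/arsonor_chunks_id.json")

REQUIRED_FIELDS = ("article_id", "title", "category", "tags", "chunk_id", "chunk_text")


def validate_documents(documents):
    """Check that every record carries the fields described in the Dataset section."""
    for position, doc in enumerate(documents):
        missing = [field for field in REQUIRED_FIELDS if field not in doc]
        if missing:
            raise ValueError(f"record {position} is missing fields: {missing}")
    return documents


def load_documents(path=DATA_PATH):
    """Read the chunk records from disk (hypothetical helper, not the repo's API)."""
    with open(path, encoding="utf-8") as f:
        return validate_documents(json.load(f))


# In rag.py, a module-level call such as `documents = load_documents()` would run
# once, at import time, and the resulting list would be handed to the search index.
```

Because the index lives in memory, this work is repeated on every start of the app; with 589 records, that cost is likely small.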

Notebooks and system evaluations

For experiments, I use Jupyter notebooks. They are in the notebooks folder.

To start Jupyter, run:

cd notebooks
pipenv run jupyter notebook

I have the following notebooks:

Retrieval evaluation

The basic approach, using minsearch without any boosting and returning 10 results per query, gave the following metrics:

  • at the chunk-level: Hit rate: 53%, MRR: 31%
  • at the article-level: Hit rate: 59%, MRR: 49%
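
For readers unfamiliar with these two metrics, here is a small sketch of how they are commonly computed. Hit rate is the share of queries for which the expected document appears anywhere in the returned results, and MRR (mean reciprocal rank) averages 1/rank of the first correct result. The toy `relevance` matrix is made up for illustration and is not the project's evaluation data.

```python
def hit_rate(relevance):
    """Share of queries whose expected document appears anywhere in the results."""
    return sum(any(row) for row in relevance) / len(relevance)


def mrr(relevance):
    """Mean reciprocal rank of the first correct result (0 when it is missing)."""
    total = 0.0
    for row in relevance:
        for rank, is_match in enumerate(row, start=1):
            if is_match:
                total += 1 / rank
                break
    return total / len(relevance)


# One row per query; True marks the position where the expected chunk was returned.
relevance = [
    [False, True, False, False],   # found at rank 2 -> reciprocal rank 0.5
    [True, False, False, False],   # found at rank 1 -> reciprocal rank 1.0
    [False, False, False, False],  # not found       -> reciprocal rank 0.0
    [False, False, False, True],   # found at rank 4 -> reciprocal rank 0.25
]

print(hit_rate(relevance))  # 0.75
print(mrr(relevance))       # 0.4375
```

For the article-level figures, the same functions apply; only the matching rule changes, with a result counting as correct when its article_id matches rather than its chunk_id.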

The search system is more likely to retrieve the correct article than the exact chunk.
Either way, the system needs improvement.
The lower chunk-level hit rate and MRR show that more work is needed to refine the search results at the chunk level.
Possible improvements include:

  • Improving the text matching at the chunk level by tweaking the search algorithm.
  • Boosting relevance for certain fields (like the chunk_text or specific tags).
  • Investigating semantic search approaches to better understand and retrieve the appropriate chunks within articles.

With tuned boosting, the chunk-level results improve substantially:

  • Hit rate: 70%, MRR: 42%

The best boosting parameters:

boost = {
'title': 1.48,
'tags': 0.31,
'chunk_text': 2.91  
}
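
To give an intuition for what these weights do, the sketch below combines per-field scores into a weighted sum. The per-field score used here (fraction of query words found in the field) is a deliberately simple stand-in and is not how minsearch scores text; the example documents are made up, and tags are assumed to be pre-joined into a single string.

```python
BOOST = {"title": 1.48, "tags": 0.31, "chunk_text": 2.91}


def field_overlap(query, text):
    """Toy relevance signal: fraction of query words present in the field."""
    query_words = set(query.lower().split())
    field_words = set(text.lower().split())
    return len(query_words & field_words) / len(query_words) if query_words else 0.0


def boosted_score(query, doc, boost=BOOST):
    """Weighted sum of per-field scores; fields missing from `boost` count as 1.0."""
    return sum(
        boost.get(field, 1.0) * field_overlap(query, doc.get(field, ""))
        for field in ("title", "tags", "chunk_text")
    )


# Two made-up documents that match the query in different fields.
doc_title_match = {"title": "compression audio", "tags": "", "chunk_text": ""}
doc_text_match = {"title": "", "tags": "", "chunk_text": "compression audio"}

query = "compression audio"
print(boosted_score(query, doc_title_match))
print(boosted_score(query, doc_text_match))
```

With these weights, a match in chunk_text counts roughly twice as much as the same match in the title, while tags contribute little to the final ranking.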

Next Steps:

  • Increase chunk size
  • Consider using advanced ranking techniques, such as semantic search models (e.g., BERT-based models), to better capture the meaning of both the query and chunk
  • Analyze Ranking Strategy: retrieve at the article-level first based on 'title' and 'tags' attributes, then at the chunk-level
  • Try other techniques: hybrid search (combining keyword and semantic search), re-ranking, ...

RAG flow evaluation

I used the LLM-as-a-Judge method 2 metric to evaluate the quality of the answers.

For gpt-4o-mini, in a sample with 200 records, I had:

  • RELEVANT 98.5%
  • PARTLY_RELEVANT 1.5%
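
The percentages above are simple label frequencies over the evaluated sample. A small sketch of the tally follows, using the counts implied by the reported figures (98.5% and 1.5% of 200 records, that is 197 and 3); the function name is hypothetical.

```python
from collections import Counter


def relevance_distribution(labels):
    """Return the percentage of each judge label in the evaluated sample."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * count / total for label, count in counts.items()}


# Counts implied by the figures above: 98.5% and 1.5% of 200 records.
labels = ["RELEVANT"] * 197 + ["PARTLY_RELEVANT"] * 3
distribution = relevance_distribution(labels)
print(distribution)  # {'RELEVANT': 98.5, 'PARTLY_RELEVANT': 1.5}
```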

Monitoring

I use Grafana for monitoring the application.

Setting up Grafana

All Grafana configurations are in the grafana folder:

  • init.py - for initializing the datasource and the dashboard.
  • dashboard.json - the actual dashboard (taken from LLM Zoomcamp without changes).

To initialize the dashboard, first ensure Grafana is running (it starts automatically when you do docker-compose up).

Then run:

pipenv shell

cd grafana

env | grep POSTGRES_HOST

Make sure the POSTGRES_HOST variable is not overwritten, and finally run the script:

python init.py

Then go to localhost:3000 (add port 3000 manually if necessary):

  • Login: "admin"
  • Password: "admin"

When prompted, keep "admin" as the new password.

Dashboards

The monitoring dashboard contains several panels:

  1. Last 5 Conversations (Table): Displays a table showing the five most recent conversations, including details such as the question, answer, relevance, and timestamp. This panel helps monitor recent interactions with users.
  2. +1/-1 (Pie Chart): A pie chart that visualizes the feedback from users, showing the count of positive (thumbs up) and negative (thumbs down) feedback received. This panel helps track user satisfaction.
  3. Relevancy (Gauge): A gauge chart representing the relevance of the responses provided during conversations. The chart categorizes relevance and indicates thresholds using different colors to highlight varying levels of response quality.
  4. OpenAI Cost (Time Series): A time series line chart depicting the cost associated with OpenAI usage over time. This panel helps monitor and analyze the expenditure linked to the AI model's usage.
  5. Tokens (Time Series): Another time series chart that tracks the number of tokens used in conversations over time. This helps to understand the usage patterns and the volume of data processed.
  6. Model Used (Bar Chart): A bar chart displaying the count of conversations based on the different models used. This panel provides insights into which AI models are most frequently used.
  7. Response Time (Time Series): A time series chart showing the response time of conversations over time. This panel is useful for identifying performance issues and ensuring the system's responsiveness.
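
As an illustration of how a per-request cost figure like the one in panel 4 can be derived from token counts, here is a hedged sketch. The per-million-token prices are assumptions (gpt-4o-mini list prices as I understand them at the time of writing) and should be checked against current OpenAI pricing; the function name is hypothetical and the app's own calculation may differ.

```python
def openai_cost(prompt_tokens, completion_tokens,
                input_price_per_million=0.15, output_price_per_million=0.60):
    """Estimate request cost in USD from token counts (prices are assumptions)."""
    return (prompt_tokens * input_price_per_million
            + completion_tokens * output_price_per_million) / 1_000_000


# Made-up token counts for a single request.
cost = openai_cost(prompt_tokens=1200, completion_tokens=300)
print(f"{cost:.6f} USD")
```

Keeping the prices as parameters makes it straightforward to update them or to compare models without touching the formula.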

Background

Here I provide background on some tech not used in the course and links for further reading.

Flask

I use Flask for creating the API interface for the application. It's a web application framework for Python: we can easily create an endpoint for asking questions and use web clients (like curl or requests) for communicating with it.

In our case, we can send questions to http://localhost:5000/question.

For more information, visit the official Flask documentation.
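
To make the endpoint's behavior concrete, here is a framework-independent sketch of the request handling that a Flask view for /question might wrap. It assumes the response shape shown in the curl example (answer, conversation_id, question); the function name, the error message and the 400 status for a missing question are assumptions, not the contents of app.py.

```python
import uuid


def handle_question(payload, answer_fn):
    """Validate a /question payload and build the JSON body and status code."""
    question = (payload or {}).get("question")
    if not question:
        return {"error": "No question provided"}, 400
    return {
        "conversation_id": str(uuid.uuid4()),  # lets feedback refer to this answer
        "question": question,
        "answer": answer_fn(question),
    }, 200


# Stand-in for the RAG call so the sketch runs without OpenAI access.
body, status = handle_question(
    {"question": "Qu'est-ce qu'un VCO ?"},
    lambda question: "...",
)
print(status, body["conversation_id"])
```

In Flask, such a function would typically sit inside a view registered with `@app.route("/question", methods=["POST"])`, reading the payload with `request.get_json()` and returning `jsonify(body), status`.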

Acknowledgements

My grateful thanks go to Alexey Grigorev for creating and supervising LLM Zoomcamp, without which this project would not have been possible. I would also like to thank him for all his valuable teaching and support.

I also thank the entire Slack community for helping to answer all our questions. Last but not least, thanks to my peers for reviewing this project and helping me improve it.
