
AI-Powered Knowledge Assistant for Audio Production

Arsonor is a French-language blog dedicated to home-studio audio production, aimed at creators of music, podcasts, videos, films and video games.

It covers essential topics such as sound engineering, sound design, post-production and home-studio optimization. You will find practical advice on improving sound quality, managing audio levels and using tools such as VST synths and mixing software. The goal is to make home sound creation accessible, by providing techniques and tips for a professional result.

This project introduces an LLM RAG (Retrieval-Augmented Generation) system designed to answer readers' questions on music production, sound design, post-production, and more.

It was implemented for LLM Zoomcamp - a free course about LLMs and RAG.

Table of contents

Project overview
Dataset
Technologies used in this project
Preparation (API Key and installing dependencies)
Running and using the application
Monitoring
Details on the code
Notebooks and system evaluations
Acknowledgements

Project overview

The Arsonor-LLM-RAG system integrates an AI-powered model capable of understanding user queries and retrieving relevant content from Arsonor’s knowledge base. This approach combines the flexibility of language models with the accuracy of focused content retrieval, ensuring that users get precise, helpful responses.

Examples of Use:

  1. User Question: "What is a VCO and how does it work in a synthesizer?"
  • LLM RAG Response: Detailed explanation of VCOs with relevant information on their modulation and sound generation.
  2. User Question: "How do I reduce background noise in my podcast recordings?"
  • LLM RAG Response: Techniques for noise reduction using EQ, gating, and software solutions, linking directly to Arsonor tutorials.
  3. User Question: "What equipment do I need for a home studio?"
  • LLM RAG Response: Tailored suggestions based on the user’s goals, from budget setups to professional configurations, with in-depth guides from the blog.

This system transforms how readers interact with the blog's content. It enhances the user experience by delivering immediate, accurate and context-aware responses to queries, drawing from the blog's extensive content library and making it easier for creators to find the information they need.

[IMPORTANT NOTE]: This blog is in French, as it was first built to help the French-speaking community.
But non-French speakers, don't worry! Everything you need to know to run the application is written in English. Just don't be surprised when I use test queries in French ;)
For testing the app, you can write in English and the system will answer in English, but translating your question into French gives a better experience.

Dataset

The dataset used in this project was generated by scraping information about all the articles from the blog Arsonor. All articles can be found and retrieved from the sitemap page.

You can read the code for this scraping stage in the following notebook: 1-arsonor_parse.ipynb

The resulting dataset was first a JSON file, data/arsonor_data_id.json, containing 58 records, one per article currently present on the site. But each article is quite a long text, so each record needs to pass through a chunking step in order to be indexed properly.

The final dataset is data/arsonor_chunks_300_50.json: each chunk contains 300 words, with a 50-word overlap between consecutive chunks. Refer to the "Notebooks and system evaluations" section to learn more about the process.
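The chunking step can be sketched as follows. This is a minimal illustration of the 300-word / 50-word-overlap scheme, not the actual notebook code; `chunk_words` is a hypothetical helper name.

```python
def chunk_words(words, size=300, overlap=50):
    """Split a list of words into overlapping chunks.

    Each chunk holds up to `size` words and shares `overlap` words
    with the previous chunk (so the step between chunks is size - overlap).
    Illustrative sketch, not the exact code from 1-arsonor_parse.ipynb.
    """
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last chunk reached the end of the article
    return chunks
```

For a 700-word article this yields three chunks, where the last 50 words of one chunk reappear at the start of the next.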

This file contains 572 JSON records in this form:

```json
{
  "article_id": "unique article identifier",
  "title": "main title of the article",
  "url": "url of the article",
  "category": "the category it belongs to ('home-studio', 'sound design' or 'post-production')",
  "tags": "a list of keywords based on the article content",
  "chunk_id": "unique chunk identifier in the form 'article_id'-#",
  "chunk_text": "the text content of the chunk"
}
```

This file serves as the foundation for the knowledge base in the assistant app to answer Audio Production queries.

Technologies used in this project

  • Python 3.12
  • Docker and Docker Compose for containerization
  • Elasticsearch for full-text search
  • Streamlit as the interface
  • Grafana for monitoring and PostgreSQL as the backend for it
  • OpenAI as an LLM (gpt-4o-mini)

Preparation (API Key and installing dependencies)

Since this system uses OpenAI, you will need to provide an API key. If you don't already have an account, go to the OpenAI platform, create one and add a small amount of credit. The usage cost for the model used (gpt-4o-mini) is negligible.
For OpenAI, it's recommended to create a new project and use a separate key.

Then follow these instructions:

  1. Run a Codespace from this repository (or fork it, or git clone it, whatever you prefer)
  2. Install direnv by running:

```bash
sudo apt update
sudo apt install direnv
direnv hook bash >> ~/.bashrc
```

  3. Insert your API key into .envrc_template and rename the file to .envrc
  4. Run

```bash
direnv allow
```

to load the key into your environment.

For dependency management, I use pipenv, so you need to install it:

```bash
pip install pipenv
```

Once installed, you can install the app dependencies:

```bash
pipenv install --dev
```

Running and using the application

The easiest way to run the application is with docker-compose:

```bash
docker-compose up
```

Then, before opening Streamlit, it's important to initialize the database and ingest the data. For this, run these lines in order:

```bash
pipenv shell
cd arsonor_assistant
export POSTGRES_HOST=localhost
python ingest.py
```

Finally, you can start using the application with Streamlit, running on port 8501: localhost:8501

Monitoring

I use Grafana for monitoring the application.
Go to localhost:3000 (add the port 3000 manually if necessary):

  • Login: "admin"
  • Password: "admin"

When prompted, keep "admin" as the new password (or just skip this step).

Dashboard

Click on Dashboards in the sidebar and Arsonor assistant.
If necessary, initialize the dashboard:

```bash
pipenv shell
cd grafana
python init.py
```

Once in Grafana, make sure "Host URL" is set to postgres (Sidebar: Connections --> Data Sources --> PostgreSQL)

The monitoring dashboard contains several panels:

  1. Last 5 Conversations (Table): Displays a table showing the five most recent conversations, including details such as the question, answer, relevance, and timestamp. This panel helps monitor recent interactions with users.
  2. +1/-1 (Pie Chart): A pie chart that visualizes the feedback from users, showing the count of positive (thumbs up) and negative (thumbs down) feedback received. This panel helps track user satisfaction.
  3. Relevancy (Gauge): A gauge chart representing the relevance of the responses provided during conversations. The chart categorizes relevance and indicates thresholds using different colors to highlight varying levels of response quality.
  4. OpenAI Cost (Time Series): A time series line chart depicting the cost associated with OpenAI usage over time. This panel helps monitor and analyze the expenditure linked to the AI model's usage.
  5. Tokens (Time Series): Another time series chart that tracks the number of tokens used in conversations over time. This helps to understand the usage patterns and the volume of data processed.
  6. Model Used (Bar Chart): A bar chart displaying the count of conversations based on the different models used. This panel provides insights into which AI models are most frequently used.
  7. Response Time (Time Series): A time series chart showing the response time of conversations over time. This panel is useful for identifying performance issues and ensuring the system's responsiveness.

Details on the code

The code for the application is in the arsonor_assistant folder:

  • app.py - the Streamlit code, the main entrypoint to the application
  • rag.py - the main RAG logic for retrieving the data and building the prompt
  • ingest.py - the ingestion script for loading the data into the knowledge base and initializing the database
  • db.py - the logic for logging the requests and responses to postgres
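The retrieve-and-prompt flow in rag.py can be sketched roughly like this. The function name and prompt template are illustrative, not the actual code from the repository:

```python
def build_prompt(question, search_results):
    """Assemble the LLM prompt from retrieved chunks.

    Illustrative sketch of a typical RAG prompt-building step;
    the real template in rag.py may differ.
    """
    context = "\n\n".join(
        f"Title: {doc['title']}\nContent: {doc['chunk_text']}"
        for doc in search_results
    )
    return (
        "You are an assistant for the Arsonor audio-production blog.\n"
        "Answer the QUESTION using only the CONTEXT below.\n\n"
        f"QUESTION: {question}\n\nCONTEXT:\n{context}"
    )
```

The resulting string is then sent to gpt-4o-mini, and the answer plus metadata (tokens, cost, response time) is logged to Postgres by db.py.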

All Grafana configurations are in the grafana folder:

  • init.py - for initializing the datasource and the dashboard.
  • dashboard.json - the actual dashboard (taken from LLM Zoomcamp without changes).

Notebooks and system evaluations

For experiments, I use Jupyter notebooks. They are in the notebooks folder.

To start Jupyter, run:

cd notebooks
pipenv run jupyter notebook

I have the following notebooks:

  • 1-arsonor_parse.ipynb: The code for retrieving, chunking and cleaning article text from the website and exporting a JSON file
  • 2-text-search_rag_evaluation.ipynb: The RAG flow and evaluation of the text-search system
  • 3-embeddings_rag_evaluation.ipynb: The RAG flow and retrieval evaluation for vector and hybrid-search system with different embedding models tested
  • ground_truth_data_generation.ipynb: Generating the ground-truth dataset for evaluations
  • minsearch.py: an in-memory search engine (developed during LLM Zoomcamp) that was used before switching to Elasticsearch
  • rag_experiments.ipynb: Experiments with a new search process using a two-level retrieval mechanism (article-level followed by chunk-level). The "build prompt" function is also boosted to retrieve more diverse text chunks that are still relevant to the request. I also tested a new retrieval evaluation code (thanks Claude!) to be able to evaluate this new system. It is not used in the final pipeline, but it's quite promising for further development.
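The two-level retrieval idea can be illustrated with toy data. This sketch uses naive term-overlap scoring as a stand-in for the real search engine; the function and field names follow the chunk schema but are otherwise hypothetical:

```python
def two_level_search(query_terms, chunks, top_articles=3):
    """Two-level retrieval sketch: score articles first, then rank
    chunks within the top-scoring articles. Term-overlap scoring is
    a placeholder for the real search engine."""
    def score(text):
        words = text.lower().split()
        return sum(words.count(t.lower()) for t in query_terms)

    # Level 1: aggregate chunk scores per article (title + tags + text).
    article_scores = {}
    for c in chunks:
        s = score(c["title"]) + score(" ".join(c["tags"])) + score(c["chunk_text"])
        article_scores[c["article_id"]] = article_scores.get(c["article_id"], 0) + s
    best = sorted(article_scores, key=article_scores.get, reverse=True)[:top_articles]

    # Level 2: rank chunks inside the selected articles only.
    candidates = [c for c in chunks if c["article_id"] in best]
    return sorted(candidates, key=lambda c: score(c["chunk_text"]), reverse=True)
```

Restricting level 2 to the best articles keeps the chunk ranking focused, which is the main appeal of the approach.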

Retrieval evaluation

The basic approach, using minsearch without any boosting and retrieving 10 results, gave the following metrics:

  • at the chunk-level: Hit rate: 53%, MRR: 31%
  • at the article-level: Hit rate: 59%, MRR: 49%

The search system is more likely to retrieve the correct article than the exact chunk.
Anyway, the system needs improvement.
The lower chunk-based MRR and hit rate imply that additional effort is needed to refine the search results at the chunk level.
Possible ways to improve could include:

  • Improving the text matching at the chunk level by tweaking the search algorithm.
  • Boosting relevance for certain fields (like the chunk_text or specific tags).
  • Investigating semantic search approaches to better understand and retrieve the appropriate chunks within articles.

With tuned boosting, the results at chunk-level improve quite drastically:

  • Hit rate: 66%, MRR: 40%

Next Steps:

  • Increase chunk size
  • Consider using advanced ranking techniques, such as semantic search models (e.g., BERT-based models), to better capture the meaning of both the query and chunk
  • Analyze Ranking Strategy: retrieve at the article-level first based on 'title' and 'tags' attributes, then at the chunk-level
  • Try other techniques: hybrid search (combining keyword and semantic search), re-ranking, ...

I first tried increasing the chunk size with a new dynamic method that doesn't cut sentences in the middle (see 1-arsonor_parse.ipynb).
This resulted in two new data json files:

  • "arsonor_chunks_300_50" (300 words chunks with 50 words overlap)
  • "arsonor_chunks_350_30" (350 words chunks with 30 words overlap)

The Hit Rate and MRR results improved considerably with boosted Minsearch text-search. But the highest results were obtained when I replaced Minsearch with Elasticsearch:

  • for 300_50 chunks: Hit rate 87%, MRR 61%
  • for 350_30 chunks: Hit rate 86%, MRR 59%
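A boosted Elasticsearch text query over the chunk fields might look like the sketch below. The field names come from the dataset schema, but the boost weights and index name are illustrative placeholders, not the tuned values from the notebooks:

```python
def boosted_query(question, num_results=10):
    """Build an Elasticsearch full-text query body with field boosts.

    'chunk_text^3' and 'title^2' use illustrative weights; the tuned
    values used in the evaluation notebooks may differ.
    """
    return {
        "size": num_results,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": question,
                        "fields": ["chunk_text^3", "title^2", "tags"],
                        "type": "best_fields",
                    }
                }
            }
        },
    }
```

The body would be passed to the Elasticsearch client's search call against the chunk index.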

I then tested vector search with different Sentence Transformers models (multilingual paraphrase) and two Hugging Face models specialized in French (CamemBERT and mBERT). But to my great disappointment, the results were not up to par.

I have no explanation for why the metrics don't improve at all with vector search. You can see for yourself in 3-embeddings_rag_evaluation.ipynb.

I also don't know why they improved between boosted Minsearch and a basic Elasticsearch text search without boosting.

So, the system actually used in the pipeline is this one:

  • dataset arsonor_chunks_300_50.json
  • text-search elasticsearch

RAG flow evaluation

I used the second variant of the LLM-as-a-Judge metric to evaluate the quality of the answers.
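An LLM-as-a-Judge evaluation sends each question/answer pair to a judge model, which classifies the answer's relevance. The prompt template below is an illustrative sketch, not the exact one used in the notebooks:

```python
def judge_prompt(question, answer):
    """Illustrative LLM-as-a-Judge prompt: the judge model classifies a
    generated answer as RELEVANT, PARTLY_RELEVANT or NON_RELEVANT."""
    return (
        "You evaluate a RAG system. Classify the relevance of the "
        "generated answer to the question as one of: "
        "RELEVANT, PARTLY_RELEVANT, NON_RELEVANT.\n\n"
        f"Question: {question}\nGenerated answer: {answer}\n\n"
        'Reply in JSON: {"Relevance": "...", "Explanation": "..."}'
    )
```

The prompt is sent to gpt-4o-mini for each record in the sample, and the JSON replies are aggregated into the percentages reported below.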

For gpt-4o-mini, in a sample with 200 records, I had the following results:

With arsonor_chunks_240_20:

  • RELEVANT 98%
  • PARTLY RELEVANT 1.5%
  • NON RELEVANT 0.5% (a single answer, which was actually fully relevant)

With arsonor_chunks_300_50:

  • RELEVANT 98.5%
  • PARTLY RELEVANT 1.5%

Acknowledgements

A grateful thanks to Alexey Grigorev for creating and supervising LLM Zoomcamp, without which this project would not have been possible. I would also like to thank him for all his valuable teaching and support.

And I haven't forgotten the help of the entire Slack community in answering all our questions. Last but not least, thanks to my peers for reviewing this project and helping me improve it.
