This project is a proof-of-concept demonstrating how you can use LLMs to perform competitive intelligence on customer reviews and feedback.
In this scenario, we take G2 reviews and perform topic modelling on them in a simple Streamlit app.
The overall (processing) pipeline is as follows:
- Get the G2 company reviews for your target companies (manual step, instructions below)
- Basic data reshaping from the resulting JSON (`preprocess.py`)
- Split reviews into sentences
- Embed sentences
- Reduce dimensionality (slightly) and cluster sentences
- Find N points close to the center of each cluster and stuff them in the LLM to extract the topics
- Reduce dimensions to 2D in order to visualize
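Two of the steps above, sentence splitting and picking the N points nearest each cluster center, can be sketched in plain Python. This is a minimal illustration and not the repo's implementation: the function names `split_sentences` and `representatives` are hypothetical, the regex splitter is deliberately naive, and the embeddings and cluster labels are assumed to come from whatever embedding model and clustering library you use.

```python
import math
import re
from collections import defaultdict


def split_sentences(review: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def representatives(embeddings, labels, n=3):
    """For each cluster label, return the indices of the n points nearest its centroid."""
    # Group point indices by their cluster label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    result = {}
    for label, members in groups.items():
        dim = len(embeddings[members[0]])
        # The centroid is the per-dimension mean of the cluster's points.
        centroid = [
            sum(embeddings[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        # Rank members by Euclidean distance to the centroid and keep the closest n.
        ranked = sorted(members, key=lambda i: math.dist(embeddings[i], centroid))
        result[label] = ranked[:n]
    return result
```

The sentences at the returned indices are the ones you would pass to the LLM as examples of each cluster when asking it to name the topic.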
Environment variables:
- Setting `OPENAI_API_KEY` is mandatory.
- If you want to fetch a new set of companies, you need to set `APIFY_API_TOKEN`; otherwise, it will use the sample G2 reviews in the repo.
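The rule described above (one required key, one optional token that switches the data source) could be checked up front with something like the following. This is only a sketch under assumptions: `resolve_data_source` and the `"apify"` / `"sample"` return values are hypothetical names chosen for illustration, and the repo may handle this differently.

```python
import os


def resolve_data_source(env=None):
    """Decide where reviews come from based on which variables are set."""
    env = os.environ if env is None else env
    # The OpenAI key is mandatory, so fail early if it is missing or empty.
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY must be set")
    # With an Apify token, new reviews can be fetched; without one,
    # fall back to the sample reviews shipped in the repo.
    return "apify" if env.get("APIFY_API_TOKEN") else "sample"
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise without touching your real environment.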
git clone https://github.com/balmasi/g2_reviews_llm_topic_modeling
The easiest way to set up the Python environment is to use Conda.
# Create the g2_reviews_topic_modeling_llm virtual environment
conda create -n g2_reviews_topic_modeling_llm python=3.10
# Activate the virtual environment
conda activate g2_reviews_topic_modeling_llm
pip install -r requirements.txt
- Browse to your target G2 profiles to grab the slug from the URL. For example, the slug for https://www.g2.com/products/vena/reviews would be `vena`.
- Place each target company's slug on its own line in the `data/slugs-to-fetch.txt` file.
- Set `APIFY_API_TOKEN` in the `.env` file to your Apify API token.
- Run the `create_dataset.py` script using `python data/create_dataset.py`.
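If you have several profile URLs, extracting the slugs by hand gets tedious. The helper below is a small sketch of one way to do it; `slug_from_url` is a hypothetical name and not part of the repo, and it assumes G2 product URLs follow the `/products/<slug>/...` pattern shown in the example above.

```python
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Extract the product slug from a G2 URL of the form /products/<slug>/..."""
    # Split the URL path into its non-empty segments.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2 or segments[0] != "products":
        raise ValueError(f"Not a G2 product URL: {url}")
    return segments[1]


print(slug_from_url("https://www.g2.com/products/vena/reviews"))  # vena
```

Each returned slug can then be written on its own line in `data/slugs-to-fetch.txt`.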
To run the app, execute the following command:
streamlit run streamlit_app.py
This will start the Streamlit server and launch the app in your default web browser.