This project is a Python application that uses a Cassandra schema to generate recommended use cases for Generative AI.
The output is a Markdown-formatted report saved in `report_output`.
It will describe the use case(s) for your schema and suggest any GenAI use cases you could add. It will then give you the exact table changes needed to implement the idea you want!
Warning: The best results come from GPT-4. I've tested other models, such as Claude 2 and Llama 2, and none produce results as good as GPT-4. I'll continue to test models for accuracy.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Python 3.6 or higher
- pip
Depending on your use case, you need one of the following to provide the schema that use cases are extracted from:
- A DataStax Astra instance
- A DataStax Enterprise instance
- An Apache Cassandra® instance
- An exported schema file
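For the exported-schema option, one way to produce the file is `cqlsh -e "DESCRIBE KEYSPACE killrvideo" > schema.cql`. How this project parses such a file is not described here, so the Python sketch below is only an illustration, assuming a DESCRIBE-style export: it splits the text into individual `CREATE TABLE` statements keyed by table name, a convenient shape for handing a schema to an LLM. The `extract_tables` helper and the `schema.cql` file name are hypothetical.

```python
import re
from typing import Dict

# Matches a CREATE TABLE statement up to its terminating semicolon.
# Assumption: semicolons appear only as statement terminators, which holds for
# typical DESCRIBE output but not for every file (e.g. ';' inside a comment).
CREATE_TABLE_RE = re.compile(
    r'CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w."]+)\s*\(.*?;',
    re.IGNORECASE | re.DOTALL,
)


def extract_tables(schema_cql: str) -> Dict[str, str]:
    """Map each table name to its full CREATE TABLE statement (hypothetical helper)."""
    tables = {}
    for match in CREATE_TABLE_RE.finditer(schema_cql):
        name = match.group(1).replace('"', "")  # drop quotes from quoted identifiers
        tables[name] = match.group(0).strip()
    return tables


sample = """
CREATE KEYSPACE killrvideo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE killrvideo.videos (
    videoid uuid PRIMARY KEY,
    name text,
    tags set<text>
) WITH comment = '';
"""
print(list(extract_tables(sample)))  # ['killrvideo.videos']
```

With a real export you would pass the file contents instead, for example `extract_tables(Path("schema.cql").read_text())`.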
The following LLM APIs are supported:
- OpenAI
- GCP Vertex AI
- AWS Bedrock
Pending:
- Azure GPT
- Ollama (local)
- Clone the repository:
  ```shell
  git clone https://github.com/pmcfadin/cassandra_ai_accelerator.git
  ```
- Navigate to the project directory:
  ```shell
  cd cassandra_ai_accelerator
  ```
- Install the required Python packages:
  ```shell
  pip install -r requirements.txt
  ```
Before running the application, set up your configuration in the `settings.toml` file. It ships with defaults, but you can change anything you need.
For sensitive data, rename the `example.settings.toml` file to `.settings.toml` and adjust its settings for your use case.
To run the application, execute the `app.py` script:
```shell
python app.py
```
Planned improvements:
- Allow local LLM usage, to address privacy concerns
- Fine-tune an LLM for this specific task, for better answers
- Make use case exploration interactive to surface more specific use cases
- Add more report types, such as SAI conversion and schema optimization
- Analyze data and suggest how it could be vectorized
- (Stretch) Create sample code for LangChain, LlamaIndex, or ...
Example report for the KillrVideo schema:
The provided schema supports a video-sharing and social-interaction platform named "KillrVideo." This platform allows users to upload videos, comment on videos, rate videos, and receive video recommendations. The schema accommodates user management, video metadata storage, tagging, playback statistics, and user interaction (comments, ratings) with videos.
- Video Content Management: Users can upload videos with descriptions, tags, and preview images.
- Social Engagement: Users can comment on videos, rate them, and receive recommendations.
- Analytics: The platform tracks video views and ratings for analytical purposes.
- Personalization: Users receive recommendations based on their interactions.
- Community Features: Videos and comments can be tagged for easier discovery.
The application this schema supports appears to be a comprehensive video-sharing platform with a strong emphasis on community engagement and content discoverability.
Video summarization: Generative AI can automatically generate summaries of videos from their descriptions, comments, and tags.
Data Model Changes:
- Alter the `videos` table to include a `summary` text column:
  ```sql
  ALTER TABLE killrvideo.videos ADD summary text;
  ```
- To store summaries and search them by similarity, create a new table with vector support for the summary text. The table has no clustering columns, so it takes no `CLUSTERING ORDER BY` clause:
  ```sql
  CREATE TABLE killrvideo.video_summaries_vs (
      videoid uuid,
      summary text,
      summary_vector VECTOR<FLOAT, 128>, -- Assuming a 128-dimensional embedding
      PRIMARY KEY (videoid)
  );
  ```
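To populate `summary` and `summary_vector`, an application would send each video's name, description, comments, and tags to an LLM and then embed the returned summary. The sketch below covers only the deterministic parts of that flow: assembling the prompt and checking that an embedding matches the table's assumed 128 dimensions before it is written. `build_summary_prompt`, `validate_embedding`, the comment cap, and the prompt wording are all hypothetical; the LLM and embedding calls are left out because they depend on the provider you configure.

```python
from typing import List, Sequence

EMBEDDING_DIM = 128  # must match VECTOR<FLOAT, 128> in video_summaries_vs
MAX_COMMENTS = 20    # hypothetical cap to keep the prompt short


def build_summary_prompt(name: str, description: str,
                         tags: Sequence[str], comments: Sequence[str]) -> str:
    """Assemble an LLM prompt from the video's columns (hypothetical wording)."""
    comment_lines = "\n".join(f"- {c}" for c in comments[:MAX_COMMENTS])
    return (
        "Summarize the following video in two or three sentences.\n"
        f"Title: {name}\n"
        f"Description: {description}\n"
        f"Tags: {', '.join(sorted(tags))}\n"
        f"Viewer comments:\n{comment_lines}\n"
    )


def validate_embedding(vector: Sequence[float]) -> List[float]:
    """Reject embeddings whose length does not match the vector column."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return [float(x) for x in vector]


print(build_summary_prompt("Intro to CQL", "A short tutorial.",
                           {"cql", "cassandra"}, ["Great!", "Helpful"]))
```

Validating the dimension in the application gives a clear error early, instead of a failed write against the vector column.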
Recommendation enhancements: Improve the recommendation system by adding vector-based similarity search over user preferences, video descriptions, and user interactions.
Data Model Changes:
- Add vector columns to relevant tables such as `video_recommendations`:
  ```sql
  ALTER TABLE killrvideo.video_recommendations ADD recommendation_vector VECTOR<FLOAT, 128>;
  ```
  Or, for a dedicated approach, create a new table for vector-based recommendations. Here `userid` is the partition key and `videoid` the clustering column, so the clustering order applies to `videoid`:
  ```sql
  CREATE TABLE killrvideo.video_recommendations_vs (
      userid uuid,
      videoid uuid,
      recommendation_vector VECTOR<FLOAT, 128>,
      PRIMARY KEY (userid, videoid)
  ) WITH CLUSTERING ORDER BY (videoid ASC);
  ```
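The recommendation vectors are meant to answer "which videos are closest to this user's preferences?". In Cassandra that lookup is served by an approximate nearest-neighbor (ANN) index; the brute-force sketch below computes the exact cosine-similarity ranking in plain Python so the idea is visible without a cluster. The function names, the string video IDs (the tables use `uuid`), and the toy 3-dimensional vectors (the tables assume 128) are illustrative assumptions, and an exact scan like this is only practical for small candidate sets.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_videos(user_vector: Sequence[float],
                 video_vectors: Dict[str, Sequence[float]],
                 k: int = 5) -> List[Tuple[str, float]]:
    """Rank candidate videos by cosine similarity to the user's preference vector."""
    scored = [(vid, cosine_similarity(user_vector, vec))
              for vid, vec in video_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy example with 3-dimensional vectors instead of 128.
user = [1.0, 0.0, 0.5]
videos = {
    "cooking-101": [0.9, 0.1, 0.4],
    "cql-basics": [0.0, 1.0, 0.0],
    "knife-skills": [0.7, 0.0, 0.7],
}
print(top_k_videos(user, videos, k=2))  # cooking-101 first, then knife-skills
```

An ANN index trades a small amount of accuracy for speed, so its results can occasionally differ from this exact ranking.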
Sentiment analysis on comments: By analyzing comment sentiment, the platform can better understand user engagement and filter or highlight comments based on how positive they are.
Data Model Changes:
- Add a `sentiment_score` column to the `comments_by_video` and `comments_by_user` tables:
  ```sql
  ALTER TABLE killrvideo.comments_by_video ADD sentiment_score float;
  ALTER TABLE killrvideo.comments_by_user ADD sentiment_score float;
  ```
- Create a vector table for sentiment analysis on comments:
  ```sql
  CREATE TABLE killrvideo.comments_sentiment_vs (
      commentid timeuuid,
      comment text,
      sentiment_vector VECTOR<FLOAT, 5>, -- Example dimension for sentiment
      PRIMARY KEY (commentid)
  );
  ```
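The report does not say what the five dimensions of `sentiment_vector` represent. One plausible reading, assumed here and not taken from the project, is a probability distribution over five sentiment classes from very negative to very positive. Under that assumption, the sketch below collapses the vector into the scalar `sentiment_score` column as an expected value in [-1, 1] and uses the score to pick comments to highlight. The class weights, the 0.5 threshold, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed class order: very negative, negative, neutral, positive, very positive.
CLASS_WEIGHTS = (-1.0, -0.5, 0.0, 0.5, 1.0)


def sentiment_score(probabilities: Sequence[float]) -> float:
    """Expected sentiment in [-1, 1] from a 5-class probability vector."""
    if len(probabilities) != len(CLASS_WEIGHTS):
        raise ValueError("expected a 5-dimensional sentiment vector")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize in case the model output does not sum to exactly 1.
    return sum(w * p for w, p in zip(CLASS_WEIGHTS, probabilities)) / total


def highlight_positive(comments: List[Tuple[str, float]],
                       threshold: float = 0.5) -> List[str]:
    """Return comment texts whose sentiment_score meets the (hypothetical) threshold."""
    return [text for text, score in comments if score >= threshold]


score = sentiment_score([0.0, 0.0, 0.2, 0.4, 0.4])
print(highlight_positive([("Loved it", score), ("Meh", 0.0)]))  # ['Loved it']
```

Storing the scalar score alongside the comment keeps simple filters cheap, while the vector remains available for similarity queries.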
Automatic tag generation: Use Natural Language Processing (NLP) to generate and suggest tags for new videos based on their names and descriptions.
Data Model Changes:
- Suggesting tags with AI doesn't require changes to the existing schema. It only requires integrating an AI model that processes video uploads and updates the `tags` set in the `videos` table accordingly.
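The report leaves the tagging model unspecified. As a stand-in for a real NLP model, the sketch below uses naive word-frequency keyword extraction over a video's name and description, then merges the suggestions into the existing `tags` set without case-insensitive duplicates. This is a deliberately simple substitute technique, not the AI approach the report describes; the stop-word list, the `max_new` limit, and the `suggest_tags` name are hypothetical.

```python
import re
from collections import Counter
from typing import Set

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "how", "is"}


def suggest_tags(name: str, description: str, existing: Set[str],
                 max_new: int = 3) -> Set[str]:
    """Return `existing` plus up to `max_new` frequent keywords not already tagged."""
    words = re.findall(r"[a-z0-9]+", f"{name} {description}".lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    known = {tag.lower() for tag in existing}  # case-insensitive duplicate check
    # most_common() sorts by count; ties keep first-seen order.
    new_tags = [w for w, _ in counts.most_common() if w not in known][:max_new]
    return set(existing) | set(new_tags)


print(sorted(suggest_tags(
    "Cassandra data modeling",
    "Learn Cassandra data modeling with Cassandra examples.",
    {"NoSQL"},
)))  # ['NoSQL', 'cassandra', 'data', 'modeling']
```

Because `tags` is a `set<text>`, the merged result can be written back as-is, and the database also ignores any duplicate elements.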
To support AI-driven features like summarization, recommendation, sentiment analysis, and tag generation, vector search capabilities have been introduced into the data model. These capabilities enable similarity-based operations, leveraging the semantic understanding of content.
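The vector columns only become searchable by similarity once they are indexed. In Apache Cassandra 5.0 and DataStax Astra, similarity queries use a Storage-Attached Index (SAI) on the vector column together with `ORDER BY <column> ANN OF <vector> LIMIT <n>`. The Python sketch below only builds those CQL strings for the tables proposed in this report so they can be reviewed or passed to a driver; the helper names and index naming scheme are hypothetical, and the query uses a `?` bind marker so the embedding is supplied as a bound value rather than inlined.

```python
from typing import Sequence


def sai_vector_index_cql(keyspace: str, table: str, column: str) -> str:
    """CREATE INDEX statement for a vector column using a Storage-Attached Index."""
    index_name = f"{table}_{column}_idx"  # hypothetical naming scheme
    return (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {index_name} "
        f"ON {keyspace}.{table} ({column}) USING 'StorageAttachedIndex';"
    )


def ann_query_cql(keyspace: str, table: str, columns: Sequence[str],
                  vector_column: str, limit: int = 5) -> str:
    """SELECT returning the `limit` nearest rows to a vector bound at the `?` marker."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        f"SELECT {', '.join(columns)} FROM {keyspace}.{table} "
        f"ORDER BY {vector_column} ANN OF ? LIMIT {int(limit)};"
    )


# Statements for the summary table proposed in the report.
print(sai_vector_index_cql("killrvideo", "video_summaries_vs", "summary_vector"))
print(ann_query_cql("killrvideo", "video_summaries_vs",
                    ["videoid", "summary"], "summary_vector", limit=3))
```

With the DataStax Python driver, such a query would typically be prepared once with `session.prepare(...)` and executed with the query embedding as a list of floats.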
Example Usage with Generative AI:
- Recommendation Enhancements:
  - After generating embeddings for video content and user preferences, use vector search to find the closest matches for personalized recommendations.
- Sentiment Analysis on Comments:
  - Use vector search to find comments with similar sentiments, enabling features like filtering comments by positivity or negativity.
Enhancing the KillrVideo schema with Generative AI and vector search capabilities can significantly improve user experience through personalized content, better engagement through sentiment analysis, and efficient management of content through summarization and tagging. The vector search functionality in Cassandra adds a powerful tool for leveraging semantic similarities within the data, paving the way for advanced AI-driven features in applications.