This project is part of the Kleos 2.0 hackathon and aims to develop an AI-driven system for real-time log monitoring and anomaly detection. With the exponential growth of digital assets and interconnected systems, the volume and complexity of log data have increased significantly. Traditional methods of manual log analysis and rule-based anomaly detection are no longer sufficient to identify sophisticated threats and abnormal behaviors effectively. This project leverages advanced artificial intelligence (AI) techniques to address these challenges, enhancing cybersecurity measures and ensuring robust data protection.
- Preprocessing: Cleaning and normalizing log data, extracting relevant features such as timestamps, source IP addresses, event types, and log messages.
- Feature Engineering: Transforming raw log data into structured features using techniques like tokenization, one-hot encoding, and dimensionality reduction.
- Isolation Forest: Detecting anomalies in high-dimensional data.
- Random Forest: Detecting anomalies as well, so its results can be compared with Isolation Forest.
- BERT and VADER: Understanding and analyzing log messages using natural language processing.
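As a rough illustration of the Isolation Forest step listed above, the sketch below fits scikit-learn's `IsolationForest` on a small synthetic feature matrix. The two feature columns (requests per minute, failed logins), the injected outlier row, and the `contamination` value are assumptions made for the example, not values taken from this project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-source features: [requests per minute, failed logins].
rng = np.random.default_rng(seed=0)
normal = rng.normal(loc=[50.0, 2.0], scale=[5.0, 1.0], size=(200, 2))
suspicious = np.array([[400.0, 60.0]])  # one deliberately extreme row
X = np.vstack([normal, suspicious])

# contamination is the assumed share of anomalies; 0.01 is a guess.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(X)

labels = model.predict(X)        # +1 = inlier, -1 = anomaly
scores = model.score_samples(X)  # lower = more anomalous

print("flagged rows:", np.where(labels == -1)[0])
```

Isolation Forest is unsupervised, so it needs no labels. A Random Forest comparison would additionally require labelled examples, since scikit-learn's `RandomForestClassifier` is a supervised model.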
Key Files:
- Data Generation: Generates synthetic log data.
- Trend Analysis: Analyzes and visualizes trends in the log data.
- Clustering: Groups similar log entries together.
- Correlation Analysis: Identifies relationships between different event types.
- Sentiment Analysis: Classifies log messages into positive, negative, or neutral sentiments and visualizes the results.
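To make the data generation, preprocessing, and trend analysis steps more concrete, here is a minimal standard-library sketch. The log line layout, the event type names, the start date, and the helper names `generate_logs` and `parse_line` are all hypothetical; the project's actual files may use a different format.

```python
import random
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical layout: "<ISO timestamp> <source IP> <EVENT_TYPE> <message>"
EVENT_TYPES = ["LOGIN_OK", "LOGIN_FAIL", "FILE_READ", "CONFIG_CHANGE"]
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) (?P<event>[A-Z_]+) (?P<msg>.*)$"
)

def generate_logs(n, start, seed=0):
    """Produce n synthetic log lines, one per minute beginning at start."""
    rng = random.Random(seed)
    lines = []
    for i in range(n):
        ts = (start + timedelta(minutes=i)).isoformat()
        ip = f"10.0.0.{rng.randint(1, 20)}"
        event = rng.choice(EVENT_TYPES)
        lines.append(f"{ts} {ip} {event} synthetic message {i}")
    return lines

def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "timestamp": datetime.fromisoformat(m["ts"]),
        "source_ip": m["ip"],
        "event_type": m["event"],
        "message": m["msg"],
    }

logs = generate_logs(180, datetime(2024, 1, 1, 0, 0))
records = [r for r in map(parse_line, logs) if r is not None]

# Trend: number of events per hour.
per_hour = Counter(r["timestamp"].strftime("%Y-%m-%d %H:00") for r in records)
# Event-type frequencies, a simple input for correlation or clustering steps.
per_event = Counter(r["event_type"] for r in records)
```

Lines that do not match the pattern are dropped rather than raising an error, which keeps the pipeline running on malformed input; whether that is the right choice depends on how noisy the real logs are.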
- Streaming Analytics: Processing incoming log streams with Apache Spark Streaming.
- Alert Generation: Triggering alerts based on the output of anomaly detection models.
- Dashboard Overview: Displaying key metrics such as total logs processed, number of anomalies detected, and system status.
- Log Data Visualization: Interactive charts and graphs to explore trends, patterns, and anomalies.
- Alerts Panel: Highlighting real-time alerts with details like anomaly type, severity, timestamp, and affected resources.
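One possible shape for the alert records described above is sketched here. It assumes the anomaly model's output has already been normalised to a score between 0 and 1; the severity thresholds, the field names, and the helpers `severity_for` and `make_alert` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Alert:
    anomaly_type: str
    severity: str
    timestamp: str
    affected_resource: str

def severity_for(score: float) -> Optional[str]:
    """Map a score in [0, 1] to a severity; thresholds are hypothetical."""
    if score >= 0.9:
        return "critical"
    if score >= 0.7:
        return "warning"
    return None  # below the alerting threshold

def make_alert(score, anomaly_type, resource, now=None):
    """Return an Alert when the score is high enough, otherwise None."""
    severity = severity_for(score)
    if severity is None:
        return None
    now = now or datetime.now(timezone.utc)
    return Alert(anomaly_type, severity, now.isoformat(), resource)

alert = make_alert(0.95, "login_burst", "10.0.0.7",
                   now=datetime(2024, 1, 1, tzinfo=timezone.utc))
quiet = make_alert(0.2, "login_burst", "10.0.0.7")
payload = asdict(alert)  # plain dict, ready to serialise for a dashboard
```

Returning `None` for low scores keeps the alerts panel limited to events that cross a threshold, and the plain dict from `asdict` can be serialised to JSON for a front end.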
- Python 3.x
- Jupyter Notebook
- ReactJS
- Apache Spark, Kafka
- Libraries: pandas, numpy, scikit-learn (sklearn), matplotlib, nltk, and the standard-library modules gzip, os, and datetime, among others.