This project aims to automate product classification for an e-commerce marketplace. It utilizes natural language processing (NLP) and computer vision techniques to classify products based on text descriptions and images. By automating the categorization process, the project improves user experience and facilitates efficient product searches.
- Text Classification:
- Extract features from product descriptions using TF-IDF, Word2Vec, and BERT.
- Group similar products and generate category labels.
- Image Classification:
- Utilize SIFT and VGG16 for feature extraction.
- Implement supervised classification to categorize products based on images.
- Integration:
- Combine textual and visual features for robust classification.
- Evaluate feasibility and performance using machine learning models.
- NLP Methods:
- Bag of Words, TF-IDF
- Pretrained models: Word2Vec, BERT, Universal Sentence Encoder
- Computer Vision Methods:
- SIFT: Scale-Invariant Feature Transform for manual feature extraction.
- VGG16: Deep learning-based CNN for image classification.
- Clustering & Classification:
- K-Means for unsupervised clustering.
- Supervised classification with CNNs.
- Evaluation Metrics:
- Adjusted Rand Index (ARI), Silhouette Score, F1 Accuracy.
- Text-Based Models:
- TF-IDF performs well with balanced speed and accuracy.
- BERT and USE provide high-quality embeddings but require more computational resources.
- Image-Based Models:
- VGG16 outperforms SIFT due to its ability to capture deep, complex features.
- Data augmentation enhances model generalization.
- Combined Approach:
- Future integration of text and image features is promising for improving accuracy.
- Jupyter Notebooks:
- Preprocessing and feature extraction for text and images.
- Supervised and unsupervised classification experiments.
- Presentation:
- Summarizes methods, results, and future perspectives.
- Code:
- Includes implementation of NLP pipelines, CNN architectures, and data augmentation strategies.
This project demonstrates:
- Expertise in NLP and computer vision integration.
- Proficiency with advanced machine learning frameworks for classification.
- Practical application of data science to solve real-world e-commerce challenges.