Skip to content

Developed an automated product classification system using NLP and computer vision techniques. The project utilizes advanced models like SIFT, VGG16, and BERT to classify e-commerce products based on text descriptions and images. Includes preprocessing, feature extraction, and supervised classification with actionable insights for marketplace.

Notifications You must be signed in to change notification settings

isocan/automated-product-classification

Repository files navigation

Automated Product Classification for E-Commerce

Project Overview

This project aims to automate product classification for an e-commerce marketplace. It utilizes natural language processing (NLP) and computer vision techniques to classify products based on text descriptions and images. By automating the categorization process, the project improves user experience and facilitates efficient product searches.

Objectives

  • Text Classification:
    • Extract features from product descriptions using TF-IDF, Word2Vec, and BERT.
    • Group similar products and generate category labels.
  • Image Classification:
    • Utilize SIFT and VGG16 for feature extraction.
    • Implement supervised classification to categorize products based on images.
  • Integration:
    • Combine textual and visual features for robust classification.
    • Evaluate feasibility and performance using machine learning models.

Tools & Techniques

  • NLP Methods:
    • Bag of Words, TF-IDF
    • Pretrained models: Word2Vec, BERT, Universal Sentence Encoder
  • Computer Vision Methods:
    • SIFT: Scale-Invariant Feature Transform for manual feature extraction.
    • VGG16: Deep learning-based CNN for image classification.
  • Clustering & Classification:
    • K-Means for unsupervised clustering.
    • Supervised classification with CNNs.
  • Evaluation Metrics:
    • Adjusted Rand Index (ARI), Silhouette Score, F1 Accuracy.

Key Insights

  • Text-Based Models:
    • TF-IDF performs well with balanced speed and accuracy.
    • BERT and USE provide high-quality embeddings but require more computational resources.
  • Image-Based Models:
    • VGG16 outperforms SIFT due to its ability to capture deep, complex features.
    • Data augmentation enhances model generalization.
  • Combined Approach:
    • Future integration of text and image features is promising for improving accuracy.

Deliverables

  • Jupyter Notebooks:
    • Preprocessing and feature extraction for text and images.
    • Supervised and unsupervised classification experiments.
  • Presentation:
    • Summarizes methods, results, and future perspectives.
  • Code:
    • Includes implementation of NLP pipelines, CNN architectures, and data augmentation strategies.

Why This Project?

This project demonstrates:

  • Expertise in NLP and computer vision integration.
  • Proficiency with advanced machine learning frameworks for classification.
  • Practical application of data science to solve real-world e-commerce challenges.

About

Developed an automated product classification system using NLP and computer vision techniques. The project utilizes advanced models like SIFT, VGG16, and BERT to classify e-commerce products based on text descriptions and images. Includes preprocessing, feature extraction, and supervised classification with actionable insights for marketplace.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published