Skip to content

Implemented a scalable pipeline for deploying an AI model for fruit classification in the cloud. Utilized AWS services (EMR, S3) for Big Data processing, PySpark for distributed computations, and TensorFlow for image classification, ensuring efficient infrastructure and robust security.

Notifications You must be signed in to change notification settings

isocan/fruit-classification-cloud-deployment

Repository files navigation

Fruit Classification AI Model in the Cloud

Project Overview

This project develops and deploys an AI model for automated fruit and vegetable classification. Leveraging AWS cloud infrastructure and distributed computing with PySpark, the system processes large-scale image datasets efficiently. The deployment pipeline focuses on scalability, security, and compliance, ensuring readiness for real-world applications.

Objectives

  • Model Development:
    • Train a convolutional neural network (CNN) using TensorFlow for fruit classification.
    • Process a dataset with over 22,000 images, representing 130+ fruit and vegetable types.
  • Cloud Deployment:
    • Use AWS EMR (Elastic MapReduce) and S3 for distributed computation and data storage.
    • Implement security measures in compliance with GDPR standards.
  • Scalability:
    • Address computational resource limitations with a cloud-based scalable architecture.

Tools & Techniques

  • Data Processing:
    • PySpark for distributed data processing.
    • Amazon S3 for secure and scalable storage.
  • Machine Learning:
    • TensorFlow for CNN training.
    • t-SNE for advanced visualization of image features.
  • Cloud Infrastructure:
    • AWS EMR for parallel computation.
    • SSH tunneling and IAM for secure data access.
  • Data Handling:
    • Preprocessing with Python libraries: Pillow, PyArrow, and Matplotlib.

Key Results

  • Efficient Processing:
    • Scaled image preprocessing and feature extraction using PySpark.
    • Deployed CNN model achieving high accuracy on fruit classification tasks.
  • Cloud Integration:
    • Seamless access and storage of data using AWS SDK (Boto3).
    • Enhanced infrastructure security with IAM and GDPR compliance.
  • Advanced Insights:
    • Visualized complex image datasets using t-SNE for pattern analysis.

Deliverables

  • Jupyter Notebooks:
    • Local preprocessing and initial model training.
    • AWS-integrated pipeline for scalable image classification.
  • Presentation:
    • Outlines methodology, results, and future recommendations.
  • Scripts:
    • Python scripts for cloud interaction and data handling.

Future Work

  • Enhance model generalization with advanced data augmentation techniques.
  • Extend the pipeline to include multi-cloud support (e.g., Azure, GCP).
  • Integrate real-time inference capabilities for robotic applications.

About

Implemented a scalable pipeline for deploying an AI model for fruit classification in the cloud. Utilized AWS services (EMR, S3) for Big Data processing, PySpark for distributed computations, and TensorFlow for image classification, ensuring efficient infrastructure and robust security.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published