This project develops and deploys an AI model for automated fruit and vegetable classification. Leveraging AWS cloud infrastructure and distributed computing with PySpark, the system processes large-scale image datasets efficiently. The deployment pipeline focuses on scalability, security, and compliance, ensuring readiness for real-world applications.
- Model Development:
- Train a convolutional neural network (CNN) using TensorFlow for fruit classification.
- Process a dataset with over 22,000 images, representing 130+ fruit and vegetable types.
- Cloud Deployment:
- Use AWS EMR (Elastic MapReduce) and S3 for distributed computation and data storage.
- Implement security measures in compliance with GDPR standards.
- Scalability:
- Address computational resource limitations with a cloud-based scalable architecture.
- Data Processing:
- PySpark for distributed data processing.
- Amazon S3 for secure and scalable storage.
- Machine Learning:
- TensorFlow for CNN training.
- t-SNE for advanced visualization of image features.
- Cloud Infrastructure:
- AWS EMR for parallel computation.
- SSH tunneling and IAM for secure data access.
- Data Handling:
- Preprocessing with Python libraries: Pillow, PyArrow, and Matplotlib.
- Efficient Processing:
- Scaled image preprocessing and feature extraction using PySpark.
- Deployed CNN model achieving high accuracy on fruit classification tasks.
- Cloud Integration:
- Seamless access and storage of data using AWS SDK (Boto3).
- Enhanced infrastructure security with IAM and GDPR compliance.
- Advanced Insights:
- Visualized complex image datasets using t-SNE for pattern analysis.
- Jupyter Notebooks:
- Local preprocessing and initial model training.
- AWS-integrated pipeline for scalable image classification.
- Presentation:
- Outlines methodology, results, and future recommendations.
- Scripts:
- Python scripts for cloud interaction and data handling.
- Enhance model generalization with advanced data augmentation techniques.
- Extend the pipeline to include multi-cloud support (e.g., Azure, GCP).
- Integrate real-time inference capabilities for robotic applications.