Software Engineering Trends on Docker Hub

An end-to-end framework that helps companies predict software engineering trends and helps developers learn more about a Docker image.

Our goal is to provide different companies with a dynamic dataset through which meaningful inferences can be made.

Our aim is to gather data from Docker Hub and analyse the trends. Docker Hub is a cloud-based repository in which Docker users and partners create, test, store and distribute container images.

This project was developed as part of coursework for Data-X at Berkeley.

Link to supporting presentation

Requirements

We use Conda to manage the environment and packages.

We use the following packages (among many others):

Python 3.6 or above
Pandas
Matplotlib
Plotly
Seaborn
boto3

To fetch new .json files from the AWS S3 bucket, run:

cd data/
aws s3 sync s3://docker-recent recent-data
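Since boto3 is already among the listed packages, the same fetch can be approximated from Python. The following is only a sketch under stated assumptions: the function name `sync_json_files` is hypothetical and not part of this repository, and unlike `aws s3 sync` (which compares size and timestamp) it simply skips files that already exist locally.

```python
import os


def sync_json_files(client, bucket, dest_dir):
    """Download every .json object in `bucket` that is not yet in `dest_dir`.

    `client` is expected to behave like boto3.client("s3").
    Returns the list of keys that were downloaded.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    # list_objects_v2 returns at most 1000 keys per call, so paginate.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".json"):
                continue
            target = os.path.join(dest_dir, key)
            if os.path.exists(target):
                continue  # simplification: no size or timestamp comparison
            os.makedirs(os.path.dirname(target), exist_ok=True)
            client.download_file(bucket, key, target)
            downloaded.append(key)
    return downloaded


# Usage (requires AWS credentials with read access to the bucket):
#   import boto3
#   sync_json_files(boto3.client("s3"), "docker-recent", "data/recent-data")
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching the network.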

Installation

1. Downloading this Repository

Start by downloading or cloning this repository.

git clone https://github.com/cshubhamrao/docker-hub-data.git
cd docker-hub-data

2. Create and Activate Environment

Create the conda environment from the environment.yml file:

conda env create -f environment.yml

Then activate the environment:

conda activate docker-hub

3. Run Jupyter Lab

jupyter lab

Contents

Data - Contains all data-related files and folders that are generated or stored for later use. The plots generated by analytics.ipynb and other scripts are also saved here.
Misc - Contains the miscellaneous scripts required for this project.
Scripts - The main folder. Contains the scripts used to scrape the data, clean it, select the data required for analysis, and finally analyse the data and derive inferences from it.
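The clean, select, and analyse steps described for the scripts folder can be illustrated roughly as follows. This is a minimal sketch and not the project's actual code: the helper names `load_records` and `top_images`, and the record fields `name` and `pull_count`, are assumptions about how the scraped .json files might be laid out.

```python
import json
from pathlib import Path


def load_records(data_dir):
    """Read every .json file under data_dir.

    Assumes each file holds either one record (a JSON object)
    or a list of records.
    """
    records = []
    for path in sorted(Path(data_dir).rglob("*.json")):
        with open(path, encoding="utf-8") as fh:
            payload = json.load(fh)
        records.extend(payload if isinstance(payload, list) else [payload])
    return records


def top_images(records, n=5):
    """Drop incomplete records, then rank the rest by pull count."""
    # Clean: keep only dict records with a name and a numeric pull_count
    # (both field names are assumptions).
    clean = [
        r for r in records
        if isinstance(r, dict)
        and r.get("name")
        and isinstance(r.get("pull_count"), (int, float))
    ]
    # Select and analyse: the n most-pulled images, highest first.
    clean.sort(key=lambda r: r["pull_count"], reverse=True)
    return [(r["name"], r["pull_count"]) for r in clean[:n]]
```

For example, `top_images(load_records("data/recent-data"), n=10)` would list the ten most-pulled images, provided the files follow the assumed layout.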
