The data comprises entries recorded over several days by NASA JPL, USA. The data APIs are fully functional and can be used to gather the data directly. For this project, the data was taken from a Kaggle dataset (link: kaggle/nasa-asteroid-classification). The data is stored in JSON format.
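Since the data is stored as JSON, a loader along the following lines may be a useful starting point. This is a minimal sketch, not the project's own loading code: the function name `load_records` is hypothetical, and it assumes the file holds either a list of entries or a single JSON object.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def load_records(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a JSON file and return its entries as a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)
    # Assumption: the top level is either a list of entries or one object.
    if isinstance(payload, list):
        return payload
    return [payload]
```

Call it with the path to a downloaded file, for example somewhere under `data/raw`. The actual file names depend on the dataset and are not assumed here.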
The data is versioned with DVC (Data Version Control), so the repository can be used flexibly without committing the data to the repo itself. Instead, the data is fetched from a remote source such as AWS S3 or Google Drive. In this case, the data is stored on Google Drive.
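Fetching the DVC-tracked data usually comes down to running `dvc pull` inside the cloned repository. The sketch below wraps that command in Python. The helper names `build_pull_command` and `pull_data` are hypothetical, and it assumes the `dvc` CLI is installed and that a remote is already configured in the repository; see commands.md for the project's actual commands.

```python
import subprocess
from typing import List, Optional


def build_pull_command(remote: Optional[str] = None) -> List[str]:
    """Build the `dvc pull` command, optionally naming a remote."""
    command = ["dvc", "pull"]
    if remote is not None:
        # `-r` selects a named remote from the DVC config.
        command += ["-r", remote]
    return command


def pull_data(remote: Optional[str] = None) -> None:
    """Download the DVC-tracked data into the working tree."""
    # check=True raises CalledProcessError if DVC exits with an error.
    subprocess.run(build_pull_command(remote), check=True)
```

Calling `pull_data()` with no argument uses the repository's default remote. Passing a name is only needed when several remotes are configured.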
The repository follows a strict data science project structure:
.
└── root/
    ├── config/
    │   ├── data
    │   └── models
    ├── data/
    │   ├── external
    │   ├── interim
    │   ├── primary
    │   ├── processed
    │   └── raw
    ├── docs
    ├── models
    ├── notebooks
    ├── references
    ├── report/
    │   └── figures
    └── src/
        ├── data
        ├── features
        ├── models
        └── visualization
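To reproduce this layout in a fresh checkout, a short script can create the directories. This is an illustrative sketch only: `LAYOUT` and `create_layout` are hypothetical names, and the list simply mirrors the tree above.

```python
from pathlib import Path
from typing import List, Union

# Leaf directories taken from the tree above; parents are created as needed.
LAYOUT = [
    "config/data", "config/models",
    "data/external", "data/interim", "data/primary",
    "data/processed", "data/raw",
    "docs", "models", "notebooks", "references",
    "report/figures",
    "src/data", "src/features", "src/models", "src/visualization",
]


def create_layout(root: Union[str, Path]) -> List[Path]:
    """Create every directory in LAYOUT under root and return the paths."""
    created = []
    for relative in LAYOUT:
        directory = Path(root) / relative
        # exist_ok=True makes the function safe to run more than once.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Running `create_layout("root")` from an empty folder yields the same skeleton. Existing directories are left untouched.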
Installation and usage instructions are in getting_started.md and commands.md.
Will update soon :)
You can visit the reports directory, where all the runs are stored. Currently, because of privacy concerns, the MLflow runs are not shared here.