🚂 ModelTrainSet: All Aboard the ML Express! 🚂

Welcome to ModelTrainSet, your one-stop shop for creating custom datasets and training machine learning models! Whether you're a data scientist, a machine learning engineer, or just someone who likes to play with big data and bigger models, ModelTrainSet has got your back!

🎭 What's This All About?

ModelTrainSet is like a Swiss Army knife for your data needs. It can:

📥 Load data from various sources (JSON, CSV, Excel, XML, SQL, Git/Jira, Twitter)
🧹 Clean and process your data
🎨 Format your data for different ML tasks
🚀 Train models using the latest techniques

It's perfect for when you need to wrangle your data into shape and then teach a model to do tricks with it!
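To make the load, clean, and format stops a little more concrete, here is a minimal standalone sketch of that flow using only the standard library. It is not ModelTrainSet's actual code: the function names (`load_records`, `clean_records`, `format_for_instruction_tuning`) and the `instruction`/`output` field names are assumptions chosen for illustration.

```python
import csv
import io
import json


def load_records(text, source_type):
    """Load a list of dict records from JSON or CSV text."""
    if source_type == "json":
        return json.loads(text)
    if source_type == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported source type: {source_type}")


def clean_records(records):
    """Strip whitespace from string values and drop records with an empty field."""
    cleaned = []
    for record in records:
        stripped = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        if all(value not in ("", None) for value in stripped.values()):
            cleaned.append(stripped)
    return cleaned


def format_for_instruction_tuning(records, prompt_field, response_field):
    """Map each record onto a (hypothetical) instruction/output pair."""
    return [
        {"instruction": record[prompt_field], "output": record[response_field]}
        for record in records
    ]


# Two rows: one usable, one with an empty answer that cleaning should drop.
raw_csv = "question,answer\n What is 2+2? ,4\nEmpty answer,\n"
dataset = format_for_instruction_tuning(
    clean_records(load_records(raw_csv, "csv")), "question", "answer"
)
print(dataset)
```

The real project splits these stages into separate loader, processor, and formatter classes (see the extension notes further down), so treat this as a map of the route rather than the train itself.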

🎟️ Getting Your Ticket to Ride

Before you hop on the ModelTrainSet express, make sure you have:

Python 3.7+ installed (we're not cavemen, after all); note that the manual Conda setup below uses Python 3.10
Git (for version control and looking cool)
Access to a Jira instance (if you're into that sort of thing)
Linux for training (blame Triton)
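If you'd like to check your ticket before boarding, a small script along these lines can verify the items above. It is a sketch, not part of ModelTrainSet: the function name `check_prerequisites` is made up here, and the Jira check is left out because it depends on your own instance and credentials.

```python
import platform
import shutil
import sys

# Minimum Python version taken from the prerequisites list above.
MIN_PYTHON = (3, 7)


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (met) or False (not met)."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "git": shutil.which("git") is not None,   # is a git executable on PATH?
        "linux": platform.system() == "Linux",    # only needed for training
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A `False` for `linux` only matters if you plan to train models; dataset creation does not list it as a requirement.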

🧳 Packing Your Bags (Installation)

We've upgraded our luggage handling system! Now you can choose between the classic pip setup or our new first-class Conda/Mamba experience.

🌟 First Class: Conda/Mamba Setup (Recommended)

If you haven't already, install Miniconda or Anaconda. For an even faster setup, install Mamba.

Clone our luxury liner:

git clone https://github.com/muddylemon/ModelTrainSet.git
cd ModelTrainSet

Create and activate your environment:

Using Conda:

conda env create -f environment.yml
conda activate modeltrainset

Or, for a faster setup with Mamba:

mamba env create -f environment.yml
mamba activate modeltrainset

You're all set! Enjoy your first-class ML journey!

🛠️ Manual Setup (if you encounter issues)

If you experience any problems with the automatic setup, you can try the following manual steps:

conda create --name modeltrainset python=3.10
conda activate modeltrainset

conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

conda install xformers -c xformers

pip install bitsandbytes

pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

pip install transformers datasets accelerate tqdm pyyaml nltk pandas openpyxl sqlalchemy gitpython jira python-dotenv peft trl

Replace conda with mamba in the above commands if you're using Mamba for faster installation.
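After the manual steps, it can be worth confirming that PyTorch actually landed and can see your GPU. The following is a minimal sketch; the helper name `describe_torch_install` is an assumption for illustration, and it only relies on `torch.cuda.is_available()` and `torch.version.cuda` from PyTorch itself.

```python
import importlib.util


def describe_torch_install():
    """Report whether PyTorch is importable and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return {"installed": False, "cuda_available": False, "cuda_version": None}

    import torch  # imported lazily so the script still runs without PyTorch

    return {
        "installed": True,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,  # None for CPU-only builds
    }


print(describe_torch_install())
```

If `installed` is `False`, double-check that the `modeltrainset` environment is activated before digging any deeper.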

🚶‍♂️ Economy Class: Pip Setup

If you prefer the classic experience, follow these steps:

Clone this bad boy:

git clone https://github.com/muddylemon/ModelTrainSet.git
cd ModelTrainSet

Set up your virtual environment (because we're responsible adults):

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the necessities:
```
pip install -r requirements.txt
```
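To confirm that everything in requirements.txt really made it onto the train, a rough check like the one below can help. It is a sketch with simplifying assumptions: the helper names are hypothetical, the name parsing is deliberately naive (it skips option lines such as `-r other.txt` and does not handle bare URL requirements), and `importlib.metadata` needs Python 3.8 or newer.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(lines):
    """Extract distribution names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        names = requirement_names(path.read_text().splitlines())
        print("Missing packages:", missing_packages(names))
```

Run it from the repository root with your virtual environment activated; an empty list means every named package was found.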

🚂 All Aboard! (Usage)

For detailed instructions on how to use ModelTrainSet, check out our comprehensive tutorial in the tutorials directory. It covers everything from creating datasets to training your own models!

🛤️ Extending Your Journey

Want to add a new stop on the ModelTrainSet line? Here's how:

Create new loader, processor, or formatter classes in dataset_creator/.
Add a new creator class in dataset_creator/creators/.
Update get_creator() in main.py to recognize your new creation.
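The three steps above might look roughly like the sketch below. Everything in it is hypothetical: the class names, the `load`/`process`/`format`/`create_dataset` method names, and the dictionary-based `get_creator()` are assumptions for illustration and may not match the real base classes or the real `get_creator()` in main.py, so check the existing creators before copying the shape.

```python
# Hypothetical sketch: names and signatures are illustrative assumptions,
# not ModelTrainSet's actual API.


class TextFileLoader:
    """Loader: turns a config into raw items (here, lines of text)."""

    def load(self, config):
        return config["text"].splitlines()


class NonEmptyLineProcessor:
    """Processor: trims whitespace and drops blank lines."""

    def process(self, lines):
        return [line.strip() for line in lines if line.strip()]


class CompletionFormatter:
    """Formatter: wraps each line in a training-record dict."""

    def format(self, lines):
        return [{"text": line} for line in lines]


class TextFileCreator:
    """Creator: wires one loader, processor, and formatter together."""

    def __init__(self):
        self.loader = TextFileLoader()
        self.processor = NonEmptyLineProcessor()
        self.formatter = CompletionFormatter()

    def create_dataset(self, config):
        raw = self.loader.load(config)
        return self.formatter.format(self.processor.process(raw))


# Stand-in for the lookup that get_creator() performs in main.py.
CREATORS = {"text_file": TextFileCreator}


def get_creator(creator_type):
    """Return a creator instance for the given type name."""
    try:
        return CREATORS[creator_type]()
    except KeyError:
        raise ValueError(f"Unknown creator type: {creator_type!r}") from None


dataset = get_creator("text_file").create_dataset(
    {"text": "first line\n\n  second line  \n"}
)
print(dataset)
```

Keeping the loader, processor, and formatter as separate classes means a new stop on the line can usually reuse two of the three and only swap in the piece that is actually new.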

For more details on contributing to ModelTrainSet, please read our contribution guide (CONTRIBUTING.md).

🆘 Help! I'm Lost

If you find yourself in a dark tunnel:

Check your Python version (python --version).
Make sure you've installed all the requirements (pip install -r requirements.txt).
Double-check your config file. Typos are the bane of every data scientist's existence!
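For the config-typo tunnel in particular, a tiny checker can flag unknown keys and suggest the nearest known one. This is a sketch under loud assumptions: the key names in `EXPECTED_KEYS` are invented for illustration and are not ModelTrainSet's real config schema, and the config is shown as an already-loaded dict rather than read from a file.

```python
import difflib

# Hypothetical key names, for illustration only; replace with the keys
# your config files actually use.
EXPECTED_KEYS = ["creator_type", "input_path", "output_path"]


def find_config_typos(config, expected_keys=EXPECTED_KEYS):
    """Map each unexpected key to its closest expected key (or None)."""
    problems = {}
    for key in config:
        if key not in expected_keys:
            suggestions = difflib.get_close_matches(key, expected_keys, n=1)
            problems[key] = suggestions[0] if suggestions else None
    return problems


# "creator_typ" is missing its final "e"; "output_path" is fine.
typos = find_config_typos({"creator_typ": "git", "output_path": "out.json"})
for wrong, suggestion in typos.items():
    print(f"Unknown key {wrong!r}; did you mean {suggestion!r}?")
```

An empty result means every key was recognized, which rules out one of the more common derailments.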

🤝 Join the Crew

Contributions are welcome! Whether you're fixing bugs, adding features, or just making our jokes funnier, we'd love to have you on board! Check out our contribution guide (CONTRIBUTING.md) to get started.

📜 The Fine Print on Your Ticket Stub

This project is licensed under the MIT License - see the LICENSE file for details. (It's basically "use it however you want, just don't blame us if something goes wrong".)

Remember, in the world of ModelTrainSet, every day is training day! Now go forth and model responsibly! 🚂💨
