NanoBPE

NanoBPE is a small project modeled on micrograd's minimalism, designed to explore Byte Pair Encoding (BPE) and its potential applications beyond natural language processing, particularly in recommendation systems and supply chain optimization.

Features

Implements a basic Byte Pair Encoding (BPE) algorithm.
Simple and lightweight, inspired by micrograd’s minimalistic approach.
Flexible tokenization for experimenting with different types of data.
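
For readers new to BPE, the core idea can be sketched in a few lines: repeatedly find the most frequent adjacent pair of tokens and replace it with a new token id. The sketch below is a generic byte-level illustration, not NanoBPE's actual implementation; the function names (pair_counts, merge, train_bpe, encode) are assumptions made for this example.

```python
from collections import Counter

def pair_counts(ids):
    """Count each adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, in the order learned
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges

def encode(text, merges):
    """Encode `text` by replaying the learned merges in order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids

merges = train_bpe("aaabdaaabac", num_merges=3)
print(encode("aaabdaaabac", merges))
```

Starting from raw bytes means any input can be encoded without an unknown-token fallback, which is one reason byte-level BPE is a common starting point.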

Installation

NanoBPE is still under development and hasn't been packaged for PyPI yet. To use it, you'll need to clone the repository and append the project directory to your Python path.

git clone https://github.com/ickma/nanobpe.git

Usage

Since NanoBPE isn’t available as a package, you need to import it manually in your Python environment (e.g., Jupyter Notebook or any Python script):

import sys
# Point this at the cloned repository root, i.e. the directory that contains the nanobpe package.
sys.path.append('path/to/nanobpe')
from nanobpe.base import BaseTokenizer

Once imported, you can initialize the BaseTokenizer and start experimenting with Byte Pair Encoding in your own applications.

Example

tokenizer = BaseTokenizer()
tokenizer.train("your text here")
tokens = tokenizer.encode("your text here")
print(tokens)

Future work

I aim to explore the potential of using a BPE-like algorithm in areas beyond NLP, such as:

Recommendation Systems: Investigating how BPE can be adapted to improve content or product recommendations.
Supply Chain Optimization: Experimenting with subword tokenization in supply chain models, potentially improving demand forecasting or inventory management.
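
To make the idea concrete, the same pair-merging step can be applied to sequences that are not text, such as per-session lists of item identifiers. The sketch below is purely illustrative: the session data is made up, and the helper names (most_frequent_pair, merge_pair) are hypothetical and not part of NanoBPE.

```python
from collections import Counter

def most_frequent_pair(sessions):
    """Return (pair, count) for the most common adjacent item pair, or None."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))
    return counts.most_common(1)[0] if counts else None

def merge_pair(session, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(session):
        if i + 1 < len(session) and (session[i], session[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(session[i])
            i += 1
    return out

# Hypothetical browsing or purchase sessions (invented for illustration).
sessions = [
    ["milk", "bread", "eggs"],
    ["milk", "bread", "jam"],
    ["tea", "milk", "bread"],
]

pair, count = most_frequent_pair(sessions)
bundle = "+".join(pair)  # name the merged token after its two parts
merged = [merge_pair(s, pair, bundle) for s in sessions]
print(pair, count)
print(merged)
```

Whether merged tokens of this kind actually help a recommendation or forecasting model is the open question this project sets out to explore; the sketch only shows that the merge step itself carries over to non-text sequences.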

I will continue to develop this repository sporadically, and contributions or suggestions are welcome!
