This is a data science and machine learning library for OCaml, mainly intended for people wishing to learn how to implement machine learning algorithms in a functional programming language, and for use in personal projects. It combines the most commonly used functionality of the Python data science libraries NumPy, pandas and scikit-learn. The following features are offered:
- Jupyter frontend for OCaml
- Load CSV files
- Matrix operations, from basic arithmetic to row reduction, inverse, transpose, eigenvectors and determinants.
- Descriptive statistics for datasets, plus dataframe slicing and broadcasting.
- Simple syntax to split up a dataset into training, validation and test sets, apply a prewritten model, and evaluate the accuracy of the model on the dataset.
- The following machine learning algorithms: Logistic Regression, Polynomial Regression, K Nearest Neighbors, K Means Clustering, Naive Bayes, Decision Trees, Perceptron.
- A command line utility that allows users not proficient in OCaml programming to still use some common functionality, including basic data manipulation and application of models.
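
To make the split / apply / evaluate workflow from the feature list concrete, here is a minimal, self-contained sketch in plain OCaml. It does not use this library's API: the names `take_drop`, `split`, `sq_dist`, `predict_1nn` and `accuracy` are hypothetical, and the 1-nearest-neighbour classifier is only a stand-in for a prewritten model. See the generated documentation for the library's actual functions and parameters.

```ocaml
(* Hypothetical helpers for illustration; none of these names come from
   the library itself. An example is a (features, label) pair. *)

(* Split a list into its first [n] elements and the remainder. *)
let rec take_drop n xs =
  match n, xs with
  | 0, _ | _, [] -> ([], xs)
  | n, x :: rest ->
    let taken, dropped = take_drop (n - 1) rest in
    (x :: taken, dropped)

(* Partition rows into training, validation and test sets by fraction.
   Rows are taken in order, so shuffle first if the data is sorted. *)
let split ~train_frac ~valid_frac rows =
  let n = float_of_int (List.length rows) in
  let n_train = int_of_float (Float.round (train_frac *. n)) in
  let n_valid = int_of_float (Float.round (valid_frac *. n)) in
  let train, rest = take_drop n_train rows in
  let valid, test = take_drop n_valid rest in
  (train, valid, test)

(* Squared Euclidean distance between two feature vectors of equal length. *)
let sq_dist a b =
  let acc = ref 0.0 in
  Array.iteri (fun i x -> let d = x -. b.(i) in acc := !acc +. d *. d) a;
  !acc

(* Stand-in model: return the label of the closest training example. *)
let predict_1nn train x =
  match train with
  | [] -> invalid_arg "predict_1nn: empty training set"
  | (f0, l0) :: rest ->
    let _, label =
      List.fold_left
        (fun (best_d, best_l) (f, l) ->
          let d = sq_dist f x in
          if d < best_d then (d, l) else (best_d, best_l))
        (sq_dist f0 x, l0) rest
    in
    label

(* Fraction of examples whose predicted label equals the true label.
   The result is nan when [examples] is empty. *)
let accuracy predict examples =
  let correct =
    List.fold_left
      (fun c (x, y) -> if predict x = y then c + 1 else c)
      0 examples
  in
  float_of_int correct /. float_of_int (List.length examples)

(* Usage: ten one-dimensional points, labelled 1 when the value is >= 5. *)
let () =
  let data =
    List.init 10 (fun i -> ([| float_of_int i |], if i >= 5 then 1 else 0))
  in
  let train, _valid, test = split ~train_frac:0.6 ~valid_frac:0.2 data in
  Printf.printf "test accuracy: %.2f\n" (accuracy (predict_1nn train) test)
```

The validation set is left unused here; in practice it would be used to choose between models or hyperparameters before the final evaluation on the test set.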
Installation of this library, along with setting up the Jupyter kernel, is described in install.txt.
Several make commands can be run in the home directory:
- make build - Compile the code
- make test - Run the test suite
- make docs - Generate ocamldoc documentation. We recommend looking for index.html inside the _doc.private folder that gets generated inside the home directory. Using these to look for what parameters the different algorithms will take for fitting and predicting will be very helpful.
- make ui - Run the command line interface that allows for simple data manipulation and applying a machine learning algorithm for people not adept at OCaml programming.
Several demos are offered as prewritten Jupyter notebooks. The following list contains some of the most useful ones:
- dataframe.ipynb - Data loading and manipulation examples
- matrix.ipynb - Matrix operations examples
- statistics.ipynb - Statistics operations examples
- <machine_learning_algo>.ipynb - Examples of the usage of the different machine learning algorithms, along with a visual representation generated using the Archimedes library.