A set of data analysis tools in Python 3.x.
Clone this repository to your local machine and run pip:
git clone https://github.com/shakedzy/dython.git
cd dython
pip install .
Dependencies: numpy, pandas, seaborn, scipy, matplotlib, sklearn
A set of functions to explore nominal (categorical) data-sets and mixed (nominal and continuous) data-sets.
Coefficients and statistics:
- Conditional entropy (conditional_entropy)
- Cramer's V (cramers_v)
- Theil's U (theils_u)
- Correlation ratio (correlation_ratio)
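As a rough illustration of what these four coefficients measure, here is a pure-Python sketch that computes each one from first principles. It is not dython's implementation: the `_sketch` function names are hypothetical, Cramer's V is shown in its plain, uncorrected form, and the library's own functions may differ in signature, logarithm base and edge-case handling.

```python
import math
from collections import Counter


def entropy_sketch(x):
    """Shannon entropy H(X) in nats. Hypothetical helper, not part of dython."""
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in Counter(x).values())


def conditional_entropy_sketch(x, y):
    """H(X|Y) = -sum over (x, y) of p(x, y) * log(p(x, y) / p(y))."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    return -sum(
        (c / n) * math.log(c / y_counts[yv])
        for (_, yv), c in xy_counts.items()
    )


def theils_u_sketch(x, y):
    """U(X|Y) = (H(X) - H(X|Y)) / H(X). Asymmetric, between 0 and 1."""
    h_x = entropy_sketch(x)
    if h_x == 0:
        return 1.0  # assumption: a constant X counts as fully determined
    return (h_x - conditional_entropy_sketch(x, y)) / h_x


def cramers_v_sketch(x, y):
    """Uncorrected Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = len(x)
    x_counts, y_counts = Counter(x), Counter(y)
    xy_counts = Counter(zip(x, y))
    chi2 = 0.0
    for xv, cx in x_counts.items():
        for yv, cy in y_counts.items():
            expected = cx * cy / n  # expected count under independence
            chi2 += (xy_counts.get((xv, yv), 0) - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def correlation_ratio_sketch(categories, values):
    """Eta: sqrt(between-group sum of squares / total sum of squares)."""
    mean = sum(values) / len(values)
    groups = {}
    for c, v in zip(categories, values):
        groups.setdefault(c, []).append(v)
    between = sum(
        len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values()
    )
    total = sum((v - mean) ** 2 for v in values)
    return math.sqrt(between / total) if total > 0 else 0.0
```

In this sketch, two nominal columns where one fully determines the other give a Theil's U and a Cramer's V of 1, while independent columns give 0. Theil's U is the only one of the four that is asymmetric, so swapping its arguments can change the result.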
Additional functions:
- associations: Calculate correlation/strength-of-association of a data-set
- numerical_encoding: Encode a mixed data-set to a numerical data-set (one-hot encoding)
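To make the idea of one-hot encoding a mixed data-set concrete, here is a minimal pure-Python sketch. The function name `one_hot_encode_sketch`, the list-of-dicts input and the `<column>_<category>` naming are assumptions made for illustration. They are not the signature or output format of the library's numerical_encoding.

```python
def one_hot_encode_sketch(rows, nominal_columns):
    """One-hot encode the given nominal columns of a list of row dicts.

    Hypothetical helper for illustration, not part of dython.
    """
    # Collect the sorted set of categories seen in each nominal column.
    categories = {
        col: sorted({row[col] for row in rows}) for col in nominal_columns
    }
    encoded = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if col in categories:
                # One 0/1 indicator per category, named "<column>_<category>".
                for cat in categories[col]:
                    new_row[f"{col}_{cat}"] = 1 if value == cat else 0
            else:
                new_row[col] = value  # continuous columns pass through
        encoded.append(new_row)
    return encoded


rows = [
    {"color": "red", "size": 1.5},
    {"color": "blue", "size": 2.0},
    {"color": "red", "size": 0.5},
]
encoded = one_hot_encode_sketch(rows, nominal_columns=["color"])
```

Each nominal column is replaced by one indicator column per category, and continuous columns are left unchanged, so the result contains only numerical values.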
A set of functions to gain more information about a model's performance.
- roc_graph: Compute and plot a ROC graph (and AUC score) for a model's predictions
- random_forest_feature_importance: Plot the feature importance of a trained sklearn RandomForestClassifier
- associations: Calculate correlation/strength-of-association of a data-set (same as nominal.associations)
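For readers unfamiliar with how a ROC graph and its AUC score are derived from predictions, the following pure-Python sketch computes both for a binary problem without plotting. The function names are hypothetical and this is not how the library's roc_graph is implemented. It assumes labels are 0 or 1 and that both classes are present.

```python
def roc_points_sketch(y_true, y_score):
    """Return (fpr, tpr) lists obtained by sweeping the decision threshold.

    Hypothetical helper, not part of dython. Assumes binary 0/1 labels
    with at least one positive and one negative sample.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    # Visit samples from the highest score to the lowest.
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc_sketch(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


fpr, tpr = roc_points_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc = auc_sketch(fpr, tpr)  # 0.75 for this small example
```

An AUC of 0.5 corresponds to scores that carry no ranking information, and 1.0 to scores that rank every positive sample above every negative one.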
See the examples.py module for roc_graph and associations examples.
Read more about the nominal tools in The Search for Categorical Correlation.
Apache License 2.0