diff --git a/Makefile b/Makefile
index 58c6629..098d057 100644
--- a/Makefile
+++ b/Makefile
@@ -1,20 +1,21 @@
-# Run unit tests
 test:
-	python3 -m unittest discover -s tests > /dev/null
+	@echo "Running unit tests..."
+	@python3 -m unittest discover -s tests > /dev/null
 
-# Check test coverage using unittest module
 coverage:
-	coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage report
+	@echo "Checking test coverage using unittest module..."
+	@coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage report
 
-# Check test coverage and show the results in browser
 html:
-	coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage html; python -m webbrowser "./htmlcov/index.html" &
+	@echo "Checking test coverage and showing the results in browser..."
+	@coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage html; python -m webbrowser "./htmlcov/index.html" &
 
-# Check compatibility with Python 2.7
 comp:
-	python2 -m unittest discover -s tests > /dev/null
+	@echo "Checking compatibility with Python 2.7..."
+	@python2 -m unittest discover -s tests > /dev/null
 
 install:
-	pip install .
+	@echo "Installing cce module via pip..."
+	@pip install .
 
-.PHONY: test
+.PHONY: test coverage html comp install
diff --git a/Readme.rst b/Readme.rst
index 925a26c..f7cda4f 100644
--- a/Readme.rst
+++ b/Readme.rst
@@ -4,8 +4,8 @@ Channel Capacity Estimator
 
 Channel Capacity Estimator (**cce**) is a python module to estimate
 `information capacity`_ of a communication channel. Mutual information,
-computed as proposed by `Kraskov et al.` (*Physical Review E*, 2004)
-Eq. (8), is maximized over input probabilities by means of a constrained
+computed as proposed by `Kraskov et al.`_ (*Physical Review E*, 2004,
+Eq. (8)), is maximized over input probabilities by means of a constrained
 gradient-based stochastic optimization. The only parameter of the Kraskov
 algorithm is the number of neighbors, *k*, used in the nearest neighbor
 search. In **cce**, channel input is expected to be of categorical type
@@ -19,8 +19,9 @@ requirements.txt for a complete list of dependencies.
 
 Module **cce** features the research article "Limits to the rate of
 information transmission through MAPK pathway" by Grabowski *et al.*,
-submitted to *PLOS Computational Biology* in 2018. Release 0.4 of the
-code has been included as supplementary data of this article.
+submitted to *PLOS Computational Biology* (2018). Version 1.0 of **cce**
+(with pre-built documentation) has been included as supplementary code
+of this article.
 
 For any updates and fixes to **cce**, please visit project homepage:
 http://pmbm.ippt.pan.pl/software/cce
@@ -38,7 +39,7 @@ There are three major use cases of **cce**:
 
 In the example below, mutual information is calculated between three sets
 of points drawn at random from two-dimensional Gaussian distributions,
-located at (0,0), (1,1), and at (3,3) (in SciPy, covariance matrices of
+located at (0,0), (0,1), and at (3,3) (in SciPy, covariance matrices of
 all three distributions by default are identity matrices). Auxiliary
 function `label_all_with` helps to prepare the list of all points, in
 which each point is labeled according to its distribution of origin.
@@ -51,15 +52,16 @@ which each point is labeled according to its distribution of origin.
     >>> def label_all_with(label, values): return [(label, v) for v in values]
     >>>
     >>> data = label_all_with('A', mvn(mean=(0,0)).rvs(10000)) \
-             + label_all_with('B', mvn(mean=(1,1)).rvs(10000)) \
+             + label_all_with('B', mvn(mean=(0,1)).rvs(10000)) \
              + label_all_with('C', mvn(mean=(3,3)).rvs(10000))
     >>>
-    >>> wke(data).calculate_mi(k=50)
-    0.9386627422798913
+    >>> wke(data).calculate_mi(k=10)
+    0.9552107248613955
 
 In this example, probabilities of input distributions, henceforth referred
 to as *weights*, are assumed to be equal for all input distributions. Format
-of data is akin to [('A', array([-0.4, 2.8])), ('A', array([-0.9, -0.1])), ..., ('B', array([1.7, 0.9])), ..., ('C', array([3.2, 3.3])), ...).
+of data is akin to [('A', array([-0.4, 2.8])), ('A', array([-0.9, -0.1])),
+..., ('B', array([1.7, 0.9])), ..., ('C', array([3.2, 3.3])), ...).
 Entries of data are not required to be grouped according to the label.
 Distribution labels can be given as strings, not just single characters.
 Instead of NumPy arrays, ordinary lists with coordinates will be also
@@ -81,12 +83,12 @@ input distributions:
 
     >>> def label_all_with(label, values): return [(label, v) for v in values]
     >>>
     >>> data = label_all_with('A', mvn(mean=(0,0)).rvs(10000)) \
-             + label_all_with('B', mvn(mean=(1,1)).rvs(10000)) \
+             + label_all_with('B', mvn(mean=(0,1)).rvs(10000)) \
              + label_all_with('C', mvn(mean=(3,3)).rvs(10000))
     >>>
-    >>> weights = {'A': 3/6, 'B': 1/6, 'C': 2/6}
-    >>> wke(data).calculate_weighted_mi(weights=weights, k=50)
-    0.9420502318804324
+    >>> weights = {'A': 2/6, 'B': 1/6, 'C': 3/6}
+    >>> wke(data).calculate_weighted_mi(weights=weights, k=10)
+    1.0065891280377155
 
 (This example involves random numbers, so your result may vary slightly.)
@@ -102,11 +104,11 @@ input distributions:
     >>> def label_all_with(label, values): return [(label, v) for v in values]
     >>>
     >>> data = label_all_with('A', mvn(mean=(0,0)).rvs(10000)) \
-             + label_all_with('B', mvn(mean=(1,1)).rvs(10000)) \
+             + label_all_with('B', mvn(mean=(0,1)).rvs(10000)) \
              + label_all_with('C', mvn(mean=(3,3)).rvs(10000))
     >>>
-    >>> wke(data).calculate_maximized_mi(k=50)
-    (0.98616722147976, {'A': 0.38123083, 'B': 0.16443817, 'C': 0.45433092})
+    >>> wke(data).calculate_maximized_mi(k=10)
+    (1.0154510500713743, {'A': 0.33343804, 'B': 0.19158363, 'C': 0.4749783})
 
 The output tuple contains the maximized mutual information (channel capacity)
 and probabilities of input distributions that maximize mutual information (argmax).
@@ -114,14 +116,26 @@ Optimization is performed within TensorFlow with multiple threads and
 takes less than a minute on a quad-core processor.
 
 (This example involves random numbers, so your result may vary slightly.)
+
 Testing
 -------
-To launch a suite of unit tests run:
+To launch a suite of unit tests, run:
 
 .. code:: bash
 
     $ make test
 
+
+Documentation
+-------------
+Developer's code documentation may be generated with
+
+.. code:: bash
+
+    $ cd docs
+    $ make html
+
+
 Installation
 ------------
 To install **cce** locally via pip, run:
@@ -139,8 +153,6 @@ Then, you can directly start using the package:
 
     >>> ...
 
-
-
 Authors
 -------
 
@@ -148,7 +160,7 @@
 The code was developed by `Frederic Grabowski`_ and `Paweł Czyż`_,
 with some guidance from `Marek Kochańczyk`_ and under supervision of
 `Tomasz Lipniacki`_ from the `Laboratory of Modeling in Biology and Medicine`_,
 `Institute of Fundamental Technological Reasearch, Polish Academy of Sciences`_
-in Warsaw.
+(IPPT PAN) in Warsaw.
 
 License
diff --git a/cce/estimator.py b/cce/estimator.py
index 9c300ce..3ab4449 100644
--- a/cce/estimator.py
+++ b/cce/estimator.py
@@ -8,9 +8,9 @@ from scipy.spatial import cKDTree
 from scipy.special import digamma
 import numpy as np
 
-from cce.preprocess import normalize, add_noise_if_duplicates
-from cce.optimize import weight_optimizer
-from cce.score import weight_loss
+from cce.preprocessing import normalize, add_noise_if_duplicates
+from cce.optimization import weight_optimizer
+from cce.scoring import weight_loss
 
 
 class WeightedKraskovEstimator:
@@ -39,7 +39,7 @@ def __init__(self, data: list = None, leaf_size: int = 16):
         self._number_of_points_for_label = defaultdict(lambda: 0)
         self._number_of_labels = None
 
-        # Immersed data -- X is mapped from cateogorical data into reals
+        # Immersed data -- X is mapped from categorical data into reals
         # using _huge_dist, and we store spaces X x Y and just Y.
         self._immersed_data_full = None
         self._immersed_data_coordinates = None
@@ -226,7 +226,7 @@ def calculate_weighted_mi(self, weights: dict, k: int) -> float:
 
 
     def optimize_weights(self) -> tuple:
-        """Function optimizing weights using weight_optimizer.
+        """Optimizes probabilities of input distributions (weights).
 
         Returns
         -------
@@ -281,7 +281,7 @@ def _turn_into_neigh_list(self, indices, special_point_label):
 
 
     def calculate_neighborhoods(self, k: int):
-        """Function that prepares neighborhood_array.
+        """Prepares neighborhood_array.
 
         Parameters
         ----------
diff --git a/cce/optimize.py b/cce/optimization.py
similarity index 100%
rename from cce/optimize.py
rename to cce/optimization.py
diff --git a/cce/preprocess.py b/cce/preprocessing.py
similarity index 93%
rename from cce/preprocess.py
rename to cce/preprocessing.py
index 70aca03..39b5aa0 100644
--- a/cce/preprocess.py
+++ b/cce/preprocessing.py
@@ -16,7 +16,7 @@ def _project_coords(data: list) -> list:
 
 
 def normalize(data: list) -> list:
-    """Perform input data normalization
+    """Performs input data normalization.
 
     Parameters
     ----------
@@ -44,7 +44,7 @@ def normalize(data: list) -> list:
 
 
 def unique(arr) -> bool:
-    """Check if all points in the array of coordinates are unique.
+    """Checks if all points in the array of coordinates are unique.
 
     Parameters
     ----------
@@ -60,7 +60,7 @@ def unique(arr) -> bool:
 
 
 def add_noise_if_duplicates(data: list) -> list:
-    """Add noise to input data
+    """Adds noise to input data.
 
     Parameters
     ----------
diff --git a/cce/score.py b/cce/scoring.py
similarity index 100%
rename from cce/score.py
rename to cce/scoring.py
diff --git a/docs/Makefile b/docs/Makefile
index f642b30..3602e90 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -3,7 +3,7 @@
 
 # You can set these variables from the command line.
 SPHINXOPTS    =
-SPHINXBUILD   = python -msphinx
+SPHINXBUILD   = python3 -msphinx
 SPHINXPROJ    = ChannelCapacityEstimator
 SOURCEDIR     = .
 BUILDDIR      = _build
diff --git a/docs/conf.py b/docs/conf.py
index 26c3544..d6ac4ef 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -66,7 +66,7 @@
 # The short X.Y version.
 version = '1.0'
 # The full version, including alpha/beta/rc tags.
-release = '1.0'
+release = '1.0.0'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
diff --git a/docs/source/cce.rst b/docs/source/cce.rst
index 777d6d5..10fa49e 100644
--- a/docs/source/cce.rst
+++ b/docs/source/cce.rst
@@ -4,18 +4,18 @@ cce package
 Submodules
 ----------
 
-cce\.score module
------------------
+cce\.scoring module
+-------------------
 
-.. automodule:: cce.score
+.. automodule:: cce.scoring
    :members:
    :undoc-members:
    :show-inheritance:
 
-cce\.preprocess module
-----------------------
+cce\.preprocessing module
+-------------------------
 
-.. automodule:: cce.preprocess
+.. automodule:: cce.preprocessing
    :members:
    :undoc-members:
    :show-inheritance:
@@ -28,10 +28,10 @@ cce\.estimator module
    :members:
    :undoc-members:
    :show-inheritance:
 
-cce\.optimize module
---------------------
+cce\.optimization module
+------------------------
 
-.. automodule:: cce.optimize
+.. automodule:: cce.optimization
    :members:
    :undoc-members:
    :show-inheritance:
diff --git a/tests/test_preprocess.py b/tests/test_preprocess.py
index 1bf8646..aab621c 100644
--- a/tests/test_preprocess.py
+++ b/tests/test_preprocess.py
@@ -1,5 +1,5 @@
 import unittest
-from cce.preprocess import normalize
+from cce.preprocessing import normalize
 
 LARGE_VALUES_SMALL_SPREAD = [('1', [1e9, 1e9]),
                              ('1', [1e9+1, 1e9+1]),