Merge pull request #15 from fraunhoferportugal/dev

Minor Patch 0.1.4 - Bugfixes and documentation
fraunhoferportugal · Nov 21, 2024 · b36c471
2 parents 49d8a17 + d64e93a

Showing 66 changed files with 469 additions and 2,036 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,27 @@
 All notable changes to this project will be documented in this file.
 This format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.4] - 2024-11-21
+Taxonomy rework and documentation updates.
+
+### Added
+ - Read the Docs badge and documentation link in the README file
+
+### Changed
+ - Renamed `validation_type` to `validation_domain`
+ - Renamed `metric_group` to `metric_category`
+ - Renamed `metric_goal` to `metric_group`
+
+### Fixed
+ - Getting features from the last Linear layer of VGG models
+ - Simplified batch stacking in Image extractor method
+ - Updated hierarchy diagram in the documentation
+ - Using a local seed in the `features_splitting` method to avoid overriding the global random state, which led to inconsistent results
+ - Removed unused text modules
+ - Added a seed to the `cluster_into_bins` method in the `PrecisionRecallDistribution` metric, so that results are consistent across runs
+
+
+
 ## [0.1.3] - 2024-11-05
 Documentation and API updates.
 

diff --git a/README.md b/README.md
@@ -9,13 +9,18 @@
 [![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
 [![pytest](https://img.shields.io/badge/pytest-enabled-brightgreen)](https://github.com/pytest-dev/pytest)
 [![conventional-commits](https://img.shields.io/badge/conventional%20commits-1.0.0-yellow)](https://github.com/commitizen-tools/commitizen)
+[![Read The Docs](https://readthedocs.org/projects/pymdma/badge/?version=latest)](https://pymdma.readthedocs.io/en/latest/installation/)
+
+<!-- [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/fraunhoferportugal/pymdma.git/main?labpath=notebooks%2Fimage_examples.ipynb) -->
 
 Data auditing is essential for ensuring the reliability of machine learning models by maintaining the integrity of the datasets upon which these models rely. As synthetic data use increases to address data scarcity and privacy concerns, there is a growing demand for a robust auditing framework.
 
 Existing repositories often lack comprehensive coverage across various modalities or validation types. This work introduces a dedicated library for data auditing, presenting a comprehensive suite of metrics designed for evaluating synthetic data. Additionally, it extends its focus to the quality assessment of input data, whether synthetic or real, across time series, tabular, and image modalities.
 
 This library aims to serve as a unified and accessible resource for researchers, practitioners, and developers, enabling them to assess the quality and utility of their datasets. This initiative encourages collaborative contributions by open-sourcing the associated code, fostering a community-driven approach to advancing data auditing practices. This work is intended for publication in an open-source journal to facilitate widespread dissemination, adoption, and impact tracking within the scientific and technical community.
 
+For more information, check out the official documentation [here](https://pymdma.readthedocs.io/en/latest/).
+
 ## Prerequisites
 
 You will need:
@@ -106,13 +111,13 @@ Following is an example of executing the evaluation of a synthetic dataset with
 
 ```bash
 pymdma --modality image \
-    --validation_type synth \
+    --validation_domain synth \
     --reference_type dataset \
     --evaluation_level dataset \
     --reference_data data/test/image/synthesis_val/reference \
     --target_data data/test/image/synthesis_val/dataset \
     --batch_size 3 \
-    --metric_group feature \
+    --metric_category feature \
     --output_dir reports/image_metrics/
 ```
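Scripts written for 0.1.3 or earlier still pass `--validation_type` and `--metric_group`, which this release renames to `--validation_domain` and `--metric_category`. The snippet below is a minimal sketch for migrating such scripts, assuming they are plain-text shell files; `migrate_pymdma_flags` is a hypothetical helper, not part of pyMDMA, and it only rewrites the two flags that appear in this diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of pyMDMA): rewrite the two CLI flags renamed
# in 0.1.4. Reads a script on stdin and writes the migrated text to stdout.
migrate_pymdma_flags() {
    # The trailing group requires whitespace, "=", or end of line after the flag,
    # so longer option names that merely share the prefix are left untouched.
    sed -E \
        -e 's/--validation_type([[:space:]=]|$)/--validation_domain\1/g' \
        -e 's/--metric_group([[:space:]=]|$)/--metric_category\1/g'
}

# Usage (run_eval.sh is a placeholder file name):
# migrate_pymdma_flags < run_eval.sh > run_eval_migrated.sh
```

Only run it on scripts that predate 0.1.4: in the new taxonomy `metric_group` is the name of the former `metric_goal`, so a script already updated for 0.1.4 would be migrated incorrectly.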
 
@@ -223,4 +228,3 @@ If you publish work that uses pyMDMA, please cite pyMDMA as follows:
 This work was funded by AISym4Med project number 101095387, supported by the European Health and Digital Executive Agency (HADEA), granting authority under the powers delegated by the European Commission. More information on this project can be found [here](https://aisym4med.eu/).
 
 This work was supported by European funds through the Recovery and Resilience Plan, project ”Center for Responsible AI”, project number C645008882-00000055. Learn more about this project [here](https://centerforresponsible.ai/).
-
diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-0.1.2
+0.1.4
diff --git a/docs/index.md b/docs/index.md
@@ -11,7 +11,7 @@ The source code is available on [GitHub](https://github.com/fraunhoferportugal/p
 
 ## Metric Categories
 Each metric class is organized based on the modality, validation type, metric group and goal. Following is a brief description of these categories:
-![Metric Categories](resources/data_auditing.png)
+![Metric Categories](resources/pymdma_schema_1.png)
 
 ### Validation Type
 The platform offers two types of evaluation: input validation and synthesis validation. The first type includes metrics for assessing raw data quality intended for use in machine learning tasks. The second type evaluates data generated by a synthesis model. Note that input metrics can also be used to evaluate the quality of synthetic datasets.

diff --git a/docs/installation.md b/docs/installation.md
@@ -1,34 +1,33 @@
 # Installation
 
-It is recommended to install the package in a virtual environment. To install the package, run the following command:
+The package is not currently available on PyPI, so install it directly from the Git repository by running the following command:
 
 ```bash
-$ pip install pymdma
+$ pip install "pymdma @ git+https://github.com/fraunhoferportugal/pymdma.git"
 ```
 
-Depending on the data modality you are working with, you may need to install additional dependencies. We have three groups of denpendencies: `image`, `tabular` and `time_series`. As an example, to work with image data, you will need to run the following command:
+<!-- It is recommended to install the package in a virtual environment. To install the package, run the following command:
 
 ```bash
-$ pip install pymdma[image]
-```
+$ pip install pymdma
+``` -->
 
-You can also install multiple modalities by passing the desired modalities as a comma-separated list. For example, to install both image and tabular modalities, you can run the following command:
+Depending on the data modality you are working with, you may need to install additional dependencies. We have three groups of dependencies: `image`, `tabular` and `time_series`. As an example, to work with image data, you will need to run the following command:
 
 ```bash
-$ pip install pymdma[image,tabular]
+$ pip install "pymdma[image] @ git+https://github.com/fraunhoferportugal/pymdma.git"
 ```
 
-Or alternatively, you can install all modalities by running the following command:
+You can also install multiple modalities by passing the desired modalities as a comma-separated list. For example, to install both image and tabular modalities, you can run the following command:
 
 ```bash
-$ pip install pymdma[all]
+$ pip install "pymdma[image,tabular] @ git+https://github.com/fraunhoferportugal/pymdma.git"
 ```
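Because pip receives the extras and the Git URL as a single PEP 508 direct reference (`name[extras] @ url`), the whole string must be quoted so the shell neither splits it at the spaces around `@` nor treats the square brackets as a glob pattern. The following is a minimal sketch of a helper that assembles that string from a list of modalities; `pymdma_requirement` is a hypothetical name, not part of pyMDMA, and the extras it accepts are the `image`, `tabular` and `time_series` groups listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of pyMDMA): build the requirement string
# "pymdma[extra1,extra2] @ git+URL" from the modalities passed as arguments.
PYMDMA_GIT_URL="git+https://github.com/fraunhoferportugal/pymdma.git"

pymdma_requirement() {
    local extras
    # Join the arguments with commas, e.g. "image tabular" -> "image,tabular".
    extras=$(IFS=,; printf '%s' "$*")
    if [ -n "$extras" ]; then
        printf 'pymdma[%s] @ %s\n' "$extras" "$PYMDMA_GIT_URL"
    else
        # No modality requested: reference the base package only.
        printf 'pymdma @ %s\n' "$PYMDMA_GIT_URL"
    fi
}

# Usage (the double quotes keep the requirement as a single argument to pip):
# pip install "$(pymdma_requirement image tabular)"
```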
 
-
 ## Minimal Version (CPU)
 
 For a minimal installation (without GPU support), you can install the package with the CPU version of torch, which will skip the installation of CUDA dependencies. To do so, run the following command:
 
 ```bash
-$ pip install pymdma[...] --find-url=https://download.pytorch.org/whl/cpu/torch_stable.html
+$ pip install "pymdma[...] @ git+https://github.com/fraunhoferportugal/pymdma.git" --find-links=https://download.pytorch.org/whl/cpu/torch_stable.html
 ```
diff --git a/docs/resources/data_auditing.png b/docs/resources/data_auditing.png
diff --git a/docs/resources/pymdma_schema_1.png b/docs/resources/pymdma_schema_1.png