Merge pull request #15 from fraunhoferportugal/dev
Minor Patch 0.1.4 - Bugfixes and documentation
ivo-facoco authored Nov 21, 2024
2 parents 49d8a17 + d64e93a commit b36c471
Showing 66 changed files with 469 additions and 2,036 deletions.
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,27 @@
All notable changes to this project will be documented in this file.
This format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.4] - 2024-11-21
Taxonomy rework and documentation updates.

### Added
- readthedocs slug in the README file

### Changed
- Renamed `validation_type` to `validation_domain`
- Renamed `metric_group` to `metric_category`
- Renamed `metric_goal` to `metric_group`
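
Because `metric_group` is both an old name and a new one, anyone migrating stored keyword arguments or config keys probably wants to apply the three renames in a single pass rather than one after another. A minimal sketch, assuming the old names appear as dictionary keys; the `RENAMES` table and `migrate_keys` helper are hypothetical and not part of pyMDMA, and `"some_goal"` is a placeholder value:

```python
# Hypothetical helper, not part of pyMDMA: maps pre-0.1.4 names to 0.1.4 names.
RENAMES = {
    "validation_type": "validation_domain",
    "metric_group": "metric_category",
    "metric_goal": "metric_group",
}


def migrate_keys(old_config: dict) -> dict:
    """Return a copy of old_config with every key renamed in one pass.

    Building a new dict avoids the collision that sequential in-place
    renames would cause, since `metric_group` is both an old and a new name.
    """
    return {RENAMES.get(key, key): value for key, value in old_config.items()}


# "some_goal" is a placeholder value used only for illustration.
old = {"validation_type": "synth", "metric_group": "feature", "metric_goal": "some_goal"}
new = migrate_keys(old)
print(new)
```

Renaming in place, one key at a time, would overwrite the old `metric_group` value if `metric_goal` were processed first, which is why the sketch builds a fresh dictionary.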

### Fixed
- Getting features from the last Linear layer of VGG models
- Simplified batch stacking in Image extractor method
- Updated hierarchy diagram in the documentation
- Using local seed in the `features_splitting` method to avoid global overrides that led to inconsistent results
- Removed unused text modules
- Added seed for the `cluster_into_bins` method in the `PrecisionRecallDistribution` metric. This ensures that the results are consistent across runs
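
The two seeding fixes rest on the same idea: draw random numbers from a generator owned by the function instead of reseeding the process-wide one. A minimal sketch of that pattern with the standard library, not pyMDMA's actual implementation; `split_indices` and its parameters are hypothetical:

```python
import random


def split_indices(n_items: int, seed: int = 0) -> tuple[list[int], list[int]]:
    """Shuffle range(n_items) with a private generator and split it in half."""
    rng = random.Random(seed)  # local generator: the global `random` state is untouched
    indices = list(range(n_items))
    rng.shuffle(indices)
    half = n_items // 2
    return indices[:half], indices[half:]


# The same seed gives the same split on every call.
first = split_indices(10, seed=42)
second = split_indices(10, seed=42)

# The global generator's state is not disturbed by a call in between.
random.seed(123)
expected = random.random()
random.seed(123)
split_indices(10, seed=42)
after_split = random.random()
```

Here `expected` and `after_split` are equal because the split never touches the module-level generator, so callers that set their own global seed keep reproducible results.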



## [0.1.3] - 2024-11-05
Documentation and API updates.

10 changes: 7 additions & 3 deletions README.md
@@ -9,13 +9,18 @@
[![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
[![pytest](https://img.shields.io/badge/pytest-enabled-brightgreen)](https://github.com/pytest-dev/pytest)
[![conventional-commits](https://img.shields.io/badge/conventional%20commits-1.0.0-yellow)](https://github.com/commitizen-tools/commitizen)
[![Read The Docs](https://readthedocs.org/projects/pymdma/badge/?version=latest)](https://pymdma.readthedocs.io/en/latest/installation/)

<!-- [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/fraunhoferportugal/pymdma.git/main?labpath=notebooks%2Fimage_examples.ipynb) -->

Data auditing is essential for ensuring the reliability of machine learning models by maintaining the integrity of the datasets upon which these models rely. As synthetic data use increases to address data scarcity and privacy concerns, there is a growing demand for a robust auditing framework.

Existing repositories often lack comprehensive coverage across various modalities or validation types. This work introduces a dedicated library for data auditing, presenting a comprehensive suite of metrics designed for evaluating synthetic data. Additionally, it extends its focus to the quality assessment of input data, whether synthetic or real, across time series, tabular, and image modalities.

This library aims to serve as a unified and accessible resource for researchers, practitioners, and developers, enabling them to assess the quality and utility of their datasets. This initiative encourages collaborative contributions by open-sourcing the associated code, fostering a community-driven approach to advancing data auditing practices. This work is intended for publication in an open-source journal to facilitate widespread dissemination, adoption, and impact tracking within the scientific and technical community.

For more information check out the official documentation [here](https://pymdma.readthedocs.io/en/latest/).

## Prerequisites

You will need:
@@ -106,13 +111,13 @@ Following is an example of executing the evaluation of a synthetic dataset with

```bash
pymdma --modality image \
--validation_type synth \
--validation_domain synth \
    --reference_type dataset \
--evaluation_level dataset \
    --reference_data data/test/image/synthesis_val/reference \
--target_data data/test/image/synthesis_val/dataset \
    --batch_size 3 \
--metric_group feature \
--metric_category feature \
    --output_dir reports/image_metrics/
```
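
For scripted runs, the post-rename command can be assembled programmatically. A minimal sketch that only builds and prints the argument list, using the flag names and values from the example above; it does not invoke `pymdma`, and the `options` dictionary is purely illustrative:

```python
import shlex

# Flag names and values copied from the README example (0.1.4 naming).
options = {
    "modality": "image",
    "validation_domain": "synth",
    "reference_type": "dataset",
    "evaluation_level": "dataset",
    "reference_data": "data/test/image/synthesis_val/reference",
    "target_data": "data/test/image/synthesis_val/dataset",
    "batch_size": 3,
    "metric_category": "feature",
    "output_dir": "reports/image_metrics/",
}

# Flatten the mapping into ["pymdma", "--flag", "value", ...].
argv = ["pymdma"]
for flag, value in options.items():
    argv.extend([f"--{flag}", str(value)])

# shlex.join quotes each token so the line can be pasted into a shell.
print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run(argv, check=True)` in an environment where the `pymdma` command is installed.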

@@ -223,4 +228,3 @@ If you publish work that uses pyMDMA, please cite pyMDMA as follows:
This work was funded by AISym4Med project number 101095387, supported by the European Health and Digital Executive Agency (HADEA), granting authority under the powers delegated by the European Commission. More information on this project can be found [here](https://aisym4med.eu/).

This work was supported by European funds through the Recovery and Resilience Plan, project "Center for Responsible AI", project number C645008882-00000055. Learn more about this project [here](https://centerforresponsible.ai/).

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.1.2
0.1.4
2 changes: 1 addition & 1 deletion docs/index.md
@@ -11,7 +11,7 @@ The source code is available on [GitHub](https://github.com/fraunhoferportugal/p

## Metric Categories
Each metric class is organized based on the modality, validation type, metric group and goal. Following is a brief description of these categories:
![Metric Categories](resources/data_auditing.png)
![Metric Categories](resources/pymdma_schema_1.png)

### Validation Type
The platform offers two types of evaluation - input and synthesis validation. The first type includes metrics for assessing raw data quality intended for use in machine learning tasks. The second type evaluates data generated by a synthesis model. Note that input metrics can also be used to evaluate the quality of synthetic datasets.
21 changes: 10 additions & 11 deletions docs/installation.md
@@ -1,34 +1,33 @@
# Installation

It is recommended to install the package in a virtual environment. To install the package, run the following command:
The package is currently unavailable on PyPI. Install it directly from the Git repository instead by running the following command:

```bash
$ pip install pymdma
$ pip install "pymdma @ git+https://github.com/fraunhoferportugal/pymdma.git"
```

Depending on the data modality you are working with, you may need to install additional dependencies. We have three groups of dependencies: `image`, `tabular` and `time_series`. As an example, to work with image data, you will need to run the following command:
<!-- It is recommended to install the package in a virtual environment. To install the package, run the following command:
```bash
$ pip install pymdma[image]
```
$ pip install pymdma
``` -->

You can also install multiple modalities by passing the desired modalities as a comma-separated list. For example, to install both image and tabular modalities, you can run the following command:
Depending on the data modality you are working with, you may need to install additional dependencies. We have three groups of dependencies: `image`, `tabular` and `time_series`. As an example, to work with image data, you will need to run the following command:

```bash
$ pip install pymdma[image,tabular]
$ pip install "pymdma[image] @ git+https://github.com/fraunhoferportugal/pymdma.git"
```

Or alternatively, you can install all modalities by running the following command:
You can also install multiple modalities by passing the desired modalities as a comma-separated list. For example, to install both image and tabular modalities, you can run the following command:

```bash
$ pip install pymdma[all]
$ pip install "pymdma[image,tabular] @ git+https://github.com/fraunhoferportugal/pymdma.git"
```
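
After installing, it can be useful to confirm which version ended up in the environment. A minimal sketch using only the standard library; it assumes the distribution is registered under the name `pymdma`, as in the pip commands above, and `installed_version` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("pymdma")
print(found if found is not None else "pymdma is not installed in this environment")
```

Returning `None` instead of raising keeps the check usable in setup scripts that need to branch on whether the package is present.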


## Minimal Version (CPU)

For a minimal installation (without GPU support), you can install the package with the CPU version of torch, which skips the installation of CUDA dependencies. To do so, run the following command:

```bash
$ pip install pymdma[...] --find-links=https://download.pytorch.org/whl/cpu/torch_stable.html
$ pip install "pymdma[...] @ git+https://github.com/fraunhoferportugal/pymdma.git" --find-links=https://download.pytorch.org/whl/cpu/torch_stable.html
```
Binary file removed docs/resources/data_auditing.png
Binary file not shown.
Binary file added docs/resources/pymdma_schema_1.png