Merge pull request #17 from fraunhoferportugal/dev
fix: rename metric organization names in documentation
ivo-facoco authored Nov 21, 2024
2 parents 519db85 + af7c781 commit 2e0c160
Showing 1 changed file with 6 additions and 5 deletions.
11 changes: 6 additions & 5 deletions docs/index.md

The source code is available on [GitHub](https://github.com/fraunhoferportugal/pymdma/tree/main).

## Metric Organization
![Metric Categories](resources/pymdma_schema_1.png)

Each metric class is organized by modality, validation domain, metric category, and metric group. The following is a brief description of each level of this hierarchy:

### Validation Domain
The platform offers two types of evaluation: input validation and synthesis validation. Input validation includes metrics for assessing the quality of raw data intended for use in machine learning tasks. Synthesis validation evaluates data generated by a synthesis model. Note that input metrics can also be used to evaluate the quality of synthetic datasets.
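To make the distinction concrete, here is a minimal, self-contained sketch in plain Python. This is not pymdma's actual API; `snr_db` and `mean_shift` are hypothetical stand-ins. The point is the calling convention: an input metric needs only the dataset under evaluation, while a synthesis metric compares synthetic data against a real reference.

```python
import math

def snr_db(signal):
    """No-reference *input* metric: a rough signal-to-noise estimate
    for a 1-D series, as mean power over variance, in decibels."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    return 10 * math.log10(mean ** 2 / var) if var > 0 else float("inf")

def mean_shift(real, synthetic):
    """Reference-based *synthesis* metric: absolute difference between
    the means of a real and a synthetic sample distribution."""
    return abs(sum(real) / len(real) - sum(synthetic) / len(synthetic))

# Input validation: the metric sees only the dataset itself.
quality = snr_db([5.1, 4.9, 5.0, 5.2, 4.8])

# Synthesis validation: the metric compares synthetic against real data.
fidelity_gap = mean_shift([5.1, 4.9, 5.0], [5.0, 5.1, 4.95])
```

The same real/synthetic pair could also be scored with `snr_db` alone, which is why input metrics remain applicable to synthetic datasets.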

### Metric Category
Metrics are loosely organized based on the data format and metric input requirements. Data-based metrics require minimal to no preprocessing of the data before computation. Feature-based metrics are computed over embeddings of the data, often obtained with a classification model. Annotation-based metrics validate the integrity and validity of dataset annotations. Currently, this last type is only available for COCO [1] annotated image datasets.
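As a toy illustration of the first two categories (none of these functions belong to pymdma; `embed` stands in for a real pretrained encoder), a data-based metric consumes raw values directly, while a feature-based metric first maps the data into an embedding space and compares embeddings:

```python
from statistics import mean, pstdev

def contrast(image_rows):
    """Data-based metric: operates on the raw pixels directly
    (standard deviation of intensities as a crude contrast score)."""
    pixels = [p for row in image_rows for p in row]
    return pstdev(pixels)

def embed(image_rows):
    """Stand-in for a pretrained encoder: maps an image to a short
    feature vector (mean intensity, contrast). A real pipeline would
    use the embeddings of a classification network instead."""
    pixels = [p for row in image_rows for p in row]
    return (mean(pixels), pstdev(pixels))

def feature_distance(a, b):
    """Feature-based metric: Euclidean distance in embedding space."""
    ea, eb = embed(a), embed(b)
    return sum((x - y) ** 2 for x, y in zip(ea, eb)) ** 0.5

img_a = [[0, 255], [0, 255]]      # high-contrast 2x2 "image"
img_b = [[120, 130], [125, 135]]  # low-contrast 2x2 "image"
```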

### Metric Group
These categories represent the types of evaluations each metric performs and are applicable across various validation contexts. For input validation, Quality refers to measurable data attributes, such as contrast and brightness in images or the signal-to-noise ratio in time-series data. In synthesis validation, Quality encompasses three key evaluation pillars for synthetic datasets: Fidelity, Diversity, and Authenticity [2]. Fidelity measures the similarity of a synthetic dataset to real data; Diversity evaluates how well the synthetic dataset spans the full range of the real data manifold; and Authenticity ensures the synthetic dataset is sufficiently distinct from real data to avoid being a copy.
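One common feature-based formulation of the fidelity/diversity pair estimates the data manifold with k-nearest-neighbour balls around embedded samples. Below is a simplified pure-Python sketch of that idea, not pymdma's implementation; `manifold_coverage` is a hypothetical helper:

```python
import math

def _knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii

def manifold_coverage(reference, candidates, k=2):
    """Fraction of `candidates` falling inside the estimated manifold of
    `reference`: the union of k-NN balls around the reference points.
    Called as (real, synthetic) this approximates fidelity/precision;
    with the arguments swapped it approximates diversity/recall."""
    radii = _knn_radii(reference, k)
    inside = sum(
        1 for c in candidates
        if any(math.dist(c, r) <= rad for r, rad in zip(reference, radii))
    )
    return inside / len(candidates)

real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
# A "mode-collapsed" generator: every sample clusters near one real point.
fake = [(0.45, 0.5), (0.55, 0.5), (0.5, 0.45), (0.5, 0.55)]

fidelity = manifold_coverage(real, fake)   # high: fake lies on the real manifold
diversity = manifold_coverage(fake, real)  # low: fake covers little of it
```

The example deliberately shows why both pillars are needed: the collapsed samples score perfect fidelity while covering only a fifth of the real points.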

Utility metrics assess the usefulness of synthetic datasets for downstream tasks, which is especially valuable when synthetic data is used to augment real datasets. Privacy metrics examine whether a dataset or instance is overly similar to another; without a reference, they help identify sensitive attributes like names or addresses. Finally, Validity includes metrics that confirm data integrity, such as ensuring that COCO annotations meet standard formatting requirements.
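A toy version of such a reference-based privacy check (a sketch only; `copy_candidates` is a made-up helper, not pymdma's API): flag synthetic samples that sit closer to some real record than that record's nearest real neighbour, a hint that the generator may have memorised it.

```python
import math

def copy_candidates(real, synthetic):
    """Return indices of synthetic samples that are suspiciously close
    to a real record: closer than that record's nearest *real* neighbour."""
    flagged = []
    for s_idx, s in enumerate(synthetic):
        # Nearest real record to this synthetic sample.
        r_idx = min(range(len(real)), key=lambda i: math.dist(s, real[i]))
        # That record's distance to its own nearest real neighbour.
        nn_real = min(math.dist(real[r_idx], real[j])
                      for j in range(len(real)) if j != r_idx)
        if math.dist(s, real[r_idx]) < nn_real:
            flagged.append(s_idx)
    return flagged

real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = [(0.01, 0.0), (2.0, 2.0)]  # the first is a near-copy of real[0]
```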
