Merge pull request #17 from fraunhoferportugal/dev
fix: rename metric organization names in documentation
ivo-facoco authored Nov 21, 2024
2 parents 519db85 + af7c781 commit 2e0c160
Showing 1 changed file with 6 additions and 5 deletions.
11 changes: 6 additions & 5 deletions docs/index.md

The source code is available on [GitHub](https://github.com/fraunhoferportugal/pymdma/tree/main).

## Metric Organization
![Metric Categories](resources/pymdma_schema_1.png)

Each metric class is organized by modality, validation domain, metric category, and metric group. The following is a brief description of each level of this hierarchy:

### Validation Domain
The platform offers two types of evaluation: input validation and synthesis validation. Input validation includes metrics for assessing the quality of raw data intended for use in machine learning tasks. Synthesis validation evaluates data generated by a synthesis model. Note that input metrics can also be used to evaluate the quality of synthetic datasets.
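To make the distinction concrete, here is a minimal, self-contained sketch in plain Python. This is not pymdma's actual API; `snr_db` and `mean_shift` are hypothetical stand-ins. The point is the calling convention: an input metric needs only the dataset under evaluation, while a synthesis metric compares synthetic data against a real reference.

```python
import math

def snr_db(signal):
    """No-reference *input* metric: a rough signal-to-noise estimate
    for a 1-D series, as mean power over variance, in decibels."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    return 10 * math.log10(mean ** 2 / var) if var > 0 else float("inf")

def mean_shift(real, synthetic):
    """Reference-based *synthesis* metric: absolute difference between
    the means of a real and a synthetic sample distribution."""
    return abs(sum(real) / len(real) - sum(synthetic) / len(synthetic))

# Input validation: the metric sees only the dataset itself.
quality = snr_db([5.1, 4.9, 5.0, 5.2, 4.8])

# Synthesis validation: the metric compares synthetic against real data.
fidelity_gap = mean_shift([5.1, 4.9, 5.0], [5.0, 5.1, 4.95])
```

The same real/synthetic pair could also be scored with `snr_db` alone, which is why input metrics remain applicable to synthetic datasets.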

### Metric Category
Metrics are loosely organized based on the data format and metric input requirements. Data-based metrics require minimal to no preprocessing of the data before computation. Feature-based metrics are computed over embeddings of the data, often obtained with a classification model. Annotation-based metrics validate the integrity and validity of dataset annotations. Currently, this last type is only available for COCO [1] annotated image datasets.
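As a toy illustration of the first two categories (none of these functions belong to pymdma; `embed` stands in for a real pretrained encoder), a data-based metric consumes raw values directly, while a feature-based metric first maps the data into an embedding space and compares embeddings:

```python
from statistics import mean, pstdev

def contrast(image_rows):
    """Data-based metric: operates on the raw pixels directly
    (standard deviation of intensities as a crude contrast score)."""
    pixels = [p for row in image_rows for p in row]
    return pstdev(pixels)

def embed(image_rows):
    """Stand-in for a pretrained encoder: maps an image to a short
    feature vector (mean intensity, contrast). A real pipeline would
    use the embeddings of a classification network instead."""
    pixels = [p for row in image_rows for p in row]
    return (mean(pixels), pstdev(pixels))

def feature_distance(a, b):
    """Feature-based metric: Euclidean distance in embedding space."""
    ea, eb = embed(a), embed(b)
    return sum((x - y) ** 2 for x, y in zip(ea, eb)) ** 0.5

img_a = [[0, 255], [0, 255]]      # high-contrast 2x2 "image"
img_b = [[120, 130], [125, 135]]  # low-contrast 2x2 "image"
```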

### Metric Group
These categories represent the types of evaluations each metric performs and are applicable across various validation contexts. For input validation, Quality refers to measurable data attributes, such as contrast and brightness in images or the signal-to-noise ratio in time-series data. In synthesis validation, Quality encompasses three key evaluation pillars for synthetic datasets: Fidelity, Diversity, and Authenticity [2]. Fidelity measures the similarity of a synthetic dataset to real data; Diversity evaluates how well the synthetic dataset spans the full range of the real data manifold; and Authenticity ensures the synthetic dataset is sufficiently distinct from real data to avoid being a copy.
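One common feature-based formulation of the fidelity/diversity pair estimates the data manifold with k-nearest-neighbour balls around embedded samples. Below is a simplified pure-Python sketch of that idea, not pymdma's implementation; `manifold_coverage` is a hypothetical helper:

```python
import math

def _knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii

def manifold_coverage(reference, candidates, k=2):
    """Fraction of `candidates` falling inside the estimated manifold of
    `reference`: the union of k-NN balls around the reference points.
    Called as (real, synthetic) this approximates fidelity/precision;
    with the arguments swapped it approximates diversity/recall."""
    radii = _knn_radii(reference, k)
    inside = sum(
        1 for c in candidates
        if any(math.dist(c, r) <= rad for r, rad in zip(reference, radii))
    )
    return inside / len(candidates)

real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
# A "mode-collapsed" generator: every sample clusters near one real point.
fake = [(0.45, 0.5), (0.55, 0.5), (0.5, 0.45), (0.5, 0.55)]

fidelity = manifold_coverage(real, fake)   # high: fake lies on the real manifold
diversity = manifold_coverage(fake, real)  # low: fake covers little of it
```

The example deliberately shows why both pillars are needed: the collapsed samples score perfect fidelity while covering only a fifth of the real points.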

Utility metrics assess the usefulness of synthetic datasets for downstream tasks, which is especially valuable when synthetic data is used to augment real datasets. Privacy metrics examine whether a dataset or instance is overly similar to another; without a reference, they help identify sensitive attributes like names or addresses. Finally, Validity includes metrics that confirm data integrity, such as ensuring that COCO annotations meet standard formatting requirements.
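A toy version of such a reference-based privacy check (a sketch only; `copy_candidates` is a made-up helper, not pymdma's API): flag synthetic samples that sit closer to some real record than that record's nearest real neighbour, a hint that the generator may have memorised it.

```python
import math

def copy_candidates(real, synthetic):
    """Return indices of synthetic samples that are suspiciously close
    to a real record: closer than that record's nearest *real* neighbour."""
    flagged = []
    for s_idx, s in enumerate(synthetic):
        # Nearest real record to this synthetic sample.
        r_idx = min(range(len(real)), key=lambda i: math.dist(s, real[i]))
        # That record's distance to its own nearest real neighbour.
        nn_real = min(math.dist(real[r_idx], real[j])
                      for j in range(len(real)) if j != r_idx)
        if math.dist(s, real[r_idx]) < nn_real:
            flagged.append(s_idx)
    return flagged

real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = [(0.01, 0.0), (2.0, 2.0)]  # the first is a near-copy of real[0]
```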
