From c647f4e62fe70d163257e2b1a440d6413d958fde Mon Sep 17 00:00:00 2001
From: CyrilJl
Date: Wed, 26 Jun 2024 23:02:04 +0200
Subject: [PATCH] merge example

---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/README.md b/README.md
index 96d3fe8..3aaddb7 100644
--- a/README.md
+++ b/README.md
@@ -133,6 +133,29 @@ test_mean(data, n_batches)
 >>> True
 ```
 
+## Merging Two Objects
+
+In some cases, it is useful to update two different `BatchStats` objects from separate sources (for example, asynchronous I/O functions) and then merge their statistics at the end. The `batchstats` library supports this by simply adding the two objects together. Under the hood, the necessary computations are performed so that the resulting statistic reflects the data of both input datasets, even when they differ in size:
+
+```python
+import numpy as np
+from batchstats import BatchCov
+
+data = np.random.randn(25_000, 50)
+data1 = data[:10_000]
+data2 = data[10_000:]
+
+cov = BatchCov().update_batch(data)
+cov1 = BatchCov().update_batch(data1)
+cov2 = BatchCov().update_batch(data2)
+
+cov_merged = cov1 + cov2
+np.allclose(cov(), cov_merged())
+>>> True
+```
+
+The `__add__` method is overloaded to enable this merging for the statistics objects in `batchstats`, including `BatchCov`, `BatchMax`, `BatchMean`, `BatchMin`, `BatchPeakToPeak`, `BatchStd`, `BatchSum`, and `BatchVar`.
+
 ## Performance
 
 In addition to result accuracy, much attention has been given to computation times and memory usage. Fun fact: calculating the variance using `batchstats` consumes little RAM while being faster than `numpy.var`: