Bugfix/935/match pandas behaviour when aggregating columns with nans #1450

alexowens90 · 2024-03-21T09:59:13Z

Fixes #935

Also addresses two of the bullet points from #1439:

- AggregationClause::process x2
- aggregation.cpp all finalize methods
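For readers unfamiliar with the target behaviour, here is a minimal pure-Python sketch of how pandas, by default, treats NaNs in a grouped aggregation, as I understand it: NaNs are skipped, `sum` of an all-NaN group is 0.0, `count` is the number of non-NaN values, and `min`, `max` and `mean` of an all-NaN group are NaN. The function name `aggregate_skipping_nans` is hypothetical and is not part of ArcticDB or pandas; it illustrates only the expected results for the five aggregators the test in this PR covers, not the C++ implementation.

```python
import math
from collections import defaultdict


def aggregate_skipping_nans(keys, values, aggregator):
    """Group `values` by `keys` and aggregate, ignoring NaN values.

    Hypothetical helper: models pandas' default NaN-skipping semantics,
    not the ArcticDB implementation.
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        present = groups[key]  # ensures all-NaN groups still get an entry
        if not math.isnan(value):
            present.append(value)

    result = {}
    for key, present in groups.items():
        if aggregator == "count":
            result[key] = len(present)        # number of non-NaN values
        elif aggregator == "sum":
            result[key] = math.fsum(present)  # 0.0 when every value was NaN
        elif not present:
            result[key] = math.nan            # min/max/mean of an all-NaN group
        elif aggregator == "min":
            result[key] = min(present)
        elif aggregator == "max":
            result[key] = max(present)
        elif aggregator == "mean":
            result[key] = math.fsum(present) / len(present)
        else:
            raise ValueError(f"unsupported aggregator: {aggregator}")
    return result


if __name__ == "__main__":
    keys = ["a", "a", "b", "b", "c"]
    values = [1.0, math.nan, math.nan, math.nan, 3.0]
    for name in ("sum", "min", "max", "mean", "count"):
        print(name, aggregate_skipping_nans(keys, values, name))
```

Group "b" is the interesting case: every value is NaN, so `sum` and `count` return zero while the other aggregators return NaN.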

willdealtry · 2024-03-21T10:44:42Z

cpp/arcticdb/processing/aggregation.cpp

-                    auto out_ptr = reinterpret_cast<RawType*>(col->ptr());
-                    for(auto i = 0u; i < unique_values; ++i, ++in_ptr, ++out_ptr) {
-                        *out_ptr = in_ptr->value_;
+                    for (auto it = column_data.begin<dynamic_schema_tdt>(); it != column_data.end<dynamic_schema_tdt>(); ++it, ++in_ptr) {


I don't hugely mind, but a snake_case type name is not the normal pattern. Is there a reason why this diverges?

Nope, will change

DrNickClarke · 2024-03-21T10:56:31Z

python/tests/unit/arcticdb/version_store/test_aggregation.py

@@ -36,6 +36,29 @@ def test_group_on_float_column_with_nans(lmdb_version_store):
    assert_frame_equal(expected, received)


+# TODO: Add first and last once un-feature flagged
+@pytest.mark.parametrize("aggregator", ("sum", "min", "max", "mean", "count"))
+def test_aggregate_float_columns_with_nans(lmdb_version_store, aggregator):


The test looks correct to me


alexowens90 added 11 commits March 20, 2024 14:18

Added failing test

e7a25bc

Fix mean

9d256ae

Tweaked count

be1ce8f

Fix sum

9862b63

Added comment

b4f2277

Use std::transform with ColumnDataIterator at end of AggregationClauseProcess

5992f77

Use Column::for_each_enumerated when grouping in AggregationClause::process

c3bfac7

Remove superseded test

f862683

Use more ColumnDataIterators

8f8e028

Use more ColumnDataIterators

4662bf9

More tidy up

9aa0702

willdealtry reviewed Mar 21, 2024


DrNickClarke approved these changes Mar 21, 2024


Correct TDT tag case

93cd2f3

alexowens90 merged commit ed42b00 into master Mar 21, 2024
114 checks passed

alexowens90 deleted the bugfix/935/match-pandas-behviour-when-aggregating-columns-with-nans branch March 21, 2024 15:38

alexowens90 mentioned this pull request Mar 21, 2024

Replace raw pointer buffer access with static Column for_each/transform methods or ColumnDataIterator where possible #1439

Open

6 tasks
