Added implementation of nunique function (#29)
* Added implementation of nunique function

* Added test for handling string nulls (" "), differentiating behavior between Python and kdb+

* Suggested changes. Error with mixed lists and tests for this case.

* QError for mixed lists (suggested by Kx)

* minor: rename filternan (suggested)
chraberturas authored Jan 22, 2024
1 parent eaea633 commit abcc1b4
Showing 3 changed files with 84 additions and 1 deletion.
43 changes: 42 additions & 1 deletion docs/user-guide/advanced/Pandas_API.ipynb
@@ -2535,7 +2535,48 @@
},
{
"cell_type": "markdown",
"id": "499025cb",
"source": [
"### Table.nunique()\n",
"```\n",
"Table.nunique(axis=0, dropna=True)\n",
"```\n",
"\n",
"Returns the number of unique elements across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: |:------------------------------------------------------------------------------------| :-----: |\n",
"| axis | int | The axis along which to count unique elements: 0 is columns, 1 is rows. | 0 |\n",
"| dropna | bool | Don’t include NaN in the counts. | True |\n",
"\n",
"**Returns:**\n",
"\n",
" | Type | Description |\n",
" | :----------------: | :------------------------------------------------------------------- |\n",
" | Dictionary | A dictionary where the keys represent the column names / row numbers and the values are the result of calling `nunique` on that column / row. |"
],
"metadata": {
"collapsed": false
},
"id": "5bc5e813e9673a84"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"tab.nunique()"
],
"metadata": {
"collapsed": false
},
"id": "f5592b19b69ad46d"
},
{
"cell_type": "markdown",
"id": "655c3ad2",

"metadata": {},
"source": [
"## Setting Indexes"
8 changes: 8 additions & 0 deletions src/pykx/pandas_api/pandas_meta.py
@@ -261,6 +261,14 @@ def sum(self, axis=0, skipna=True, numeric_only=False, min_count=0):
min_count
), cols)

    @convert_result
    def nunique(self, axis=0, dropna=True):
        res, cols = preparse_computations(self, axis, skipna=False)
        # Leave lists whose items are all strings (10h) and symbol vectors (11h)
        # untouched; drop nulls from every other column or row.
        filternan = q('{$[all[10h=type each x]|11h = type x;x;'
                      'x where not null x]}each')
        res = filternan(res) if dropna else res
        # Count the distinct values in each column or row.
        return (q("('[count;distinct]')", res), cols)

    def agg(self, func, axis=0, *args, **kwargs): # noqa: C901
        if 'KeyedTable' in str(type(self)):
            raise NotImplementedError("'agg' method not presently supported for KeyedTable")
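
Reviewer note: the filter-then-count logic added to `nunique` can be approximated in plain Python. This is only a rough sketch under stated assumptions, not the PyKX implementation: `is_null` and `nunique_values` are hypothetical helpers, `None` and float NaN stand in for q nulls, Python `str` values stand in for q strings and symbols, and the `QError` raised for mixed lists is not modelled.

```python
import math


def is_null(value):
    """Treat None and float NaN as null (an assumed stand-in for q's `null`)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nunique_values(values, dropna=True):
    """Count the distinct values in one column or row."""
    all_strings = all(isinstance(v, str) for v in values)
    if dropna and not all_strings:
        # Roughly mirrors `x where not null x` for non-string data.
        values = [v for v in values if not is_null(v)]
    # Collapse every null to None so that repeated NaNs are counted once.
    return len({None if is_null(v) else v for v in values})


# Same shape as the first table in test_nunique, with '' for the null symbol.
table = {
    "a": [4.0, float("nan"), 7.0, 6.0],
    "b": [4.0, float("nan"), float("nan"), 7.0],
    "c": ["", "foo", "foo", ""],
}

with_dropna = {name: nunique_values(col) for name, col in table.items()}
without_dropna = {
    name: nunique_values(col, dropna=False) for name, col in table.items()
}
print(with_dropna)     # {'a': 3, 'b': 2, 'c': 2}
print(without_dropna)  # {'a': 4, 'b': 3, 'c': 2}
```

Column `c` gives the same count either way because string-like data is never filtered in this sketch, so the empty string is counted as a value of its own.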
34 changes: 34 additions & 0 deletions tests/test_pandas_api.py
@@ -2038,3 +2038,37 @@ def test_keyed_loc_fixes(q):
mkt[['k1', 'y']]
with pytest.raises(KeyError):
mkt['k1']


def test_nunique(kx, q):
    # Columns containing nulls: compare against pandas with and without dropna.
    tab = kx.q('([]a:4 0n 7 6;b:4 0n 0n 7;c:``foo`foo`)')
    df = tab.pd()
    p_m = df.nunique()
    q_m = tab.nunique()
    for c in q.key(q_m).py():
        assert p_m[c] == q_m[c].py()
    p_m = df.nunique(dropna=False)
    q_m = tab.nunique(dropna=False)
    for c in q.key(q_m).py():
        assert p_m[c] == q_m[c].py()

    # Null-free integer table: compare column-wise and row-wise counts.
    df = pd.DataFrame(
        {
            'a': [1, 2, 2, 4],
            'b': [1, 2, 6, 7],
            'c': [7, 8, 9, 10],
        }
    )
    tab = kx.toq(df)
    p_m = df.nunique()
    q_m = tab.nunique()
    for c in q.key(q_m).py():
        assert p_m[c] == q_m[c].py()
    p_m = df.nunique(axis=1)
    q_m = tab.nunique(axis=1)
    for c in range(len(tab)):
        assert p_m[c] == q_m[c].py()

    # A mixed list column is expected to raise a QError.
    tab = kx.q('([]a:("";" ";"";"foo"))')
    with pytest.raises(kx.QError):
        tab.nunique()
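
Reviewer note: the `axis=0` and `axis=1` assertions in this test can be checked by hand with a small pure-Python sketch. `nunique_by_axis` is a hypothetical helper written for illustration only. It assumes null-free data, so a plain `set` is enough, and it does not call PyKX or pandas.

```python
def nunique_by_axis(columns, axis=0):
    """Count distinct values per column (axis=0) or per row (axis=1)."""
    if axis == 0:
        return {name: len(set(values)) for name, values in columns.items()}
    # Transpose the column lists into rows, then count per row index.
    rows = zip(*columns.values())
    return {i: len(set(row)) for i, row in enumerate(rows)}


# Same data as the DataFrame built in test_nunique.
columns = {"a": [1, 2, 2, 4], "b": [1, 2, 6, 7], "c": [7, 8, 9, 10]}

per_column = nunique_by_axis(columns)
per_row = nunique_by_axis(columns, axis=1)
print(per_column)  # {'a': 3, 'b': 4, 'c': 4}
print(per_row)     # {0: 2, 1: 2, 2: 3, 3: 3}
```

The keys follow the documented return shape: column names for `axis=0` and row numbers for `axis=1`.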
