Added implementation of nunique function #29

Merged
merged 10 commits into from
Jan 22, 2024
43 changes: 42 additions & 1 deletion docs/user-guide/advanced/Pandas_API.ipynb
@@ -2509,7 +2509,48 @@
},
{
"cell_type": "markdown",
"id": "499025cb",
"source": [
"### Table.nunique()\n",
"```\n",
"Table.nunique(axis=0, dropna=True)\n",
"```\n",
"\n",
"Returns the number of unique elements across the given axis.\n",
"\n",
"**Parameters:**\n",
"\n",
"| Name | Type | Description | Default |\n",
"| :----------: | :--: |:------------------------------------------------------------------------------------| :-----: |\n",
"| axis | int | The axis along which to count unique elements: 0 counts per column, 1 counts per row. | 0 |\n",
"| dropna | bool | Don’t include NaN in the counts. | True |\n",
"\n",
"**Returns:**\n",
"\n",
" | Type | Description |\n",
" | :----------------: | :------------------------------------------------------------------- |\n",
" | Dictionary | A dictionary where the keys represent the column name / row number and the values are the result of calling `nunique` on that column / row. |"
],
"metadata": {
"collapsed": false
},
"id": "5bc5e813e9673a84"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"tab.nunique()"
],
"metadata": {
"collapsed": false
},
"id": "f5592b19b69ad46d"
},
{
"cell_type": "markdown",
"id": "655c3ad2",

"metadata": {},
"source": [
"## Setting Indexes"
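For anyone comparing against pandas, the semantics described in the `Table.nunique()` notebook cell can be sketched in plain Python. This is an illustration only, not the PyKX implementation: `nunique_sketch` and `is_null` are hypothetical helpers, the table is modelled as a dict of column lists, and treating `None` and float NaN as the only null values is an assumption.

```python
import math


def is_null(value):
    """Treat None and float NaN as null (an assumption made for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nunique_sketch(table, axis=0, dropna=True):
    """Count distinct values per column (axis=0) or per row (axis=1)."""
    if axis == 0:
        groups = dict(table)
    else:
        # Transpose the columns so that each group is one row, keyed by row number.
        groups = {i: list(row) for i, row in enumerate(zip(*table.values()))}
    result = {}
    for key, values in groups.items():
        non_null = {v for v in values if not is_null(v)}
        has_null = any(is_null(v) for v in values)
        # With dropna=False, all nulls together count as one extra distinct value.
        result[key] = len(non_null) + (1 if has_null and not dropna else 0)
    return result


nan = float("nan")
table = {"a": [4.0, nan, 7.0, 6.0], "b": [4.0, nan, nan, 7.0]}
per_column = nunique_sketch(table)
per_column_keep_nulls = nunique_sketch(table, dropna=False)
per_row = nunique_sketch(table, axis=1)
```

The result is keyed by column name for `axis=0` and by row number for `axis=1`, matching the return description in the docs cell.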
10 changes: 10 additions & 0 deletions src/pykx/pandas_api/pandas_meta.py
@@ -257,6 +257,16 @@ def sum(self, axis=0, skipna=True, numeric_only=False, min_count=0):
min_count
), cols)

    @convert_result
    def nunique(self, axis=0, dropna=True):
        res, cols = preparse_computations(self, axis, skipna=False)
        # Reject any column / row whose elements do not all share one type.
        if q("any('[1<>count distinct@;type']')@", res).py():
            raise NotImplementedError("Table contains a column whose type is mixed")
        # Drop nulls, leaving symbol columns and columns of strings untouched.
        filterNan = q('{$[all[10h=type each x]|11h = type x;x;'
                      'x where not null x]}each')
        res = filterNan(res) if dropna else res
        return (q("('[count;distinct]')", res), cols)

def agg(self, func, axis=0, *args, **kwargs): # noqa: C901
if 'KeyedTable' in str(type(self)):
raise NotImplementedError("'agg' method not presently supported for KeyedTable")
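As far as I can tell, the guard in `nunique` (`1<>count distinct@` composed with `type'`) asks, for each column, whether its elements have anything other than exactly one distinct type. A rough plain-Python analogue is sketched below. `has_mixed_types` and `reject_mixed_columns` are hypothetical names, and Python types only stand in for q type codes, so this is not a faithful port.

```python
def has_mixed_types(column):
    """True unless the column's elements all share exactly one Python type."""
    return len({type(value) for value in column}) != 1


def reject_mixed_columns(table):
    """Raise when any column is mixed; the table is modelled as a dict of lists."""
    for column in table.values():
        if has_mixed_types(column):
            raise NotImplementedError("Table contains a column whose type is mixed")


uniform = {"a": [1, 2, 2, 4], "c": ["", "foo", "foo", ""]}
mixed = {"a": [1, "foo", 2.5]}

reject_mixed_columns(uniform)  # every column has a single type, so nothing is raised
try:
    reject_mixed_columns(mixed)
    raised = False
except NotImplementedError:
    raised = True
```

Because the check is "not exactly one type", an empty column is also flagged in this sketch. I have not confirmed whether the q version behaves the same way for an empty table.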
35 changes: 35 additions & 0 deletions tests/test_pandas_api.py
@@ -2029,3 +2029,38 @@ def test_keyed_loc_fixes(q):
mkt[['k1', 'y']]
with pytest.raises(KeyError):
mkt['k1']


def test_nunique(kx, q):
    # Columns containing nulls: compare against pandas with and without dropna.
    tab = kx.q('([]a:4 0n 7 6;b:4 0n 0n 7;c:``foo`foo`)')
    df = tab.pd()
    p_m = df.nunique()
    q_m = tab.nunique()
    for c in q.key(q_m).py():
        assert p_m[c] == q_m[c].py()
    p_m = df.nunique(dropna=False)
    q_m = tab.nunique(dropna=False)
    for c in q.key(q_m).py():
        assert p_m[c] == q_m[c].py()

    # Null-free integer columns: compare per-column and per-row counts.
    df = pd.DataFrame(
        {
            'a': [1, 2, 2, 4],
            'b': [1, 2, 6, 7],
            'c': [7, 8, 9, 10],
        }
    )
    tab = kx.toq(df)
    p_m = df.nunique()
    q_m = tab.nunique()
    for c in q.key(q_m).py():
        assert p_m[c] == q_m[c].py()
    p_m = df.nunique(axis=1)
    q_m = tab.nunique(axis=1)
    for c in range(len(tab)):
        assert p_m[c] == q_m[c].py()

    # A column of mixed element types is rejected.
    tab = kx.q('([]a:("";" ";"";"foo"))')
    with pytest.raises(NotImplementedError,
                       match=r"Table contains a column whose type is mixed"):
        tab.nunique()