feat: Add collate helper for custom sort orders.
jheer committed Sep 25, 2024
1 parent a07c34f commit f98e2f9
Showing 7 changed files with 159 additions and 11 deletions.
22 changes: 21 additions & 1 deletion docs/api/index.md
@@ -11,7 +11,7 @@ title: Arquero API Reference
* [load](#load), [loadArrow](#loadArrow), [loadCSV](#loadCSV), [loadFixed](#loadFixed), [loadJSON](#loadJSON)
* [Expression Helpers](#expression-helpers)
* [op](#op), [agg](#agg), [escape](#escape)
* [bin](#bin), [desc](#desc), [frac](#frac), [rolling](#rolling), [seed](#seed)
* [bin](#bin), [collate](#collate), [desc](#desc), [frac](#frac), [rolling](#rolling), [seed](#seed)
* [Selection Helpers](#selection-helpers)
* [all](#all), [not](#not), [range](#range)
* [matches](#matches), [startswith](#startswith), [endswith](#endswith)
@@ -491,6 +491,26 @@ Generate a table expression that performs uniform binning of number values. The
aq.bin('colA', { maxbins: 20 })
```
<hr/><a id="collate" href="#collate">#</a>
<em>aq</em>.<b>collate</b>(<i>expr</i>, <i>comparator</i>[, <i>options</i>]) · [Source](https://github.com/uwdata/arquero/blob/master/src/helpers/collate.js)
Annotate a table expression with collation metadata, indicating how expression values should be compared and sorted. The [orderby](verbs#orderby) verb uses collation metadata to determine sort order. The collate helper is particularly useful for locale-specific string comparisons. The collation information can take the form of either a standard two-argument comparator function or locale and option arguments compatible with [`Intl.Collator`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Collator).
* *expr*: The table expression to annotate with collation metadata.
* *comparator*: A comparator function or the locale(s) to use. For locales, both strings (e.g., `'de'`, `'tr'`, etc.) and [`Intl.Locale`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Locale) objects (or an array of either) are supported.
* *options*: Collation options compatible with [`Intl.Collator`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Collator). This argument only applies if locales are provided as the second argument.
*Examples*
```js
// order colA using a German locale
aq.collate('colA', 'de')
```
```js
// order colA using a provided comparator function
aq.collate('colA', new Intl.Collator('de').compare)
```
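The two comparator forms described above can be illustrated without Arquero. The following is a minimal sketch in plain JavaScript: `resolveComparator` is a hypothetical name (not part of the Arquero API) that mirrors the dispatch in `src/helpers/collate.js`, and the sample arrays are invented for illustration.

```js
// Plain JavaScript sketch; does not import Arquero.
// resolveComparator is a hypothetical helper mirroring collate's dispatch:
// a function is used as-is, anything else is treated as locale(s).
function resolveComparator(comparator, options) {
  return typeof comparator === 'function'
    ? comparator
    : new Intl.Collator(comparator, options).compare;
}

const words = ['z', 'ä', 'a'];

// The default Array sort compares UTF-16 code units, so 'ä' lands after 'z'.
const byCodeUnit = [...words].sort();

// A German locale places 'ä' next to 'a'.
const byLocale = [...words].sort(resolveComparator('de'));

// Passing a comparator function directly yields the same ordering.
const byFunction = [...words].sort(
  resolveComparator(new Intl.Collator('de').compare)
);

// Options apply only to the locale form; numeric collation compares
// runs of digits as numbers rather than character by character.
const labels = ['item10', 'item2', 'item1'];
const numeric = [...labels].sort(resolveComparator('en', { numeric: true }));

console.log(byCodeUnit, byLocale, byFunction, numeric);
```

With Arquero itself, the corresponding calls would presumably be `aq.collate('colA', 'de')` and `aq.collate('colA', 'en', { numeric: true })`, passed to `orderby`.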
<hr/><a id="desc" href="#desc">#</a>
<em>aq</em>.<b>desc</b>(<i>expr</i>) · [Source](https://github.com/uwdata/arquero/blob/master/src/helpers/desc.js)
14 changes: 12 additions & 2 deletions docs/api/verbs.md
@@ -120,7 +120,7 @@ table.ungroup()

Order table rows based on a set of column values. Subsequent operations sensitive to ordering (such as window functions) will operate over sorted values. The resulting table provides a view over the original data, without any copying. To create a table with sorted data copied to new data structures, call [reify](#reify) on the result of this method. To undo ordering, use [unorder](#unorder).

* *keys*: Key values to sort by, in precedence order. By default, sorting is done in ascending order. To sort in descending order, wrap values using [desc](./#desc). If a string, order by the column with that name. If a number, order by the column with that index. If a function, must be a valid table expression; aggregate functions are permitted, but window functions are not. If an object, object values must be valid values parameters with output column names for keys and table expressions for values (the output names will be ignored). If an array, array values must be valid key parameters.
* *keys*: Key values to sort by, in precedence order. By default, sorting is done in ascending order. To sort in descending order, wrap values using [desc](./#desc). To provide a custom sort order for a key (such as for locale-specific string comparison), wrap the key value using [collate](./#collate). If a key is a string, order by the column with that name. If a number, order by the column with that index. If a function, the key must be a valid table expression; aggregate functions are permitted, but window functions are not. If an object, object values must be valid values parameters with output column names for keys and table expressions for values (the output names will be ignored). If an array, array values must be valid key parameters.

*Examples*

@@ -135,9 +135,19 @@ table.orderby('a', aq.desc('b'))
table.orderby({ a: 'a', b: aq.desc('b') })
```

```js
// order by column 'a' according to German locale settings
table.orderby(aq.collate('a', 'de'))
```

```js
// orderby accepts table expressions as well as column names
table.orderby(aq.desc(d => d.a))
table.orderby(d => d.a)
```

```js
// the configurations above can be combined
table.orderby(aq.desc(aq.collate(d => d.a, 'de')))
```
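To make the combined `desc` and `collate` ordering concrete, here is a minimal sketch in plain JavaScript that emulates the ordering by sorting row indices. It does not call Arquero; the data is the sample used in this commit's orderby tests, and the index-sorting approach is an illustrative assumption rather than how Arquero implements `orderby`.

```js
// Plain JavaScript sketch; does not import Arquero.
const data = {
  a: ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'],
  b: [1, 2, 1, 2, 1, 2]
};

const collator = new Intl.Collator('tr-TR');

// Sort row indices by key 'a' in descending Turkish collation order,
// then break ties by key 'b' in ascending order. A zero (or negative
// zero) result from the first key is falsy, so || defers to the second.
const rows = data.a
  .map((_, i) => i)
  .sort((i, j) =>
    -collator.compare(data.a[i], data.a[j]) || data.b[i] - data.b[j]
  );

console.log(rows);
console.log(rows.map(i => data.a[i]));
```

Negating the comparator result is what turns the collated ascending order into a descending one, which is why `desc` and `collate` can wrap the same key.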

<hr/><a id="unorder" href="#unorder">#</a>
1 change: 1 addition & 0 deletions src/api.js
@@ -20,6 +20,7 @@ export { default as toJSON } from './format/to-json.js';
export { default as toMarkdown } from './format/to-markdown.js';
export { default as bin } from './helpers/bin.js';
export { default as escape } from './helpers/escape.js';
export { default as collate } from './helpers/collate.js';
export { default as desc } from './helpers/desc.js';
export { default as field } from './helpers/field.js';
export { default as frac } from './helpers/frac.js';
17 changes: 10 additions & 7 deletions src/expression/compare.js
@@ -3,11 +3,8 @@ import parse from './parse.js';
import { aggregate } from '../verbs/reduce/util.js';

// generate code to compare a single field
const _compare = (u, v, lt, gt) =>
`((u = ${u}) < (v = ${v}) || u == null) && v != null ? ${lt}
: (u > v || v == null) && u != null ? ${gt}
: ((v = v instanceof Date ? +v : v), (u = u instanceof Date ? +u : u)) !== u && v === v ? ${lt}
: v !== v && u === u ? ${gt} : `;
const _compare = (u, v, lt, gt) => `((u = ${u}) < (v = ${v}) || u == null) && v != null ? ${lt} : (u > v || v == null) && u != null ? ${gt} : ((v = v instanceof Date ? +v : v), (u = u instanceof Date ? +u : u)) !== u && v === v ? ${lt} : v !== v && u === u ? ${gt} : `;
const _collate = (u, v, lt, gt, f) => `(v = ${v}, (u = ${u}) == null && v == null) ? 0 : v == null ? ${gt} : u == null ? ${lt} : (u = ${f}(u,v)) ? u : `;

export default function(table, fields) {
// parse expressions, generate code for both a and b values
@@ -50,9 +47,15 @@ export default function(table, fields) {
+ (op && table.isGrouped() ? 'const ka = keys[a], kb = keys[b];' : '')
+ 'let u, v; return ';
for (let i = 0; i < n; ++i) {
const o = fields.get(names[i]).desc ? -1 : 1;
const field = fields.get(names[i]);
const o = field.desc ? -1 : 1;
const [u, v] = exprs[i];
code += _compare(u, v, -o, o);
if (field.collate) {
code += _collate(u, v, -o, o, `${o < 0 ? '-' : ''}fn[${fn.length}]`);
fn.push(field.collate);
} else {
code += _compare(u, v, -o, o);
}
}
code += '0;};';
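
The generated `_collate` ternary is dense, so the following plain JavaScript sketch spells out one comparison step as an ordinary function. The names `collateStep` and `makeComparator` are hypothetical and do not appear in Arquero; the sketch assumes `u` and `v` hold the key values of the two rows being compared, in that order, and it is a reading aid rather than the code Arquero actually generates.

```js
// Plain JavaScript sketch; does not import Arquero.
// collateStep is a hypothetical, readable equivalent of one generated
// _collate step: it returns a number to stop the comparison, or undefined
// to fall through to the next sort key.
function collateStep(u, v, lt, gt, f) {
  if (u == null && v == null) return 0; // both missing: result is 0
  if (v == null) return gt;
  if (u == null) return lt;
  const r = f(u, v);
  return r ? r : undefined; // a zero result defers to the next key
}

// makeComparator is also hypothetical: it chains steps over a list of keys,
// passing -o and o as lt and gt and negating the comparator for desc keys,
// as the loop in compare.js does.
function makeComparator(keys) {
  return (a, b) => {
    for (const { get, collate, desc } of keys) {
      const o = desc ? -1 : 1;
      const f = desc ? (x, y) => -collate(x, y) : collate;
      const r = collateStep(get(a), get(b), -o, o, f);
      if (r !== undefined) return r;
    }
    return 0;
  };
}

const cmp = new Intl.Collator('en').compare;
const rows = [{ v: 'b' }, { v: null }, { v: 'a' }];

const ascending = [...rows]
  .sort(makeComparator([{ get: d => d.v, collate: cmp }]))
  .map(d => d.v);
const descending = [...rows]
  .sort(makeComparator([{ get: d => d.v, collate: cmp, desc: true }]))
  .map(d => d.v);

console.log(ascending, descending);
```

Under these assumptions, null values sort first in ascending order and last in descending order, and the user-supplied comparator is only ever called with two non-null values.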

25 changes: 25 additions & 0 deletions src/helpers/collate.js
@@ -0,0 +1,25 @@
import isFunction from '../util/is-function.js';
import wrap from './wrap.js';

/**
* Annotate a table expression with collation metadata, indicating how
* expression values should be compared and sorted. The orderby verb uses
* collation metadata to determine sort order. The collation information can
* take the form of either a standard two-argument comparator function or
* locale and option arguments compatible with `Intl.Collator`.
* @param {string|Function|object} expr The table expression to annotate
* with collation metadata.
* @param {Intl.LocalesArgument | ((a: any, b: any) => number)} comparator
* A comparator function or the locale(s) to collate by.
* @param {Intl.CollatorOptions} [options] Collation options, applicable
* with locales only.
* @return {object} A wrapper object representing the collated value.
* @example orderby(collate('colA', 'de'))
*/
export default function(expr, comparator, options) {
return wrap(expr, {
collate: isFunction(comparator)
? comparator
: new Intl.Collator(comparator, options).compare
});
}
3 changes: 3 additions & 0 deletions src/util/is-function.js
@@ -1,3 +1,6 @@
/**
* @returns {value is Function}
*/
export default function(value) {
return typeof value === 'function';
}
88 changes: 87 additions & 1 deletion test/verbs/orderby-test.js
@@ -1,6 +1,6 @@
import assert from 'node:assert';
import tableEqual from '../table-equal.js';
import { desc, op, table } from '../../src/index.js';
import { collate, desc, op, table } from '../../src/index.js';

describe('orderby', () => {
it('orders a table', () => {
@@ -23,6 +23,92 @@ describe('orderby', () => {
tableEqual(dt, ordered, 'orderby data');
});

it('orders a table with collate comparator', () => {
const cmp = new Intl.Collator('tr-TR').compare;

const data = {
a: ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'],
b: [1, 2, 1, 2, 1, 2]
};

const dt = table(data).orderby(collate('a', cmp), desc('b'));

const rows = [];
dt.scan(row => rows.push(row), true);
assert.deepEqual(rows, [3, 0, 5, 2, 1, 4], 'orderby scan');

tableEqual(
dt,
{
a: ['armut', 'çilek', 'erik', 'erik', 'şeftali', 'üzüm'],
b: [2, 1, 2, 1, 2, 1]
},
'orderby data'
);

tableEqual(
table(data).orderby(desc(collate('a', cmp)), desc('b')),
{
a: ['üzüm', 'şeftali', 'erik', 'erik', 'çilek', 'armut'],
b: [1, 2, 2, 1, 1, 2]
},
'orderby data'
);
});

it('orders a table with collate locale', () => {
const data = {
a: ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'],
b: [1, 2, 1, 2, 1, 2]
};

const dt = table(data).orderby(collate('a', 'tr-TR'), desc('b'));

const rows = [];
dt.scan(row => rows.push(row), true);
assert.deepEqual(rows, [3, 0, 5, 2, 1, 4], 'orderby scan');

tableEqual(
dt,
{
a: ['armut', 'çilek', 'erik', 'erik', 'şeftali', 'üzüm'],
b: [2, 1, 2, 1, 2, 1]
},
'orderby data'
);

tableEqual(
table(data).orderby(desc(collate('a', 'tr-TR')), desc('b')),
{
a: ['üzüm', 'şeftali', 'erik', 'erik', 'çilek', 'armut'],
b: [1, 2, 2, 1, 1, 2]
},
'orderby data'
);
});

it('orders a table with combined annotations', () => {
const data = {
a: ['çilek', 'şeftali', 'erik', 'armut', 'üzüm', 'erik'],
b: [1, 2, 1, 2, 1, 2]
};

const dt = table(data).orderby(desc(collate(d => d.a, 'tr-TR')), 'b');

const rows = [];
dt.scan(row => rows.push(row), true);
assert.deepEqual(rows, [4, 1, 2, 5, 0, 3], 'orderby scan');

tableEqual(
dt,
{
a: ['üzüm', 'şeftali', 'erik', 'erik', 'çilek', 'armut'],
b: [1, 2, 1, 2, 1, 2]
},
'orderby data'
);
});

it('supports aggregate functions', () => {
const data = {
a: [1, 2, 2, 3, 4, 5],
